3rd IASC World Conference on
Computational Statistics & Data Analysis
Amathus Beach Hotel, Limassol, Cyprus, 28-31 October 2005
 
Title: Statistical Learning Methods involving Dimensionality Reduction

Description:

In the modern world, it has become standard practice to produce large databases of high-dimensional multivariate observations, often automatically. Examples include measurements on tens of thousands of genes in microarray experiments, continuous and multiple measurements of human body and brain function (e.g., fMRI, EEG), and automatic market-basket accumulation at supermarket checkouts. The challenge is then to extract and interpret these large volumes of data in a coherent and timely fashion. Many approaches seek to reduce the dimensionality of the data (of objects, variables, situations, etc.) in a way that exposes its structure while maintaining the integrity of the data as a whole.

This reduction is often achieved by a supervised or unsupervised classification algorithm, or by a learning algorithm that yields classes of objects. Clustering methods have also been used to reduce the number of variables after defining suitable proximity measures between variables. Even more frequently, such a reduction is obtained by (parametric or nonparametric) factorial techniques such as principal component analysis, as well as by regression methods (including, e.g., PLS).
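For concreteness, the factorial route can be illustrated with a minimal sketch on hypothetical toy data: ten observed variables that are essentially driven by two latent factors are reduced to two principal components, computed via the singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 100 observations of 10 variables that are
# (nearly) linear combinations of 2 latent factors plus small noise.
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(100, 10))

# Center the data, then reduce to k principal components via the SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
scores = Xc @ Vt[:k].T          # n x k reduced representation

# Fraction of total variance retained by the k leading components.
explained = (s[:k] ** 2).sum() / (s ** 2).sum()
print(scores.shape, round(explained, 3))
```

Because the toy data are close to rank two, the two leading components retain almost all of the variance; with real microarray or marketing data, the retained fraction and the choice of k are of course empirical questions.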

This session focuses on situations where the main statistical problem of interest (e.g., regression, discrimination, or clustering) can be fruitfully combined with dimension reduction tools. An often-used sequential approach is to first apply a data-reduction algorithm to the original data matrix, and then use the resulting reduced classes or variables to solve the main problem. However, a more effective approach might be to combine the main problem and the reduction problem in a single model, and to solve both problems simultaneously. Several techniques have recently been proposed along these lines.
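The sequential (tandem) strategy can be sketched on hypothetical toy data: first project the observations onto two principal components, then cluster the reduced scores with a basic Lloyd-type k-means.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: two groups separated in a 2-D latent space,
# embedded in 20 observed variables with a little noise.
centers = np.array([[-4.0, 0.0], [4.0, 0.0]])
labels_true = np.repeat([0, 1], 50)
latent = centers[labels_true] + rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 20)) + 0.1 * rng.normal(size=(100, 20))

# Step 1: reduce the data to two principal components (via the SVD).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T

# Step 2: cluster the reduced scores with a basic Lloyd k-means.
def kmeans(Z, k, iters=50, seed=0):
    r = np.random.default_rng(seed)
    centroids = Z[r.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        centroids = np.array([Z[assign == j].mean(axis=0)
                              if (assign == j).any() else centroids[j]
                              for j in range(k)])
    return assign

assign = kmeans(Z, k=2)
# Agreement with the true grouping, up to label switching.
acc = max((assign == labels_true).mean(), (assign != labels_true).mean())
print(round(acc, 2))
```

A simultaneous method would instead estimate the subspace and the partition jointly; the tandem sketch above can miss cluster structure when the leading components do not happen to carry it, which is one motivation for the combined models this session solicits.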

A general objective of the planned session is therefore to collect new methodologies and statistical learning methods that exploit the simultaneous reduction of objects, variables, and/or other dimensions of the observed data while addressing statistical problems such as regression and classification. These methodologies have many potential applications in the analysis of large data sets, for example microarray data, marketing data, or image processing.

Focus:

o Supervised classification and dimensionality reduction with:
- Classical discrimination methods (linear and nonlinear discriminant analysis, logistic regression)
- Classification (decision) trees
- Support vector machines and neural networks
- Partial least squares (PLS)

o Clustering methods with dimensionality reduction, e.g., in the context of:
- Maximum likelihood clustering
- Mixture models
- Least-squares clustering
- Classical clustering methods

o Multiway clustering methods:
- Two-way and three-way clustering of data matrices
- Models and algorithms for block clustering

o Regression and dimensionality reduction with:
- Kernel methods
- Linear methods (shrinkage methods, derived input dimensions, subset selection)

Co-Chairs:

Hans-Hermann Bock
RWTH Aachen University
D-52056 Aachen
Germany
Tel: +49 241 809-4573
Fax: +49 241 809-2130
E-mail: bock@stochastik.rwth-aachen.de

Trevor Hastie
Department of Statistics
Sequoia Hall
Stanford University
Stanford, CA 94305
USA
Tel: +1 650 723-2620
Fax: +1 650 725-8977
E-mail: hastie@stanford.edu

Maurizio Vichi
Department of Probability, Statistics and Applied Statistics
University "La Sapienza" of Rome
I-00185 Rome
Italy
Tel: +39 06 49910405
Fax: +39 06 4959241
E-mail: maurizio.vichi@uniroma1.it