Title: Statistical Learning Methods Involving Dimensionality Reduction
Description:
In the modern world, it has become standard practice to produce large
databases of high-dimensional multivariate observations, often automatically.
Examples include measurements on tens of thousands of genes in microarray
experiments, continuous and multiple measurements of human body and brain
function (e.g., fMRI, EEG), and automatic market-basket accumulation at
supermarket checkouts. The challenge is then to extract and interpret the
information in these large volumes of data in a coherent and timely fashion.
Many approaches reduce the dimensionality of the data (in terms of objects,
variables, situations, etc.) in a way that exposes the structure while
maintaining the integrity of the data as a whole.
This reduction is often achieved by a supervised or unsupervised classification
algorithm, or by a learning algorithm that yields classes of objects. Clustering
methods have also been used to reduce the number of variables, after defining
suitable proximity measures between variables. Even more frequently, such a
reduction is obtained by (parametric or nonparametric) factorial techniques,
e.g., principal component analysis, as well as by regression methods (including,
e.g., PLS).
This session focuses on situations where the main statistical problem of
interest (e.g., regression, discrimination, or clustering) can be fruitfully
combined with dimension reduction tools. An often-used sequential approach is to
first apply a data reduction algorithm to the original data matrix and then use
the resulting reduced classes or variables to solve the main problem. However,
a more effective approach may be to combine the main problem and the reduction
problem in a single model and to solve both simultaneously. Several techniques
along these lines have recently been proposed.
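The sequential strategy described above can be sketched in code. The following is a minimal illustration on assumed synthetic data (not a method from the session itself): the data matrix is first reduced by principal component analysis via the SVD, and the main problem, here clustering, is then solved on the reduced scores with a small hand-rolled k-means step.

```python
# Illustrative sketch (assumed toy data): the sequential strategy of first
# reducing the data matrix and then solving the main problem on the
# reduced representation -- here, PCA followed by k-means clustering.
import numpy as np

rng = np.random.default_rng(0)

# 60 observations on 10 variables; two groups shifted along variable 0.
X = rng.normal(size=(60, 10))
X[:30, 0] += 8.0

# Step 1: data reduction -- project onto the first two principal components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T

# Step 2: main problem -- k-means clustering, run on the reduced scores Z.
def kmeans(Z, k=2, iters=20):
    # Deterministic start: rows 0 and -1 lie in different groups here.
    centers = Z[[0, -1]]
    for _ in range(iters):
        labels = ((Z[:, None, :] - centers) ** 2).sum(-1).argmin(1)
        centers = np.array([Z[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(Z)
```

A simultaneous approach of the kind the session solicits would instead optimize a single criterion over both the projection and the cluster assignment, rather than fixing the projection in step 1.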
A general objective of the planned session is therefore to collect new
methodology and statistical learning methods that exploit the simultaneous
reduction of objects, variables, and/or other dimensions of the observed data
when combining them with statistical problems such as regression and
classification. These methodologies seem to have many potential applications in
the analysis of large data sets, for example microarray data, marketing data,
or image processing.
Focus:
o Supervised classification and dimensionality reduction with:
- Classical discrimination methods (linear and nonlinear discriminant analysis,
logistic regression)
- Classification (decision) trees
- Support vector machines and neural network solutions
- Partial least squares (PLS)
o Clustering methods with dimensionality reduction, e.g., in the context of
- Maximum likelihood clustering
- Mixture models
- Least-squares clustering
- Classical clustering methods
o Multiway clustering methods
- Two-way and three-way clustering of data matrices
- Models and algorithms for block clustering
o Regression and dimensionality reduction with:
- Kernel methods
- Linear methods (shrinkage methods, derived input dimensions, subset selection)
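As a concrete instance of the "derived input dimensions" family of linear regression methods listed above, the following hedged sketch implements principal component regression on assumed synthetic data (all data and parameter choices are illustrative): input directions are derived from the predictors alone by PCA, and the response is then regressed on those derived inputs by ordinary least squares.

```python
# Illustrative sketch (assumed toy data): principal component regression,
# a "derived input dimensions" approach to combining regression with
# dimensionality reduction.
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 200, 8, 3          # observations, predictors, components kept

# Predictors with most variance (and all signal) in the first two columns.
X = rng.normal(size=(n, p))
X[:, :2] *= 3.0
beta = np.zeros(p)
beta[:2] = [2.0, -1.0]
y = X @ beta + rng.normal(scale=0.1, size=n)

# Derive k input directions from X alone (PCA of the centered predictors)...
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T

# ...then regress y on the derived inputs by ordinary least squares.
coef, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
y_hat = y.mean() + Z @ coef

mse = float(np.mean((y - y_hat) ** 2))
```

Because the directions are chosen for predictor variance rather than for predicting y, this is again a sequential method; PLS, also listed above, derives the input directions using the response as well.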
Co-Chairs:
Hans-Hermann Bock
RWTH Aachen University
D-52056 Aachen
Germany
Tel: +49 241 809-4573
Fax: ++49 241 809-2130
E-mail: bock@stochastik.rwth-aachen.de
Trevor Hastie
Department of Statistics
Sequoia Hall
Stanford University
Stanford, CA 94305
USA
Tel: +1 650 723-2620
Fax: +1 650 725-8977
E-mail: hastie@stanford.edu
Maurizio Vichi
Department of Probability, Statistics and Applied Statistics
University "La Sapienza" of Rome
I-00185 Rome
Italy
Tel: +39 06 49910405
Fax: +39 06 4959241
E-mail: maurizio.vichi@uniroma1.it