Function estimation tasks in high dimensions arise in many areas of statistical
application, not only in science and engineering but also in modelling for
finance and economics. Most recently, high-dimensional estimation problems have
attracted considerable attention in genetic research (e.g. the statistical
analysis of microarray data). An additional complication in these new
bioscience applications is that we are confronted with a very large number of
variables (e.g. genes) and, at the same time, only a moderate number of
observational units (e.g. patients). This situation leads to ill-posed
problems, resulting in overfitted models with poor predictive ability. Apart
from methodological questions, there are numerous numerical and computational
challenges that should be a focus of the CSDA journal (e.g. in a special
issue). Indeed, we are confronted not only with potential non-linearity of the
functions involved and the curse of dimensionality due to data sparseness in
feature space, but also with the need for complexity reduction
(regularization).
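To make the "many variables, few observations" difficulty concrete, the
following sketch (in Python with numpy; the simulated data and the penalty
value lam are illustrative assumptions, not taken from the text) contrasts
unregularized least squares, which interpolates the training data whenever the
number of variables exceeds the number of observations, with a
ridge-regularized fit whose penalty restores a well-posed problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 500                      # few observations, many variables (p >> n)
beta_true = np.zeros(p)
beta_true[:5] = 2.0                 # only a handful of variables matter

X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)
X_new = rng.standard_normal((n, p))             # fresh data for prediction
y_new = X_new @ beta_true + rng.standard_normal(n)

# Unregularized least squares: with p > n the minimum-norm solution
# interpolates the training data (residuals ~ 0) but generalizes poorly.
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge regression: the penalty lam * I makes X'X + lam*I invertible,
# i.e. the estimation problem well posed.
lam = 10.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

for name, b in [("least squares", beta_ls), ("ridge", beta_ridge)]:
    print(name,
          "train MSE:", round(float(np.mean((y - X @ b) ** 2)), 3),
          "test MSE:",  round(float(np.mean((y_new - X_new @ b) ** 2)), 3))
```

On fresh data the interpolating fit typically shows a much larger prediction
error than the regularized one, which is precisely the overfitting phenomenon
described above.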
Our intention is to bridge the gap between the highly successful research on
non- and semiparametric techniques that has evolved over the last 30 years and
relatively new statistical learning approaches such as support vector
machines. The latter are able to control complexity in high-dimensional
problems, at the cost of linearizing functional relationships. The wealth of
experience concerning bandwidth choice in nonparametric regression and density
estimation could cross-fertilize attempts to select complexity parameters in
statistical learning (see the sketch below). The ultimate goal of future
research would be non- or semiparametric approaches that make it possible to
control complexity without implicit restrictions on the functional forms. Of
course, this would mean controlling both function smoothness and model
complexity.
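As a minimal sketch of this cross-fertilization (assuming a Gaussian kernel,
simulated data, and a grid of candidate bandwidths, none of which come from the
text), leave-one-out cross-validation is used below to pick the bandwidth of a
Nadaraya-Watson regression estimator; exactly the same machinery is routinely
used to select, say, the regularization and kernel parameters of a support
vector machine:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

def nw_estimate(x0, x, y, h):
    """Nadaraya-Watson estimate at points x0, Gaussian kernel, bandwidth h."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def loo_cv(x, y, h):
    """Leave-one-out CV score for bandwidth h: predict each observation
    from all the others and average the squared errors."""
    err = 0.0
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        pred = nw_estimate(x[i:i + 1], x[mask], y[mask], h)[0]
        err += (y[i] - pred) ** 2
    return err / len(x)

bandwidths = np.logspace(-1.5, 0, 16)           # candidate smoothing levels
scores = [loo_cv(x, y, h) for h in bandwidths]
h_best = bandwidths[int(np.argmin(scores))]
print("CV-selected bandwidth:", round(float(h_best), 4))
```

Replacing the bandwidth grid by a grid over an SVM's cost constant, and the
Nadaraya-Watson estimator by the SVM fit, leaves the selection procedure
unchanged, which is the kind of bridge alluded to above.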