Session S3 - Kernel estimation: new approaches to bandwidth selection

Organizer: Angelina Roche

Estimator selection by penalized comparison to the overfitting estimator

Speaker: Claire LACOUR, Université Paris-Sud

Estimator selection is one of the central problems of nonparametric estimation. Here we consider the classical setting of density estimation with kernel estimators. The difficulty then lies in choosing the bandwidth, since the optimal bandwidth depends a priori on the unknown density. We present a new bandwidth selection method that sits between Lepski's method and penalized empirical risk minimization. For this new method, called "Penalized Comparison to Overfitting", we prove an oracle inequality (for the $L^2$ risk) as well as a "minimal penalty" type result. We thus recover all the good theoretical properties of the Goldenshluger-Lepski method, but at a much lower computational cost, particularly in the multivariate case. We will also present numerical simulations that compare this method, among others, with cross-validation in dimensions 1 to 4.
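To make the idea of comparing each estimator to the overfitting one concrete, here is a minimal one-dimensional sketch. It is not taken from the talk: it assumes a Gaussian kernel (so that all $L^2$ norms have closed forms), the criterion $\|\hat f_h - \hat f_{h_{\min}}\|^2 + \mathrm{pen}_\lambda(h)$ with $\mathrm{pen}_\lambda(h) = (\lambda \|K_h\|^2 - \|K_{h_{\min}} - K_h\|^2)/n$ as it is usually stated for this method, and the tuning value $\lambda = 1$. The names `gauss` and `pco_select`, the bandwidth grid, and the simulated sample are all hypothetical choices made for illustration.

```python
import math
import random


def gauss(u, s):
    """Density of the N(0, s^2) distribution evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def pco_select(sample, bandwidths, lam=1.0):
    """Pick a bandwidth by penalized comparison to the smallest bandwidth.

    Assumes a Gaussian kernel, for which K_h1 * K_h2 is the Gaussian
    density with standard deviation sqrt(h1^2 + h2^2).
    """
    n = len(sample)
    h_min = min(bandwidths)  # the "overfitting" bandwidth
    # All pairwise differences X_i - X_j, including i == j.
    diffs = [x - y for x in sample for y in sample]

    def est_inner(h1, h2):
        # L2 inner product <f_hat_h1, f_hat_h2>.
        s = math.hypot(h1, h2)
        return sum(gauss(d, s) for d in diffs) / n ** 2

    def ker_inner(h1, h2):
        # L2 inner product <K_h1, K_h2>.
        return gauss(0.0, math.hypot(h1, h2))

    ref = est_inner(h_min, h_min)
    crit = {}
    for h in bandwidths:
        # Squared L2 distance between f_hat_h and f_hat_{h_min}.
        dist2 = est_inner(h, h) - 2.0 * est_inner(h, h_min) + ref
        # ||K_{h_min} - K_h||^2, expanded into inner products.
        ker_diff2 = ker_inner(h, h) - 2.0 * ker_inner(h, h_min) + ker_inner(h_min, h_min)
        pen = (lam * ker_inner(h, h) - ker_diff2) / n
        crit[h] = dist2 + pen
    return min(crit, key=crit.get), crit


# Illustrative run on simulated standard normal data (hypothetical settings).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(150)]
grid = [0.02 * 1.4 ** k for k in range(12)]
h_hat, crit = pco_select(data, grid)
```

Each candidate bandwidth is compared with a single reference estimator, so the cost grows linearly with the size of the grid. Pairwise comparisons in the Goldenshluger-Lepski style would instead grow quadratically, which is the computational gain the abstract alludes to.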

CDRodeo: Greedy selection of multivariate bandwidths in conditional density estimation

Speaker: Minh-Lien Jeanne NGUYEN, Université Paris-Sud

In this talk, we are interested in estimating conditional densities in moderately large dimensions. The conditional density, less restrictive than the regression function, allows many other quantities of interest to be derived (quantiles, confidence intervals, modes...). Starting from a family of kernel estimators suited to conditional density estimation, we revisit the Rodeo algorithm [LW, LLW] to select a local multivariate bandwidth. The method addresses several challenges: avoiding the curse of dimensionality by combining fast execution with the detection of irrelevant variables (when the function is sparse), and converging at the minimax-optimal rate (up to a logarithmic factor).

[N18]: Nguyen, M.-L.J. (2018, preprint) Nonparametric method for sparse conditional density estimation in moderately large dimensions. hal-01688664.
[LW]: Lafferty J.D., Wasserman L.A. (2008) Rodeo: Sparse, greedy nonparametric regression. Annals of Statistics, Vol. 36, No. 1, 28-63.
[LLW]: Liu H., Lafferty J.D., Wasserman L.A. (2007) Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo. AISTATS, 283-290.

Local polynomial estimation of the intensity of a doubly stochastic Poisson process with bandwidth selection procedure

Speaker: Thomas Deschatre, FiME Lab, Place du Maréchal de Lattre de Tassigny, 75016 Paris

We consider a doubly stochastic Poisson process with stochastic intensity $\lambda_t = n q(X_t)$, where $X$ is a continuous Itô semimartingale and $n$ is an integer. Both processes are observed continuously over a fixed period $[0,T]$. We propose a nonparametric procedure, based on a local polynomial estimator, for estimating the function $q$ on an interval $I$ where $X$ is sufficiently observed. A method to select the bandwidth in a non-asymptotic framework is proposed, leading to an oracle inequality. If $m$ is the degree of the chosen polynomial, the accuracy of our estimator over the Hölder class of order $\beta$ is $n^{-\beta/(2\beta+1)}$ if $m \geq \beta$ and $n^{-m/(2m+1)}$ if $m < \beta$, and it is optimal in the minimax sense if $m \geq \beta$. A parametric test is also proposed to test whether $q$ belongs to some parametric family. These results are applied to French temperature and electricity spot price data, where we infer the intensity of electricity spot price spikes as a function of temperature.

Adaptive kernel estimation of the baseline function in the Cox model with high-dimensional covariates

Speaker: Sarah LEMLER, École CentraleSupélec

Recurrent event data arise in fields such as medicine, insurance, economics, and reliability. Examples of such events include relapse from a disease in biomedical research, monetization in marketing, and blogging in social network studies. In this context, proportional hazards models have been widely studied in the literature to model the rate function of recurrent event data, which represents the instantaneous probability of experiencing a recurrent event at a given time. In this work, we propose a novel kernel estimator of the baseline function in a general high-dimensional Cox model, for which we derive non-asymptotic rates of convergence. To construct our estimator, we first estimate the regression parameter in the Cox model via a Lasso procedure. We then plug this estimator into the classical kernel estimator of the baseline function, obtained by smoothing the so-called Breslow estimator of the cumulative baseline function. We propose and study an adaptive procedure for selecting the bandwidth, in the spirit of Goldenshluger and Lepski (2011). We state non-asymptotic oracle inequalities for the final estimator, which show how the rates of convergence deteriorate as the dimension of the covariates grows. Lastly, we conduct a simulation study to measure the practical performance of the resulting estimator, and we apply the procedure to a real dataset.
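The plug-in step described above (smoothing the jumps of the Breslow estimator with a kernel) can be sketched as follows. This is only an illustration under simplifying assumptions that are not from the talk: right-censored data with a single event per subject, a regression vector `beta` assumed to have been fitted beforehand (for instance by a Lasso procedure, not shown), an Epanechnikov kernel, and a fixed bandwidth `h` rather than the adaptive selection studied in the talk. The function names are hypothetical.

```python
import math


def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def breslow_increments(times, events, covariates, beta):
    """Jumps of the Breslow estimator of the cumulative baseline hazard.

    At each observed event time T_i the jump is
    1 / sum_{j : T_j >= T_i} exp(beta . Z_j).
    """
    risk = [math.exp(sum(b * z for b, z in zip(beta, zs))) for zs in covariates]
    jumps = []
    for t_i, d_i in zip(times, events):
        if d_i:  # censored observations contribute no jump
            at_risk = sum(r for t_j, r in zip(times, risk) if t_j >= t_i)
            jumps.append((t_i, 1.0 / at_risk))
    return jumps


def baseline_hazard(t, jumps, h):
    """Kernel-smoothed baseline hazard at time t with fixed bandwidth h."""
    return sum(epanechnikov((t - s) / h) * w for s, w in jumps) / h


# Toy data: four subjects, one censored, with a null regression vector.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 1]
covariates = [[0.0], [0.0], [0.0], [0.0]]
jumps = breslow_increments(times, events, covariates, beta=[0.0])
value = baseline_hazard(3.0, jumps, h=0.5)
```

With a null `beta` the jumps reduce to the Nelson-Aalen increments, one over the number of subjects still at risk, which gives a simple sanity check on the sketch.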

[1]: O. Bouaziz, F. Comte, A. Guilloux, Nonparametric estimation of the intensity function of a recurrent event process, Statist. Sinica 23 (2) (2013) 635-665.
[2]: A. Guilloux, S. Lemler, M.-L. Taupin, Adaptive kernel estimation of the baseline function in the Cox model with high-dimensional covariates, J. Multivariate Anal. 148 (2016) 141-159.
[3]: A. Goldenshluger, O. Lepski, Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality, Ann. Statist. 39 (3) (2011) 1608-1632.
[4]: H. Ramlau-Hansen, The choice of a kernel function in the graduation of counting process intensities, Scand. Actuar. J. 3 (1983) 165-182.