Penalized Generalized Estimating Equations (PGEE) were introduced by Fu (2003) as a regression technique for longitudinal data to select the most important variables and to estimate their effects. This is particularly useful when many covariates are present and one is interested in the influence of the most important ones on the mean response. The covariance within subjects is treated as a nuisance parameter.

When the covariates are correlated with one another, it is useful to adapt the penalty function to avoid computational difficulties and to improve prediction and variable selection. Combining penalty functions was first described by Zou & Hastie (2005), whose Elastic Net (EN) combines the ridge and LASSO penalties.
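As a minimal sketch of the combined penalty, the Elastic Net can be written as a LASSO (L1) term plus a ridge (L2) term. The function name `elastic_net_penalty` and the tuning parameters `lam1` and `lam2` are illustrative assumptions; parameterizations differ between papers and software, so this is not necessarily the scaling used in any particular implementation.

```python
import numpy as np

def elastic_net_penalty(beta, lam1, lam2):
    """Elastic Net penalty: lam1 * sum(|beta_j|) + lam2 * sum(beta_j^2).

    lam1 weights the LASSO (L1) part, lam2 the ridge (L2) part.
    Both names are illustrative, not taken from the original code.
    """
    beta = np.asarray(beta, dtype=float)
    l1_part = lam1 * np.sum(np.abs(beta))   # LASSO term, drives selection
    l2_part = lam2 * np.sum(beta ** 2)      # ridge term, stabilizes correlated covariates
    return l1_part + l2_part
```

Setting `lam2 = 0` gives the plain LASSO penalty and setting `lam1 = 0` gives the plain ridge penalty.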

We elaborate on the asymptotic theory of the Smoothly Clipped Absolute Deviation (SCAD) penalty within PGEE. We used Dziak’s asymptotic theory to study penalty functions for multicollinearity, and we present our EN and SCAD_L2 penalties in Blommaert et al. (2014).
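For reference, the SCAD penalty of Fan & Li (2001) can be sketched as follows. It is linear near zero, quadratic in a middle range, and constant for large coefficients, with `a = 3.7` as the commonly suggested default. The function `scad_l2_penalty` assumes that SCAD_L2 means the SCAD penalty plus a ridge term `lam2 * theta^2`; that form and all names here are assumptions for illustration, not the definitions from the article.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan & Li (2001), evaluated elementwise on |theta|."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                              # |theta| <= lam
    quadratic = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))  # lam < |theta| <= a*lam
    constant = lam ** 2 * (a + 1) / 2                             # |theta| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))

def scad_l2_penalty(theta, lam1, lam2, a=3.7):
    """Assumed SCAD_L2 form: SCAD penalty plus a ridge term lam2 * theta^2."""
    theta = np.asarray(theta, dtype=float)
    return scad_penalty(theta, lam1, a) + lam2 * theta ** 2
```

Because the penalty is flat beyond `a * lam`, large coefficients are not shrunk further, which is what distinguishes SCAD from the LASSO.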


Penalized generalized estimating equations with Elastic Net or L2-Smoothly Clipped Absolute Deviation penalization are proposed to simultaneously select the most important variables and estimate their effects for longitudinal Gaussian data when multicollinearity is present. The method is able to consistently select and estimate the main effects even when strong correlations are present. In addition, the potential pitfall of time-dependent covariates is clarified. Both asymptotic theory and simulation results reveal the effectiveness of penalization as a data mining tool for longitudinal data, especially when a large number of variables is present. The method is illustrated by mining for the main determinants of life expectancy in Europe.

A correction of an error in a formula of this article can be found here.

We implemented the EN and SCAD_L2 algorithms, based on the work of Fan & Li (2001), and applied our methodology to a simple example. The code can be found here.
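To give a flavor of how such an algorithm can work, the sketch below applies the local quadratic approximation (LQA) idea of Fan & Li (2001) to a deliberately simplified setting: Gaussian responses with an identity link and a working independence correlation, in which case the unpenalized GEE reduces to the least-squares normal equations. This is not the implementation linked above. The function names, the assumed SCAD_L2 form (SCAD plus `lam2 * beta^2`), the scaling of the penalty by the number of rows, and the stabilizing constant `eps` are all assumptions made for illustration.

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """First derivative of the SCAD penalty at |theta| (Fan & Li, 2001)."""
    t = np.abs(theta)
    return lam * np.where(t <= lam, 1.0,
                          np.maximum(a * lam - t, 0.0) / ((a - 1) * lam))

def pgee_lqa_independence(X, y, lam1, lam2=0.0, a=3.7,
                          n_iter=100, eps=1e-6, tol=1e-8):
    """Hypothetical helper: penalized estimating equations for Gaussian data
    under working independence, solved by iterated local quadratic approximation."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from the unpenalized fit
    for _ in range(n_iter):
        # LQA replaces the penalty by a quadratic with weights p'(|b|) / |b|;
        # the assumed ridge part lam2 * b^2 contributes a constant 2 * lam2.
        weights = scad_derivative(beta, lam1, a) / (np.abs(beta) + eps) + 2.0 * lam2
        beta_new = np.linalg.solve(X.T @ X + n * np.diag(weights), X.T @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

In this scheme small coefficients receive ever larger weights and are driven towards zero, while coefficients larger than `a * lam1` receive no SCAD shrinkage. A full PGEE implementation would additionally estimate a working correlation matrix within subjects and use it to weight the estimating equations.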

Further research on the asymptotic theory of PGEE estimators with EN and SCAD_L2 for exponential family distributions was done by Wang et al. (2011); we only implemented the Gaussian and binomial cases.