Regularisation and Sparsity

Supervisors*: Dr L.J.M. Aslett and Prof F.P.A. Coolen

The linear model is ubiquitous in statistics, providing an elegant way to express the dependence of some response \(y_i\) on a set of predictors \(\mathbf{x}_i \in \mathbb{R}^d\), where \(i\in\{1,\dots,n\}\) denotes the observation number. In the setting where \( y_i\in\mathbb{R} \), the linear model is simply $$ y_i = \beta_0 + \sum_{j=1}^d \beta_j x_{ij} + \varepsilon_i, $$ where \(\beta_0\) is the intercept, \(\beta_j\) is the coefficient for the \(j\)-th predictor, and \(\varepsilon_i\) is a random error term, typically taken to be Normally distributed with zero mean. We can still use linear modelling methods when the response is not continuous (e.g. \(y_i\) may be an integer, or an unordered category), leading to the generalised linear model framework. Whatever the formulation, we are often interested in the values of \(\beta_1,\dots,\beta_d\), since these help us understand the relationship between \(y\) and each of the predictor variables. However, there can be problems!
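As a concrete sketch of fitting this model in R, the snippet below simulates a small data set and obtains the least squares estimates using the built-in lm() function; the sample size, number of predictors, and coefficient values are arbitrary choices for illustration only.

set.seed(1)
n <- 100; d <- 3
X <- matrix(rnorm(n * d), n, d)        # n observations of d predictors
beta <- c(2, -1, 0.5)                  # true coefficients (illustrative values only)
y <- drop(1 + X %*% beta + rnorm(n))   # intercept beta_0 = 1 plus Normal error
fit <- lm(y ~ X)                       # ordinary least squares fit
coef(fit)                              # estimates of beta_0, beta_1, ..., beta_d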

For example, even when there is no real relationship between \(y\) and one or more of the predictors, fitting the linear model will almost always still produce estimates \(\hat{\beta}_j \ne 0\) for all \(j\) (the magnitudes may be small, but still non-zero). If you studied the linear model in second year (Statistical Modelling II), you will have seen this treated using a hypothesis test, \(H_0: \beta_j = 0\) versus \(H_1: \beta_j \ne 0\), for each \(j\). This situation arises quite easily in practice, because data are now so widely collected that we often have many variables recorded for each observation, not all of which will be relevant to the problem at hand (though we may not know which are irrelevant a priori).
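Continuing the small simulation above, adding a predictor with no real relationship to \(y\) illustrates both points: its estimated coefficient is non-zero, and summary() reports the usual t-test of \(H_0: \beta_j = 0\). The variable name z is, of course, arbitrary.

z <- rnorm(n)                        # a new predictor with no real relationship to y
fit2 <- lm(y ~ X + z)
coef(fit2)["z"]                      # small, but almost surely not exactly zero
summary(fit2)$coefficients["z", ]    # t-test of H0: beta_z = 0 versus H1: beta_z != 0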

Worse still, if \(d > n\), that is, when there are more variables than observations, then there is no unique least squares solution to the linear model! This is because the corresponding design matrix, \(X\), results in \(X^T X\) being a singular matrix, which therefore cannot be inverted. Data of this kind arise all the time in high dimensional applications such as genomics, where we may have many thousands of variables (gene variants) for each observation (person).
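A short simulated sketch of this problem (again with arbitrary dimensions) shows lm() failing to estimate all of the coefficients, and \(X^T X\) failing to invert.

set.seed(2)
n <- 20; d <- 50                      # more predictors than observations
X <- matrix(rnorm(n * d), n, d)
y <- rnorm(n)
fit <- lm(y ~ X)
sum(is.na(coef(fit)))                 # lm reports NA for coefficients it cannot estimate
try(solve(t(X) %*% X))                # fails: the d x d matrix X^T X is singular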

This project will examine regularisation methods which enable us to address both of the settings above, without having to resort to hypothesis testing! One approach to regularisation imposes a penalty on the size of the coefficients, which encourages them to be small. Indeed, certain choices of penalty can be proven to set some coefficient estimates exactly to zero when the corresponding predictors have no real relationship to \(y\).
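To make this concrete, a common formulation (see Hastie et al., 2015) replaces the least squares criterion by a penalised version, $$ \hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^n \Big( y_i - \beta_0 - \sum_{j=1}^d \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^d |\beta_j|^q \right\}, $$ where \(\lambda \ge 0\) controls the strength of the penalty. Taking \(q = 2\) gives ridge regression, while \(q = 1\) gives the lasso, which is one of the penalties capable of setting some coefficient estimates exactly to zero.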

An excellent book which gives a good introduction to this topic is Hastie et al. (2015). This project can be taken in many directions from this basic starting point, depending on the interests of the student. These could include studying the extended generalised linear model framework and regularisation methods for it; comparisons of different regularisation methods, including ridge (Hoerl & Kennard, 1970), lasso (Tibshirani, 1996), and elastic net (Zou & Hastie, 2005) approaches; the use of regularisation methods in particular high dimensional applications; Bayesian approaches to regularisation; and many more.
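As a flavour of what such comparisons might look like in practice, the sketch below uses the glmnet package (one of several R implementations one could choose) to fit ridge, lasso, and elastic net to the same simulated data; the particular dimensions, alpha values, and tuning choices are illustrative assumptions only.

# install.packages("glmnet")          # glmnet is one possible choice of package
library(glmnet)

set.seed(3)
n <- 100; d <- 20
X <- matrix(rnorm(n * d), n, d)
beta <- c(3, -2, 1.5, rep(0, d - 3))  # only the first three predictors are relevant
y <- drop(X %*% beta + rnorm(n))

# alpha = 0 gives ridge, alpha = 1 gives lasso, 0 < alpha < 1 gives elastic net;
# the penalty strength lambda is chosen by cross-validation
ridge <- cv.glmnet(X, y, alpha = 0)
lasso <- cv.glmnet(X, y, alpha = 1)
enet  <- cv.glmnet(X, y, alpha = 0.5)

coef(lasso, s = "lambda.min")         # many coefficients are exactly zero
coef(ridge, s = "lambda.min")         # coefficients are shrunk, but none exactly zero

In a project one would go on to compare the fitted coefficients and cross-validated prediction error across the different penalties.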

*Dr Aslett will supervise Michaelmas and Prof Coolen will supervise Epiphany.

Prerequisites

Statistical Inference II.

It would be a slight advantage to have Statistical Modelling II, but it is not a strict prerequisite.

There will be some basic R coding required to explore the methods and try out some applications.

References

Hastie, T., Tibshirani, R. and Wainwright, M., 2015. Statistical learning with sparsity: the lasso and generalizations. CRC press. Freely (and legally!) available here, or via Durham library.

Hoerl, A.E. and Kennard, R.W., 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), pp.55-67. DOI: 10.2307/1267351

Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1), pp.267-288. DOI: 10.1111/j.2517-6161.1996.tb02080.x

Zou, H. and Hastie, T., 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67(2), pp.301-320. DOI: 10.1111/j.1467-9868.2005.00503.x