STAT151A Homework 2: Due February 9th
1 Transformation of variables
Consider two different regressions, \(\boldsymbol{Y}\sim \boldsymbol{X}\boldsymbol{\beta}\) and \(\boldsymbol{Y}\sim \boldsymbol{Z}\boldsymbol{\alpha}\), with the same \(\boldsymbol{Y}\), where \(\boldsymbol{X}\) and \(\boldsymbol{Z}\) are both \(N \times P\) and are both full-rank. Let the \(n\)–th row of \(\boldsymbol{X}\) be written \(\boldsymbol{x}_n^\intercal\), and the \(n\)–th row of \(\boldsymbol{Z}\) be \(\boldsymbol{z}_n^\intercal\).
(a)
Suppose \(\boldsymbol{x}_n = \boldsymbol{A}\boldsymbol{z}_n\) for some invertible \(P \times P\) matrix \(\boldsymbol{A}\) and for all \(n = 1,\ldots,N\). Find an expression for \(\hat{\alpha}\) in terms of \(\hat{\beta}\) that does not explicitly use \(\boldsymbol{Y}\), \(\boldsymbol{X}\), or \(\boldsymbol{Z}\).
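If you want to experiment numerically, here is a minimal sketch of this setup (assuming NumPy; the simulated data, the random \(\boldsymbol{A}\), and the seed are purely illustrative). It fits both regressions so you can check a conjectured relationship between \(\hat{\alpha}\) and \(\hat{\beta}\):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 50, 3

# Simulate Z, an invertible A, and a response Y.
Z = rng.normal(size=(N, P))
A = rng.normal(size=(P, P))          # a random square matrix is generically invertible
X = Z @ A.T                          # rows satisfy x_n = A z_n
Y = rng.normal(size=N)

# OLS estimates for Y ~ X beta and Y ~ Z alpha.
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
alpha_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]

print(beta_hat)
print(alpha_hat)
# Try to express alpha_hat using only beta_hat and A.
```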
(b)
Suppose that, for all \(n=1,\ldots,N\), \(\boldsymbol{x}_n = f(\boldsymbol{z}_n)\) for some invertible but non-linear function \(f(\cdot)\). In general, can you find an expression for \(\hat{\alpha}\) in terms of \(\hat{\beta}\) that does not explicitly use \(\boldsymbol{Y}\), \(\boldsymbol{X}\), or \(\boldsymbol{Z}\)? Prove why or why not. (To prove that you cannot, finding a single counterexample is enough.)
(c)
Now consider only the regression \(\boldsymbol{Y}\sim \boldsymbol{X}\boldsymbol{\beta}\), but suppose we are not interested in \(\boldsymbol{\beta}\), but rather some other \(\boldsymbol{\gamma}= \phi(\boldsymbol{\beta})\), where \(\phi\) is an invertible function. Prove that the least squares estimator of \(\boldsymbol{\gamma}\) is given by \(\hat{\boldsymbol{\gamma}}= \phi(\hat{\boldsymbol{\beta}})\).
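As a numerical sanity check on this claim, one possible sketch is below (assuming NumPy and SciPy; the choice \(\phi(\boldsymbol{\beta}) = \boldsymbol{\beta}^3\) element-wise is just one illustrative invertible map, not part of the problem):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, P = 60, 3
X = rng.normal(size=(N, P))
Y = rng.normal(size=N)

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

def phi(b):
    """Illustrative invertible reparameterization: element-wise cube."""
    return b ** 3

def loss(gamma):
    """Least-squares objective in the gamma parameterization: ||Y - X phi^{-1}(gamma)||^2."""
    return np.sum((Y - X @ np.cbrt(gamma)) ** 2)

gamma_hat = minimize(loss, x0=np.ones(P), method="Nelder-Mead").x

print(gamma_hat)
print(phi(beta_hat))   # the claim is that these agree, up to optimizer tolerance
```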
(d)
Prove that result (a) is a special case of result (c). (Hint: find the corresponding \(\phi\).)
2 Spaces of possible estimators
Consider the simple linear model \(y_n = \beta_0 + \beta_1 z_n + \varepsilon_n\). Assume that \(\frac{1}{N} \sum_{n=1}^Nz_n \ne 0\).
(a)
Fix \(\beta_0 = \frac{1}{N} \sum_{n=1}^Ny_n\) and find a value of \(\beta_1\) such that \(\frac{1}{N} \sum_{n=1}^N\varepsilon_n = 0\). How does your answer depend on whether or not \(\frac{1}{N} \sum_{n=1}^Nz_n = 0\)?
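For parts (a)–(c), a small helper like the one below may be useful for checking candidate values (assuming NumPy; the simulated data and the name `mean_residual` are illustrative, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30
z = rng.normal(size=N) + 1.0        # nonzero mean, matching the assumption above
y = 2.0 + 3.0 * z + rng.normal(size=N)

def mean_residual(beta0, beta1):
    """Average residual (1/N) sum_n (y_n - beta0 - beta1 * z_n)."""
    return np.mean(y - beta0 - beta1 * z)

# Example: fix beta0 at the sample mean of y and probe a few slopes.
beta0 = y.mean()
for beta1 in (0.0, 1.0, 3.0):
    print(beta1, mean_residual(beta0, beta1))
```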
(b)
Fix \(\beta_1 = 10,000,000\) and find a value of \(\beta_0\) such that \(\frac{1}{N} \sum_{n=1}^N\varepsilon_n = 0\).
(c)
In general, how many different choices of \(\beta_0\) and \(\beta_1\) can you find that satisfy \(\frac{1}{N} \sum_{n=1}^N\varepsilon_n = 0\)? Are all of them reasonable? Are any of them reasonable?
(d)
Find an \(N\)–dimensional vector \(\boldsymbol{v}\) such that \[ \frac{1}{N} \sum_{n=1}^N\varepsilon_n = 0 \quad\Leftrightarrow\quad \boldsymbol{v}^\intercal\boldsymbol{\varepsilon}= 0. \]
(e)
Suppose I give you a general \(N\)–dimensional vector \(\boldsymbol{v}\) and a scalar \(a\). How many different choices of \(\beta_0\) and \(\beta_1\) can you find such that \(\boldsymbol{v}^\intercal\boldsymbol{\varepsilon}= a\)?
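A similar helper can be adapted to this more general constraint (again assuming NumPy; `constraint_gap` and the particular \(\boldsymbol{v}\) and \(a\) below are hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 30
z = rng.normal(size=N)
y = 1.0 - 2.0 * z + rng.normal(size=N)

v = rng.normal(size=N)   # an arbitrary given vector
a = 0.7                  # an arbitrary given scalar

def constraint_gap(beta0, beta1):
    """v^T eps - a, where eps_n = y_n - beta0 - beta1 * z_n."""
    eps = y - beta0 - beta1 * z
    return v @ eps - a

# Probe a few candidate (beta0, beta1) pairs; the question is how many make this zero.
for beta0, beta1 in [(0.0, 0.0), (1.0, -2.0), (5.0, 3.0)]:
    print(beta0, beta1, constraint_gap(beta0, beta1))
```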
(f) (Optional — this will not be graded)
Suppose I give you two different vectors, \(\boldsymbol{v}_1\) and \(\boldsymbol{v}_2\). Under what circumstances can you find \(\beta_0\) and \(\beta_1\) such that
\[ \begin{aligned} \boldsymbol{v}_1^\intercal\boldsymbol{\varepsilon}= 0 \quad\textrm{and}\quad \boldsymbol{v}_2^\intercal\boldsymbol{\varepsilon}= 0? \end{aligned} \]
When are there infinitely many solutions? When is there only one solution? (Hint: what if \(\boldsymbol{v}_1^\intercal\boldsymbol{1}= \boldsymbol{v}_2^\intercal\boldsymbol{1}= 0\)?)
(g)
Now, consider the general linear model \(\boldsymbol{Y}= \boldsymbol{X}\boldsymbol{\beta}+ \boldsymbol{\varepsilon}\). Prove that there always exist \(\boldsymbol{\beta}\) and \(\boldsymbol{\varepsilon}\) such that \(\boldsymbol{Y}= \boldsymbol{X}\boldsymbol{\beta}+ \boldsymbol{\varepsilon}\).
(h) (Optional — this will not be graded)
Suppose, for the general linear model, that the matrix \(\boldsymbol{X}\) is full-rank (that is, of rank \(P\), where \(P\) is the number of columns of \(\boldsymbol{X}\)). Suppose I give you an \(N \times D\) matrix \(\boldsymbol{V}\) and ask you to find \(\boldsymbol{\beta}\) such that \(\boldsymbol{V}^\intercal\boldsymbol{\varepsilon}= \boldsymbol{0}\). Under what circumstances are there no solutions? A single solution? An infinite set of solutions? (Hint: you already answered this question for \(P = 2\); now you just need to state the result in matrix form.)
3 Collinear regressors
Suppose that \(\boldsymbol{X}\) does not have full column rank — that is, \(\boldsymbol{X}\) is \(N \times P\) but has column rank \(Q < P\).
(a)
How many solutions \(\hat{\beta}\) are there to the least-squares problem \[ \hat{\beta}:= \underset{\beta}{\mathrm{argmin}}\, \left\Vert\boldsymbol{Y}- \boldsymbol{X}\beta\right\Vert_2^2? \]
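To build intuition, here is a small numerical illustration (assuming NumPy; the collinear design is constructed by hand and is not part of the problem) of how the least-squares objective behaves across different candidate solutions when \(\boldsymbol{X}\) is rank deficient:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 40
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
X = np.column_stack([x1, x2, x1 + x2])   # column rank 2 < P = 3: third column is collinear
Y = rng.normal(size=N)

beta_a = np.linalg.pinv(X) @ Y           # one particular least-squares solution

# Any vector in the null space of X can be added without changing X @ beta.
null_vec = np.array([1.0, 1.0, -1.0])    # X @ null_vec = 0 by construction
beta_b = beta_a + 5.0 * null_vec

print(np.sum((Y - X @ beta_a) ** 2))
print(np.sum((Y - X @ beta_b) ** 2))     # same objective value
```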
(b)
Relate the solutions \(\hat{\beta}\) from part (a) to spaces spanned by eigenvectors of \(\boldsymbol{X}^\intercal\boldsymbol{X}\). Among the solutions, identify the one with the smallest norm, \(\left\Vert\hat{\beta}\right\Vert_2^2\).
(c)
Suppose that \(\boldsymbol{X}'\) is a full column-rank \(N \times Q\) matrix with the same column span as \(\boldsymbol{X}\), and let \(\hat{\gamma}\) be the OLS estimator for the regression \(\boldsymbol{Y}\sim \boldsymbol{X}'\gamma\). Compare the fits \(\hat{\boldsymbol{Y}}= \boldsymbol{X}\hat{\beta}\) and \(\hat{\boldsymbol{Y}}' = \boldsymbol{X}' \hat{\gamma}\), and compare the sum of squared residuals for the two regressions.
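A numerical companion to this comparison might look like the sketch below (assuming NumPy; `np.linalg.pinv` is used only to produce one particular least-squares solution for the rank-deficient design, and the collinear \(\boldsymbol{X}\) is again hand-built):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 40
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
X = np.column_stack([x1, x2, x1 + x2])     # rank 2: third column is the sum of the first two
X_prime = X[:, :2]                         # full column rank, same column span as X
Y = rng.normal(size=N)

beta_hat = np.linalg.pinv(X) @ Y                          # one least-squares solution for X
gamma_hat = np.linalg.lstsq(X_prime, Y, rcond=None)[0]    # OLS for the full-rank design

Y_hat = X @ beta_hat
Y_hat_prime = X_prime @ gamma_hat

print(np.max(np.abs(Y_hat - Y_hat_prime)))                          # difference in the fits
print(np.sum((Y - Y_hat) ** 2), np.sum((Y - Y_hat_prime) ** 2))     # sums of squared residuals
```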