STAT151A Homework 2 (with solutions)
This homework is due on Gradescope on Monday September 30th at 9pm.
1 Ordinary least squares in matrix form
Consider simple least squares regression \(y_n = \beta_1 + \beta_2 x_n + \varepsilon_n\), where \(x_n\) is a scalar. Assume that we have \(N\) datapoints. We showed directly that the least–squares solution is given by
\[ \hat{\beta}_1 = \overline{y} - \hat{\beta}_2 \overline{x} \quad \text{and} \quad \hat{\beta}_2 = \frac{\overline{xy} - \overline{x} \, \overline{y}}{\overline{xx} - \overline{x}^2}. \]
Let us re–derive this using matrix notation.
(a)
Write simple linear regression in the form \(\boldsymbol{Y}= \boldsymbol{X}\boldsymbol{\beta}+ \boldsymbol{\varepsilon}\). Be precise about what goes into each entry of \(\boldsymbol{Y}\), \(\boldsymbol{X}\), \(\boldsymbol{\beta}\), and \(\boldsymbol{\varepsilon}\). What are the dimensions of each?
(b)
We proved that the optimal \(\hat{\boldsymbol{\beta}}\) satisfies \(\boldsymbol{X}^\intercal\boldsymbol{X}\hat{\boldsymbol{\beta}}= \boldsymbol{X}^\intercal\boldsymbol{Y}\). Define the “barred quantities” \[ \begin{aligned} \overline{y} ={}& \frac{1}{N} \sum_{n=1}^N y_n \\ \overline{x} ={}& \frac{1}{N} \sum_{n=1}^N x_n \\ \overline{xy} ={}& \frac{1}{N} \sum_{n=1}^N x_n y_n \\ \overline{xx} ={}& \frac{1}{N} \sum_{n=1}^N x_n^2. \end{aligned} \]
In terms of the barred quantities and the number of datapoints \(N\), write expressions for \(\boldsymbol{X}^\intercal\boldsymbol{X}\) and \(\boldsymbol{X}^\intercal\boldsymbol{Y}\).
(c)
When is \(\boldsymbol{X}^\intercal\boldsymbol{X}\) invertible? Write a formal expression in terms of the barred quantities. Interpret this condition intuitively in terms of the distribution of the regressors \(x_n\).
(d)
Using the formula for the inverse of a \(2\times 2\) matrix, find an expression for \(\hat{\boldsymbol{\beta}}\), and confirm that we get the same answer that we got by solving directly.
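Not part of the required derivation, but a quick numerical sanity check can confirm that the matrix solution and the barred-quantity formulas agree. This is a minimal sketch with simulated data (the coefficients, noise scale, and sample size below are made-up choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
x = rng.normal(size=N)
y = 1.5 + 2.0 * x + rng.normal(scale=0.3, size=N)

# Matrix form: X has a column of ones and a column of x.
X = np.column_stack([np.ones(N), x])
beta_matrix = np.linalg.solve(X.T @ X, X.T @ y)

# Barred-quantity form from the direct derivation.
xbar, ybar = x.mean(), y.mean()
beta2 = (np.mean(x * y) - xbar * ybar) / (np.mean(x * x) - xbar**2)
beta1 = ybar - beta2 * xbar

print(beta_matrix)                # matrix-form solution
print(np.array([beta1, beta2]))   # matches to numerical precision
```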
(e)
In the case where \(\boldsymbol{X}^\intercal\boldsymbol{X}\) is not invertible, find three distinct values of \(\boldsymbol{\beta}\) that all achieve the same sum of squared residuals \(\boldsymbol{\varepsilon}^\intercal\boldsymbol{\varepsilon}\).
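As an illustration of this part (a sketch, not the required answer): one way \(\boldsymbol{X}^\intercal\boldsymbol{X}\) fails to be invertible is when every \(x_n\) takes the same value, in which case the intercept and slope are not separately identified. The numbers below are made up purely to show several \(\boldsymbol{\beta}\) with identical sums of squared residuals:

```python
import numpy as np

N = 10
x = np.full(N, 3.0)                     # every regressor equal, so xbar^2 == xxbar
y = np.arange(N, dtype=float)
X = np.column_stack([np.ones(N), x])

def ssr(beta):
    r = y - X @ beta
    return r @ r

# Any (beta1, beta2) with beta1 + 3 * beta2 equal to the same constant
# produces the same fitted values, hence the same SSR.
c = y.mean()
for beta in ([c, 0.0], [c - 3.0, 1.0], [c + 3.0, -1.0]):
    print(beta, ssr(np.array(beta)))    # identical SSR for all three
```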
2 Probability and matrices
For this problem, assume that \(y_n = \beta_0 + x_n \beta_1 + \varepsilon_n\) for scalar \(x_n\) and some fixed \(\beta_0\) and \(\beta_1\). Assume that
- The residuals \(\varepsilon_n\) are IID with \(\mathbb{E}\left[\varepsilon_n\right] = 0\) and \(\mathrm{Var}\left(\varepsilon_n\right) = \sigma^2\).
- The regressors \(x_n\) are IID with \(\mathbb{E}\left[x_n\right] = \mu\) and \(\mathrm{Var}\left(x_n\right) = \nu^2\).
- The residuals are independent of the regressors.
(a)
Evaluate the following expressions. (You may need to remind yourself of the definition of conditional expectation and variance.)
- \(\mathbb{E}\left[y_n\right]\)
- \(\mathrm{Var}\left(y_n\right)\)
- \(\mathbb{E}\left[y_n \vert x_n\right]\)
- \(\mathrm{Var}\left(y_n \vert x_n\right)\)
(b)
Compute the following limits using the LLN, or say that the limit does not exist or is infinite.
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^N\varepsilon_n\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^N\varepsilon_n^2\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^Nx_n\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^Nx_n^2\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^Nx_n \varepsilon_n\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^Nx_n y_n\)
(c)
Using the CLT, describe the limiting behavior of the following quantities (give the limiting distribution where one exists), or say that the limit does not exist or is infinite.
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} \sum_{n=1}^N\varepsilon_n\)
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} \sum_{n=1}^Ny_n\)
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} \sum_{n=1}^N(y_n - (\beta_0 + x_n \beta_1))\)
(d)
Noting that this is simple linear regression, let \(\boldsymbol{X}\), \(\boldsymbol{Y}\), and \(\boldsymbol{\varepsilon}\) be as in the solution to question one above. Evaluate the following limits, or say that the limit does not exist or is infinite.
Here, \((\boldsymbol{A})_{ij}\) denotes the \(i,j\)–th entry of the matrix \(\boldsymbol{A}\). Let the regressor \(x_n\) be in the second column of \(\boldsymbol{X}\).
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \boldsymbol{1}^\intercal\boldsymbol{\varepsilon}\)
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} \boldsymbol{1}^\intercal\boldsymbol{\varepsilon}\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} (\boldsymbol{X}^\intercal\boldsymbol{X})_{11}\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} (\boldsymbol{X}^\intercal\boldsymbol{X})_{12}\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} (\boldsymbol{X}^\intercal\boldsymbol{X})_{22}\)
- \(\lim_{N \rightarrow \infty } (\boldsymbol{X}^\intercal\boldsymbol{X})_{11}\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} (\boldsymbol{Y}- \boldsymbol{X}\boldsymbol{\beta})^\intercal(\boldsymbol{Y}- \boldsymbol{X}\boldsymbol{\beta})\)
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} (\boldsymbol{Y}- \boldsymbol{X}\boldsymbol{\beta})^\intercal(\boldsymbol{Y}- \boldsymbol{X}\boldsymbol{\beta})\)
Hint: Write the matrix expressions as sums over \(n=1\) to \(N\).
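The hint can be made concrete with a small simulation (a sketch only; the distributional choices and parameter values below are assumptions for illustration, not part of the problem). The entries of \(\frac{1}{N}\boldsymbol{X}^\intercal\boldsymbol{X}\) are exactly the barred averages, which settle down as \(N\) grows:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, nu, sigma = 2.0, 0.5, 1.0
b = np.array([1.0, -3.0])        # (beta_0, beta_1)

for N in (100, 10_000, 1_000_000):
    x = rng.normal(mu, nu, size=N)
    eps = rng.normal(0.0, sigma, size=N)
    y = b[0] + b[1] * x + eps
    X = np.column_stack([np.ones(N), x])

    print(N, (X.T @ X) / N)      # approaches [[1, mu], [mu, mu**2 + nu**2]]
    resid = y - X @ b
    print(N, resid @ resid / N)  # approaches sigma**2
```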
3 One-hot encoding
Consider a one–hot encoding of a variable \(z_n\) that takes three distinct values, “a”, “b”, and “c”. That is, let
\[ \boldsymbol{x}_n = \begin{cases} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} & \textrm{ when }z_n = a \\ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} & \textrm{ when }z_n = b \\ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} & \textrm{ when }z_n = c \\ \end{cases} \]
Let \(\boldsymbol{X}\) be the regressor matrix with \(\boldsymbol{x}_n^\intercal\) in row \(n\).
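For concreteness, here is one way to build such a one-hot regressor matrix (a minimal sketch; the category labels and counts below are made up):

```python
import numpy as np

z = np.array(["a", "b", "b", "c", "a", "c", "c"])
levels = np.array(["a", "b", "c"])

# One-hot encode: row n has a 1 in the column matching z_n.
X = (z[:, None] == levels[None, :]).astype(float)

print(X)
print(X.T @ X)   # diagonal matrix of the category counts N_a, N_b, N_c
```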
(a)
Let \(N_a\) be the number of observations with \(z_n\) = a, and let \(\sum_{n:z_n = a}\) denote a sum over rows with \(z_n\) = a, with analogous definitions for b and c. In terms of these quantities, write expressions for \(\boldsymbol{X}^\intercal\boldsymbol{X}\) and \(\boldsymbol{X}^\intercal\boldsymbol{Y}\).
(b)
When is \(\boldsymbol{X}^\intercal\boldsymbol{X}\) invertible? Explain intuitively why the regression problem cannot be solved when \(\boldsymbol{X}^\intercal\boldsymbol{X}\) is not invertible. Write an explicit expression for \((\boldsymbol{X}^\intercal\boldsymbol{X})^{-1}\) when it is invertible.
(c)
Using your previous answer, show that each entry of the least squares vector \(\hat{\boldsymbol{\beta}}\) is the mean of \(y_n\) within the corresponding value of \(z_n\).
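A numerical check of this fact (a sketch with simulated data; the group means and noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.choice(["a", "b", "c"], size=200)
true_means = {"a": 1.0, "b": -2.0, "c": 0.5}
y = np.array([true_means[v] for v in z]) + rng.normal(scale=0.1, size=z.size)

levels = np.array(["a", "b", "c"])
X = (z[:, None] == levels[None, :]).astype(float)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
within_means = np.array([y[z == v].mean() for v in levels])

print(beta_hat)      # least squares coefficients
print(within_means)  # identical: the per-category means of y
```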
(d)
Suppose now you include a constant in the regression, so that
\[ y_n = \alpha + \boldsymbol{\beta}^\intercal\boldsymbol{x}_n + \varepsilon_n, \]
and let \(\boldsymbol{X}'\) denote the regressor matrix for this regression with coefficient vector \((\alpha, \boldsymbol{\beta}^\intercal)^\intercal\). Write an expression for \(\boldsymbol{X}'^\intercal\boldsymbol{X}'\) and show that it is not invertible.
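As an illustration (a sketch, not the required proof): the intercept column equals the sum of the three indicator columns, so the columns of \(\boldsymbol{X}'\) are linearly dependent and \(\boldsymbol{X}'^\intercal\boldsymbol{X}'\) is rank deficient, which numpy reports directly:

```python
import numpy as np

z = np.array(["a", "b", "b", "c", "a", "c", "c"])
levels = np.array(["a", "b", "c"])
X = (z[:, None] == levels[None, :]).astype(float)

# Prepend the intercept column; it equals the sum of the three indicator columns.
X_prime = np.column_stack([np.ones(z.size), X])

A = X_prime.T @ X_prime
print(np.linalg.matrix_rank(A))   # 3, not 4: A is singular
print(np.linalg.det(A))           # (numerically) zero
```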
(e)
Find three distinct values of \((\alpha, \boldsymbol{\beta}^\intercal)\) that all give exactly the same fitted values \(\alpha + \boldsymbol{\beta}^\intercal\boldsymbol{x}_n\).
4 Matrix square roots
In the last homework, we proved that if \(\boldsymbol{A}\) is a square symmetric matrix with eigenvectors \(\boldsymbol{u}_p\) and eigenvalues \(\lambda_p\), then we can write \(\boldsymbol{A}= \boldsymbol{U}\Lambda \boldsymbol{U}^\intercal\), where \(\boldsymbol{U}= (\boldsymbol{u}_1 \ldots \boldsymbol{u}_p)\) has \(\boldsymbol{u}_p\) in its \(p\)–th column, and \(\Lambda\) is diagonal with \(\lambda_p\) in the \(p\)–th diagonal entry. We also showed that the eigenvectors can be taken to be orthonormal without loss of generality.
We will additionally assume that \(\boldsymbol{A}= \boldsymbol{X}^\intercal\boldsymbol{X}\) for some (possibly non–square) matrix \(\boldsymbol{X}\).
Define \(\Lambda^{1/2}\) to be the diagonal matrix with \(\sqrt{\lambda_p}\) on the \(p\)–th diagonal.
- Prove that, since \(\boldsymbol{A}= \boldsymbol{X}^\intercal\boldsymbol{X}\), we have \(\lambda_p \ge 0\), and so \(\Lambda^{1/2}\) is always real–valued. When the eigenvalues are non–negative, we say that \(\boldsymbol{A}\) is “positive semi–definite.” (Hint: using the fact that \(\lambda_p = \boldsymbol{u}_p^\intercal\boldsymbol{A}\boldsymbol{u}_p\), show that \(\lambda_p\) is the square of something.)
- Show that if we take \(\boldsymbol{Q}= \boldsymbol{U}\Lambda^{1/2} \boldsymbol{U}^\intercal\), then \(\boldsymbol{A}= \boldsymbol{Q}\boldsymbol{Q}^\intercal\). We say that \(\boldsymbol{Q}\) is a “matrix square root” of \(\boldsymbol{A}\).
- Show that we also have \(\boldsymbol{A}= \boldsymbol{Q}\boldsymbol{Q}\) (without the second transpose).
- Show that, if \(\boldsymbol{V}\) is any orthogonal matrix (a square matrix with orthonormal columns), then \(\boldsymbol{Q}' = \boldsymbol{Q}\boldsymbol{V}\) also satisfies \(\boldsymbol{A}= \boldsymbol{Q}' \boldsymbol{Q}'^\intercal\), even though in general \(\boldsymbol{Q}' \ne \boldsymbol{Q}\). This shows that the matrix square root is not unique. (This fact can be thought of as the matrix analogue of the fact that \(4 = 2 \cdot 2\) but also \(4 = (-2) \cdot (-2)\).)
- Show that if \(\lambda_p > 0\) then \(\boldsymbol{Q}\) is invertible.
- Show that, if \(\boldsymbol{Q}\) is invertible, then the columns of \(\boldsymbol{X}\boldsymbol{Q}^{-1}\) are orthonormal. (Hint: show that \((\boldsymbol{X}\boldsymbol{Q}^{-1})^\intercal(\boldsymbol{X}\boldsymbol{Q}^{-1})\) is the identity matrix.)
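To make the construction concrete, here is a numerical sketch of the whole sequence of claims (not a proof; it assumes \(\boldsymbol{X}\) has full column rank so that all \(\lambda_p > 0\), and the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
A = X.T @ X                            # square, symmetric, positive semi-definite

# Eigendecomposition A = U Lambda U^T (eigh returns orthonormal eigenvectors).
lam, U = np.linalg.eigh(A)
Q = U @ np.diag(np.sqrt(lam)) @ U.T    # matrix square root

print(np.allclose(A, Q @ Q.T))         # True
print(np.allclose(A, Q @ Q))           # True (this Q is symmetric)

# With all eigenvalues positive, Q is invertible and X @ Q^{-1} has orthonormal columns.
W = X @ np.linalg.inv(Q)
print(np.allclose(W.T @ W, np.eye(4))) # True
```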