STAT151A Homework 1: Due Jan 26th
1 Simple regression in matrix form
Consider the simple linear model \(y_n = \beta_0 + \beta_1 z_n + \varepsilon_n\).
Let \(\bar{y}:= \frac{1}{N} \sum_{n=1}^N y_n\) and \(\bar{z}:= \frac{1}{N} \sum_{n=1}^N z_n\). Recall that the ordinary least squares estimates are given by \[ \begin{aligned} \hat{\beta}_1 = \frac{\frac{1}{N} \sum_{n=1}^N (z_n - \bar{z}) (y_n - \bar{y})}{\frac{1}{N} \sum_{n=1}^N (z_n - \bar{z})^2} \quad\textrm{and}\quad \hat{\beta}_0 = \bar{y}- \hat{\beta}_1 \bar{z}. \end{aligned} \]
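These formulas are easy to check numerically. A minimal sketch on simulated data (`numpy` assumed available; the true coefficients are chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
z = rng.normal(size=N)
y = 2.0 + 3.0 * z + rng.normal(size=N)  # true beta0 = 2, beta1 = 3

# The summation formulas from the problem statement.
zbar, ybar = z.mean(), y.mean()
beta1_hat = np.mean((z - zbar) * (y - ybar)) / np.mean((z - zbar) ** 2)
beta0_hat = ybar - beta1_hat * zbar
print(beta0_hat, beta1_hat)  # close to (2, 3)
```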
(a)
Write the set of equations
\[ y_n = \beta_0 + \beta_1 z_n + \varepsilon_n \]
for \(n \in \{1, \ldots, N\}\) in matrix form. That is, let \(\boldsymbol{X}\) denote an \(N \times 2\) matrix, let \(\boldsymbol{Y}\) and \(\boldsymbol{\varepsilon}\) denote \(N \times 1\) matrices, and let \(\boldsymbol{\beta}= (\beta_0, \beta_1)^\intercal\); express the matrices \(\boldsymbol{Y}\), \(\boldsymbol{X}\), and \(\boldsymbol{\varepsilon}\) in terms of the scalars \(y_n\), \(z_n\), and \(\varepsilon_n\) so that \(\boldsymbol{Y}= \boldsymbol{X}\boldsymbol{\beta}+ \boldsymbol{\varepsilon}\) is equivalent to the set of regression equations.
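As a concreteness check, here is one common way to lay these matrices out in code (a sketch, not the required derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
z = rng.normal(size=N)

# One column per coefficient: the first multiplies beta0, the second beta1.
X = np.column_stack([np.ones(N), z])
beta = np.array([2.0, 3.0])  # (beta0, beta1)
eps = rng.normal(size=N)
Y = X @ beta + eps           # entrywise: y_n = beta0 + beta1 * z_n + eps_n
```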
(b)
Define \[ \begin{aligned} \overline{zz} := \frac{1}{N} \sum_{n=1}^Nz_n^2 \quad\textrm{and}\quad \overline{zy} := \frac{1}{N} \sum_{n=1}^Nz_n y_n \end{aligned} \]
Write explicit expressions for \(\boldsymbol{X}^\intercal\boldsymbol{X}\), \(\boldsymbol{X}^\intercal\boldsymbol{Y}\), and \(\left(\boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1}\), all in terms of \(\bar{y}\), \(\bar{z}\), \(\overline{zz}\), \(\overline{zy}\), and \(N\). Verify that the inverse is correct by direct multiplication.
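Whatever closed form you derive, the "verify by direct multiplication" step has a direct numerical analogue: multiply your candidate inverse by \(\boldsymbol{X}^\intercal\boldsymbol{X}\) and compare to the identity. A sketch, using `np.linalg.inv` as a stand-in for your closed-form answer:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
z = rng.normal(size=N)
X = np.column_stack([np.ones(N), z])

A = X.T @ X
A_inv = np.linalg.inv(A)  # replace with your closed-form inverse
print(np.allclose(A_inv @ A, np.eye(2)))  # True if the inverse is correct
```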
(c)
Compute \((\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\boldsymbol{Y}\). Show that the first row is equal to \(\hat{\beta}_0\) and the second row is equal to \(\hat{\beta}_1\) as given by the ordinary least squares formula in the problem statement above.
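A quick numerical cross-check of this claim (the matrix solution should agree with the summation formulas):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
z = rng.normal(size=N)
y = 2.0 + 3.0 * z + rng.normal(size=N)
X = np.column_stack([np.ones(N), z])

# Matrix form of the OLS estimate: solve (X^T X) beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Summation form from the problem statement.
zbar, ybar = z.mean(), y.mean()
b1 = np.mean((z - zbar) * (y - ybar)) / np.mean((z - zbar) ** 2)
b0 = ybar - b1 * zbar
print(np.allclose(beta_hat, [b0, b1]))  # True
```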
2 Mean zero residuals
Consider the model \(y_n = \beta z_n + \varepsilon_n\). Let \(\hat{\beta}\) denote the least squares estimator and \(\hat{\varepsilon}_n = y_n - \hat{\beta}z_n\).
(a)
Suppose \(z_n\) is not a constant. Is it necessarily the case that \(\frac{1}{N} \sum_{n=1}^N\hat{\varepsilon}_n = 0\)? Prove your answer.
(b)
Suppose \(z_n\) is a constant, say \(z_n \equiv 5\) for every \(n \in \{1, \ldots, N\}\). Is it necessarily the case that \(\frac{1}{N} \sum_{n=1}^N\hat{\varepsilon}_n = 0\)? Prove your answer.
(c)
Now consider the model \(y_n = \beta_1 z_{n1} + \beta_2 z_{n2} + \varepsilon_n\). Suppose that \(z_{n1} = 1\) if \(n\) is even, and is \(0\) otherwise. Similarly, suppose that \(z_{n2} = 1\) if \(n\) is odd, and is \(0\) otherwise. Let \(N\) be even. Is it necessarily the case that \(\frac{1}{N} \sum_{n=1}^N\hat{\varepsilon}_n = 0\)? Prove your answer.
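Before proving anything, it can help to experiment numerically. A sketch that computes the mean residual in each of the three setups above on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
y = rng.normal(size=N)

def mean_residual(Z, y):
    # Least squares fit using only the given columns (no added intercept).
    beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.mean(y - Z @ beta_hat)

n = np.arange(1, N + 1)
Z_a = rng.normal(size=(N, 1))  # (a) a non-constant regressor
Z_b = np.full((N, 1), 5.0)     # (b) the constant regressor z_n = 5
Z_c = np.column_stack([n % 2 == 0, n % 2 == 1]).astype(float)  # (c) indicators

for name, Z in [("(a)", Z_a), ("(b)", Z_b), ("(c)", Z_c)]:
    print(name, mean_residual(Z, y))
```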
3 Inner products and covariances
Let \(\boldsymbol{z}= (z_1, \ldots, z_N)\) and \(\boldsymbol{y}= (y_1, \ldots, y_N)\). Let \(\boldsymbol{X}\) denote an \(N \times P\) matrix whose \(n\)–th row is \(\boldsymbol{x}_n^\intercal\), the transpose of the \(P\)-vector \(\boldsymbol{x}_n\).
(Note: this question involves limits of random variables, and there are many distinct ways that random variables can converge to limits. If you’re familiar with these different modes of probabilistic convergence, feel free to state what mode of convergence applies, but if you are not, don’t worry — modes of convergence will not matter much for this class, and you can state your result heuristically.)
For a set of quantities (numbers, vectors, pairs of vectors, etc.), the “empirical distribution” over that set refers to drawing an element with replacement from the set, with equal probability given to each entry. For example, if \(\mathcal{Z}'\) is drawn from the empirical distribution over the set \(\{z_1, \ldots, z_N \}\), then \(\mathbb{P}\left(\mathcal{Z}' = z_n\right) = 1/N\) for each \(n\). Similarly, if \((\mathcal{Z}', \mathcal{Y}')\) is drawn from the empirical distribution over the pairs \(\{(z_1, y_1), \ldots, (z_N, y_N)\}\), then \(\mathbb{P}\left((\mathcal{Z}', \mathcal{Y}') = (z_n, y_n)\right) = 1/N\) for all \(n\).
(Hint: it may help to recall that the bootstrap uses draws from the empirical distribution, and that, in the empirical distribution, the elements of the set are fixed and not random.)
(a)
Let \((\mathcal{Z}', \mathcal{Y}')\) denote a draw from the empirical distribution over the set \(\{(z_1, y_1), \ldots, (z_N, y_N)\}\).
Prove that \(\frac{1}{N} \boldsymbol{z}^\intercal\boldsymbol{y}= \mathbb{E}\left[\mathcal{Z}' \mathcal{Y}'\right]\). Then prove that \(\frac{1}{N} \boldsymbol{1}^\intercal\boldsymbol{z}= \mathbb{E}\left[\mathcal{Z}'\right]\) as a special case.
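The identity can also be seen by Monte Carlo: resample pairs with replacement and average the product, which approximates the exact empirical expectation. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20
z = rng.normal(size=N)
y = rng.normal(size=N)

# Exact empirical expectation: each pair (z_n, y_n) has probability 1/N.
lhs = (z @ y) / N

# Monte Carlo approximation: draw indices with replacement.
idx = rng.integers(0, N, size=100_000)
mc = np.mean(z[idx] * y[idx])
print(lhs, mc)  # agree up to Monte Carlo error
```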
(b)
Now suppose that the entries of \(\boldsymbol{z}\) are independent and identically distributed (IID) realizations of the random variable \(\mathcal{Z}\), and that the entries of \(\boldsymbol{y}\) are similarly IID realizations of a random variable \(\mathcal{Y}\). Assuming that \(\mathbb{E}\left[|\mathcal{Z}|\right] < \infty\) and \(\mathbb{E}\left[|\mathcal{Y}|\right] < \infty\), prove that
\[ \frac{1}{N} \boldsymbol{z}^\intercal\boldsymbol{y}\rightarrow \mathbb{E}\left[\mathcal{Z} \mathcal{Y}\right] \textrm{ as }N \rightarrow \infty \]
(Hint: don’t prove this from scratch, appeal to a probability theorem.)
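Though not a proof, the law-of-large-numbers behavior is easy to see numerically. A minimal sketch, assuming independent standard normal draws so that \(\mathbb{E}\left[\mathcal{Z} \mathcal{Y}\right] = 0\):

```python
import numpy as np

# For independent standard normals, E[ZY] = E[Z]E[Y] = 0.
rng = np.random.default_rng(0)
for N in [100, 10_000, 1_000_000]:
    z = rng.normal(size=N)
    y = rng.normal(size=N)
    print(N, (z @ y) / N)  # tends to E[ZY] = 0 as N grows
```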
(c)
Using only inner products involving \(\boldsymbol{y}\), \(\boldsymbol{z}\), and \(\boldsymbol{1}\), write an expression for \(\mathrm{Cov}\left(\mathcal{Y}', \mathcal{Z}'\right)\). Prove that the expression converges with probability one to \(\mathrm{Cov}\left(\mathcal{Y}, \mathcal{Z}\right)\). (Hint: again, use your previous results and a theorem from probability.)
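Again as a numerical check rather than a proof, the sketch below simulates a correlated pair and watches the empirical covariance (computed here with `np.cov` as a stand-in for your inner-product expression) approach the true covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
for N in [100, 10_000, 1_000_000]:
    z = rng.normal(size=N)
    y = 0.5 * z + rng.normal(size=N)  # Cov(Y, Z) = 0.5 by construction
    # bias=True gives the 1/N normalization of the empirical distribution.
    print(N, np.cov(y, z, bias=True)[0, 1])  # approaches 0.5
```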
(d)
Now, let \((\mathcal{X}', \mathcal{Y}')\) denote a draw from the empirical distribution over \(\{(\boldsymbol{x}_1, y_1), \ldots, (\boldsymbol{x}_N, y_N) \}\). (Recall that \(\boldsymbol{x}_n\) is a length–\(P\) column vector, and \(\boldsymbol{x}_n^\intercal\) is the \(n\)–th row of the matrix \(\boldsymbol{X}\).) Prove that
\[ \begin{aligned} \frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{X}= \mathbb{E}\left[\mathcal{X}' \mathcal{X}'^\intercal\right] \quad\textrm{and}\quad \frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{y}= \mathbb{E}\left[\mathcal{X}' \mathcal{Y}'\right]. \end{aligned} \]
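As in part (a), the identity can be checked by Monte Carlo, resampling rows with replacement. A sketch for the first identity:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 50, 3
X = rng.normal(size=(N, P))

# Exact empirical expectation: each row gets probability 1/N.
lhs = (X.T @ X) / N

# Monte Carlo approximation: resample rows with replacement.
idx = rng.integers(0, N, size=200_000)
rows = X[idx]
mc = (rows.T @ rows) / len(idx)  # averages x x^T over the resampled rows
print(np.max(np.abs(lhs - mc)))  # small, up to Monte Carlo error
```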
(e)
Now, suppose that rows of \(\boldsymbol{X}\) are IID realizations of the random \(P\)–vector \(\mathcal{X}\). Assume that, for each \(p, q \in \{ 1, \ldots, P \}\), \(\mathbb{E}\left[\left|\mathcal{X}_p \mathcal{X}_q\right|\right] < \infty\) and \(\mathbb{E}\left[\left|\mathcal{X}_p \mathcal{Y}\right|\right] < \infty\), so that each entry of the matrices below has a finite first moment.
Prove that, as \(N \rightarrow \infty\),
\[ \frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{X}\rightarrow \mathbb{E}\left[\mathcal{X} \mathcal{X}^\intercal\right] \quad\textrm{and}\quad \frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{y}\rightarrow \mathbb{E}\left[\mathcal{X} \mathcal{Y}\right], \]
where both limits are with probability one.
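The same numerical strategy as in part (b) applies entrywise. A minimal sketch, assuming rows drawn as independent standard normal \(P\)-vectors, for which \(\mathbb{E}\left[\mathcal{X} \mathcal{X}^\intercal\right]\) is the identity matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
P = 2
for N in [100, 10_000, 1_000_000]:
    X = rng.normal(size=(N, P))
    print(N)
    print((X.T @ X) / N)  # approaches the P x P identity matrix
```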
(f)
Now assume that, for each \(p \in \{1, \ldots, P\}\) and \(q \in \{1, \ldots, P\}\), \(\mathbb{E}\left[\left|\mathcal{X}_p\right| \left|\mathcal{X}_q\right| \mathcal{Y}^2\right] < \infty\). Prove that, as \(N \rightarrow \infty\),
\[ \frac{1}{\sqrt{N}} \left( \boldsymbol{X}^\intercal\boldsymbol{y}- \mathbb{E}\left[\boldsymbol{X}^\intercal\boldsymbol{y}\right] \right) \rightarrow \mathcal{W}, \]
where \(\mathcal{W}\) is a multivariate normal random variable. What is the covariance of \(\mathcal{W}\)? (Hint: again, appeal to a probability theorem.)
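Though the proof is an appeal to a central limit theorem, the limit can be visualized by simulation: fix a large \(N\), replicate the experiment, and inspect the distribution of the scaled, centered statistic. A sketch with \(P = 1\) and independent standard normal draws, for which the centering term \(\mathbb{E}\left[\boldsymbol{X}^\intercal\boldsymbol{y}\right]\) is zero and the limiting variance is \(\mathbb{E}\left[\mathcal{X}^2 \mathcal{Y}^2\right] = 1\):

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps = 1_000, 5_000
stats = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=N)
    y = rng.normal(size=N)
    stats[r] = (x @ y) / np.sqrt(N)  # E[X^T y] = 0 here, so no centering needed

# Approximately standard normal: mean near 0, variance near 1.
print(stats.mean(), stats.var())
```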