STAT151A Homework 4: Due March 8th

Author

Your name here


1 Chi squared random variables

Let \(s\sim \chi^2_{K}\). Prove the following (a numerical sanity check of the first two claims is sketched after the list):

  • \(\mathbb{E}\left[s\right] = K\)
  • \(\mathrm{Var}\left(s\right) = 2 K\) (hint: if \(z\sim \mathcal{N}\left(0,\sigma^2\right)\), then \(\mathbb{E}\left[z^4\right] = 3\sigma^4\))
  • If \(a_n \sim \mathcal{N}\left(0,\sigma^2\right)\) IID for \(1,\ldots,N\), then \(\frac{1}{\sigma^2} \sum_{n=1}^Na_n^2 \sim \mathcal{\chi}^2_{N}\)
  • \(\frac{1}{K} s\rightarrow 1\) in probability as \(K \rightarrow \infty\)
  • \(\frac{1}{\sqrt{K}} (s- K) \rightarrow \mathcal{N}\left(0, 2\right)\) in distribution as \(K \rightarrow \infty\)
  • Let \(\boldsymbol{a}\sim \mathcal{N}\left(\boldsymbol{0}, \boldsymbol{I}\right)\) where \(\boldsymbol{a}\in \mathbb{R}^{K}\). Then \(\left\Vert\boldsymbol{a}\right\Vert_2^2 \sim \chi^2_{K}\)
  • Let \(\boldsymbol{a}\sim \mathcal{N}\left(\boldsymbol{0}, \boldsymbol{\Sigma}\right)\) where \(\boldsymbol{a}\in \mathbb{R}^{K}\) and \(\boldsymbol{\Sigma}\) is positive definite. Then \(\boldsymbol{a}^\intercal\boldsymbol{\Sigma}^{-1} \boldsymbol{a}\sim \chi^2_{K}\)
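Not part of the proof, but a quick Monte Carlo sanity check of the first two claims may be helpful. This is a minimal sketch in Python; the degrees of freedom \(K\) and the number of draws are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_draws = 5, 10**6          # arbitrary degrees of freedom and sample size

# Direct chi-squared(K) draws: mean should be close to K, variance close to 2K.
s = rng.chisquare(K, size=n_draws)
print(s.mean(), s.var())

# Sum of K squared standard normals should match the same mean and variance.
z = rng.standard_normal((n_draws, K))
s2 = (z**2).sum(axis=1)
print(s2.mean(), s2.var())
```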

2 Predictive variance for different regressors

This question takes the residuals of the training data to be random, and considers variability under sampling of the training data. The regressors for both the training and test data are taken as fixed.

Let \(\boldsymbol{x}_n = (x_{n1}, x_{n2})^\intercal\) be IID normal regressors, with

  • \(\mathbb{E}\left[x_{n1}\right] = \mathbb{E}\left[x_{n2}\right] = 0\),
  • \(\mathrm{Var}\left(x_{n1}\right) = \mathrm{Var}\left(x_{n2}\right) = 1\), and
  • \(\mathrm{Cov}\left(x_{n1}, x_{n2}\right) = 0.99\).

(Note there is no intercept.)

Assume that \(y_n = \beta^\intercal\boldsymbol{x}_n + \varepsilon_n\) for some \(\beta\), and that the residuals \(\varepsilon_n\) are IID with mean \(0\), variance \(\sigma^2 = 2\), and are independent of \(\boldsymbol{x}_n\).

(a)

Find the limiting distribution of \(\sqrt{N}(\hat{\beta}- \beta)\).

(b)

Define the prediction error relative to the expected response \[ \hat{y}_\mathrm{new}- \mathbb{E}\left[y_\mathrm{new}\right] := (\hat{\beta}- \beta)^\intercal x_\mathrm{new}, \]

and approximate the limiting variance \(\mathrm{Var}\left(\hat{y}_\mathrm{new}- \mathbb{E}\left[y_\mathrm{new}\right]\right)\) for the following new regression vectors:

  • \(x_\mathrm{new}= (1, 1)^\intercal\)
  • \(x_\mathrm{new}= (1, -1)^\intercal\)
  • \(x_\mathrm{new}= (100, 100)^\intercal\)
  • \(x_\mathrm{new}= (0, 0)^\intercal\)

You may assume that \(N\) is large, so that you can apply the CLT to \(\sqrt{N}(\hat{\beta}- \beta)\). Even with the CLT approximation your answer will depend on \(N\); just make this dependence explicit.
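If it helps, you can sanity-check your answers to (b) with a small simulation. The sketch below (in Python) draws the regressors once and holds them fixed, resampling only the residuals, to match the setup above; the sample size \(N\), the number of replications, and the "true" \(\beta\) are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_reps = 1000, 2000                       # assumed sample size and number of replications
beta = np.array([1.0, -2.0])                 # arbitrary "true" coefficients
Sigma_x = np.array([[1.0, 0.99],
                    [0.99, 1.0]])            # regressor covariance from the problem
x_news = np.array([[1, 1], [1, -1], [100, 100], [0, 0]], dtype=float)

# Draw the regressors once and hold them fixed; only the residuals vary across replications.
X = rng.multivariate_normal(np.zeros(2), Sigma_x, size=N)

errs = np.zeros((n_reps, len(x_news)))
for r in range(n_reps):
    eps = rng.normal(0.0, np.sqrt(2.0), size=N)      # residual variance sigma^2 = 2
    y = X @ beta + eps
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    errs[r, :] = x_news @ (beta_hat - beta)          # yhat_new - E[y_new] for each x_new

for x_new, v in zip(x_news, errs.var(axis=0)):
    print(x_new, v)                                  # compare with your analytic approximations
```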

(c)

Why are some of the variances in (b) large and some small? Explain each in plain, intuitive terms.

3 The sandwich covariance matrix under homoskedasticity

For this problem, make the following assumptions.

  • The regressors are non-random, with \(\frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal\rightarrow \boldsymbol{\Sigma}_{\boldsymbol{X}}\) for positive definite \(\boldsymbol{\Sigma}_{\boldsymbol{X}}\)
  • The responses are \(y_n = \boldsymbol{\beta}^\intercal\boldsymbol{x}_n + \varepsilon_n\) for some unknown \(\boldsymbol{\beta}\)
  • The residuals are IID with \(\mathbb{E}\left[\varepsilon_n\right] = 0\) and \(\mathrm{Var}\left(\varepsilon_n\right) = \sigma^2\) (but not necessarily normal)

Under these assumptions, show that the sandwich covariance matrix and the standard covariance matrix converge to the same quantity. That is, show that

\[ \hat{\boldsymbol{\Sigma}}_\mathrm{sand} = N \left(\boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1} \left(\sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal\hat{\varepsilon}_n^2\right) \left(\boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1} \rightarrow \boldsymbol{S} \quad\textrm{and}\quad \hat{\boldsymbol{\Sigma}}_\mathrm{h} = N \left(\boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1} \hat{\sigma}^2 \rightarrow \boldsymbol{S} \]

for the same \(\boldsymbol{S}\), where \(\hat{\sigma}^2 := \frac{1}{N} \sum_{n=1}^N\hat{\varepsilon}_n^2\).
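A small numerical illustration of the claim may be useful (it is not a substitute for the proof). In this sketch the design matrix, \(\boldsymbol{\beta}\), and the non-normal uniform residual distribution are arbitrary assumptions chosen to satisfy the stated conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, sigma2 = 20000, 3, 1.5        # assumed sizes and residual variance
beta = np.array([0.5, -1.0, 2.0])   # arbitrary "true" coefficients

X = rng.standard_normal((N, P))     # stand-in design with (1/N) X'X close to the identity
# Non-normal but homoskedastic IID residuals with variance sigma2.
eps = rng.uniform(-np.sqrt(3 * sigma2), np.sqrt(3 * sigma2), size=N)
y = X @ beta + eps

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)

meat = (X * resid[:, None]**2).T @ X              # sum_n x_n x_n' eps_hat_n^2
Sigma_sand = N * XtX_inv @ meat @ XtX_inv
Sigma_h = N * XtX_inv * np.mean(resid**2)         # sigma_hat^2 = (1/N) sum eps_hat_n^2

print(Sigma_sand)
print(Sigma_h)                                    # the two should be close for large N
```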