STAT151A Homework 5 Asymptotics Solutions

Author

Your name here


1 Reviewing the distribution of \(\hat{\beta}\) under different assumptions

This homework question will reference the following assumptions.

Regressor assumptions:

  • R1: The \(N \times P\) matrix \(\boldsymbol{X}\), which has \(\boldsymbol{x}_n^\intercal\) in the \(n\)–th row, has full column rank
  • R2: The regressors \(\boldsymbol{x}_n\) are deterministic, and \(\frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal\rightarrow \boldsymbol{\Sigma}_{\boldsymbol{X}}\), where \(\boldsymbol{\Sigma}_{\boldsymbol{X}}\) is positive definite
  • R3: The regressors \(\boldsymbol{x}_n\) are IID, with positive definite covariance \(\mathrm{Cov}\left(\boldsymbol{x}_n\right)\), and \(\frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal\rightarrow \mathbb{E}\left[\boldsymbol{x}_n \boldsymbol{x}_n^\intercal\right] = \boldsymbol{\Sigma}_{\boldsymbol{X}}\) in probability.

Model assumptions (for all \(n\)):

  • M1: There exists a \(\boldsymbol{\beta}\) such that \(y_n = \boldsymbol{\beta}^\intercal\boldsymbol{x}_n + \varepsilon_n\) for all \(n\)
  • M2: The residuals \(\varepsilon_n\) are IID with \(\varepsilon_n \vert \boldsymbol{x}_n \sim \mathcal{N}\left(0, \sigma^2\right)\) for some \(\sigma^2\)
  • M3: The residuals \(\varepsilon_n\) are independent with \(\mathbb{E}\left[\varepsilon_n | \boldsymbol{x}_n\right] = 0\) and \(\mathbb{E}\left[\varepsilon_n^2 | \boldsymbol{x}_n\right] = \sigma^2\)
  • M4: The residuals \(\varepsilon_n\) are independent with \(\mathbb{E}\left[\varepsilon_n | \boldsymbol{x}_n\right] = 0\) and \(\mathbb{E}\left[\varepsilon_n^2 | \boldsymbol{x}_n\right] = \sigma_n^2\)
  • M5: The pairs \((\boldsymbol{x}_n, y_n)\) are IID
  • M6: For all finite vectors \(\boldsymbol{v}\), \(\frac{1}{N} \sum_{n=1}^N\mathbb{E}\left[(y_n - \boldsymbol{v}^\intercal\boldsymbol{x}_n)^2 \boldsymbol{x}_n \boldsymbol{x}_n^\intercal\right] \rightarrow \boldsymbol{V}(\boldsymbol{v}) < \infty\), where each entry of the limiting matrix is finite. (The limit depends on \(\boldsymbol{v}\), but importantly \(\boldsymbol{V}(\boldsymbol{v})\) is finite for all finite \(\boldsymbol{v}\)).

For M2, M3, and M4, when \(\boldsymbol{x}_n\) is deterministic, take the conditioning to mean “for that value of \(\boldsymbol{x}_n\).”

For this homework, you may use the LLN, the CLT, the continuous mapping theorem, and properties of the multivariate normal distribution.

The term “limiting distribution” means the distribution that the quantity approaches as \(N \rightarrow \infty\).

Assume R1 for all questions.

\[ \def\opone{o_p(1)} \def\oone{o(1)} \]

In the solutions below, as \(N\rightarrow \infty\), I will use \(\oone\) to denote a term that goes to zero, and \(\opone\) to denote a term that goes to zero in probability. This notation will allow me to make limiting statements one line at a time.
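For example, this notation permits one-line bookkeeping of the following kind, justified by the continuous mapping theorem when \(\boldsymbol{\Sigma}_{\boldsymbol{X}}\) is invertible:

\[ \left(\boldsymbol{\Sigma}_{\boldsymbol{X}}+ \oone\right)^{-1} = \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} + \oone \quad\textrm{and}\quad \left(\boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} + \oone\right) \opone = \opone. \]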

  1. Find the limiting distribution of \(\hat{\boldsymbol{\beta}}\) under M1, M2, and R2.

\[ \begin{aligned} \hat{\boldsymbol{\beta}}={}& \left(\frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1} \frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{Y} & \textrm{(definition)}\\ ={}& \left(\frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1} \frac{1}{N} \boldsymbol{X}^\intercal\left(\boldsymbol{X}\boldsymbol{\beta}+ \boldsymbol{\varepsilon}\right) & \textrm{(M1)}\\ ={}& \boldsymbol{\beta}+ \left(\frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1} \frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{\varepsilon} & \textrm{(cancellation)}\\ ={}& \boldsymbol{\beta}+ \left(\boldsymbol{\Sigma}_{\boldsymbol{X}}+ \oone \right)^{-1} \frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{\varepsilon} & \textrm{(R2) applied to }\frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal\\ ={}& \boldsymbol{\beta}+ \left(\boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} + \oone \right) \opone & \textrm{(M2) and non-IID LLN applied to } \frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \varepsilon_n \\ \rightarrow {}& \boldsymbol{\beta}. \end{aligned} \]

So \(\hat{\boldsymbol{\beta}}\rightarrow \boldsymbol{\beta}\), a constant (that is, a degenerate distribution).
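As a numerical sanity check (a minimal sketch; the design, seed, and constants below are illustrative choices, not part of the assignment), \(\hat{\boldsymbol{\beta}}\) collapses onto \(\boldsymbol{\beta}\) as \(N\) grows:

```python
# A minimal sketch of (1): with a fixed design satisfying R2 and IID Gaussian
# errors (M1, M2), the OLS estimate approaches beta. Illustrative values only.
import numpy as np

rng = np.random.default_rng(0)
beta, sigma = np.array([1.0, -2.0]), 0.5

for N in [100, 1_000, 10_000, 100_000]:
    # Deterministic design: intercept plus an equally spaced covariate, so
    # (1/N) X^T X -> diag(1, 1/3), which is positive definite (R2).
    X = np.column_stack([np.ones(N), np.linspace(-1.0, 1.0, N)])
    y = X @ beta + sigma * rng.standard_normal(N)    # y_n = beta^T x_n + eps_n
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS
    print(N, np.abs(beta_hat - beta).max())          # error shrinks with N
```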

  2. Find the limiting distribution of \(\sqrt{N}(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta})\) under M1, M2, and R2.

From above, \[ \begin{aligned} \sqrt{N} \left( \hat{\boldsymbol{\beta}}- \boldsymbol{\beta}\right) ={}& \left(\frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1} \frac{1}{\sqrt{N}} \boldsymbol{X}^\intercal\boldsymbol{\varepsilon} & \textrm{(previous result)} \\ ={}& \left(\boldsymbol{\Sigma}_{\boldsymbol{X}}+ \oone \right)^{-1} \frac{1}{\sqrt{N}} \boldsymbol{X}^\intercal\boldsymbol{\varepsilon} & \textrm{(R2) applied to }\frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal\\ ={}& \left(\boldsymbol{\Sigma}_{\boldsymbol{X}}+ \oone \right)^{-1} \mathcal{N}\left(\boldsymbol{0}, \frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{X}\sigma^2\right) & \textrm{(M2) and normality of } \frac{1}{\sqrt{N}} \sum_{n=1}^N\boldsymbol{x}_n \varepsilon_n \\ ={}& \left(\boldsymbol{\Sigma}_{\boldsymbol{X}}+ \oone \right)^{-1} \mathcal{N}\left(\boldsymbol{0}, \left( \boldsymbol{\Sigma}_{\boldsymbol{X}}+ \oone \right) \sigma^2\right) & \textrm{(R2) applied to }\frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal\\ \rightarrow {}& \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \mathcal{N}\left(\boldsymbol{0}, \boldsymbol{\Sigma}_{\boldsymbol{X}}\sigma^2\right) & \\ ={}& \mathcal{N}\left(\boldsymbol{0}, \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{X}}\boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \sigma^2\right) & \textrm{(multivariate normality)} \\ ={}& \mathcal{N}\left(\boldsymbol{0}, \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \sigma^2\right). \end{aligned} \]
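A hedged Monte Carlo sketch of this limit (again with an illustrative fixed design of my own choosing): the sample covariance of \(\sqrt{N}(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta})\) across repeated simulations should approach \(\sigma^2 \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1}\).

```python
# A hedged Monte Carlo check of (2): the covariance of sqrt(N)*(beta_hat - beta)
# should be close to sigma^2 * Sigma_X^{-1} for large N. Illustrative only.
import numpy as np

rng = np.random.default_rng(1)
beta, sigma, N, reps = np.array([1.0, -2.0]), 0.5, 2_000, 2_000

X = np.column_stack([np.ones(N), np.linspace(-1.0, 1.0, N)])  # fixed design (R2)
Sigma_X = X.T @ X / N                                         # finite-N Sigma_X

draws = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + sigma * rng.standard_normal(N)             # M1 + M2
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    draws[r] = np.sqrt(N) * (beta_hat - beta)

print(np.cov(draws.T))                       # Monte Carlo covariance
print(sigma ** 2 * np.linalg.inv(Sigma_X))   # theoretical limiting covariance
```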

  3. Find the limiting distribution of \(\sqrt{N}(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta})\) under M1, M2, and R3.

Same as (2), but

\[ \begin{aligned} \frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal=& \boldsymbol{\Sigma}_{\boldsymbol{X}}+ \opone. & \textrm{(R3) and the LLN} \end{aligned} \]

  4. Find the limiting distribution of \(\sqrt{N}(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta})\) under M1, M3, and R2.

Same as (2), but instead of using normality, \[ \frac{1}{\sqrt{N}} \sum_{n=1}^N\boldsymbol{x}_n \varepsilon_n = \mathcal{N}\left(\boldsymbol{0}, \boldsymbol{\Sigma}_{\boldsymbol{X}}\sigma^2\right) + \opone \]

by the non-IID CLT, using the fact that

\[ \begin{aligned} \frac{1}{N} \sum_{n=1}^N\mathbb{E}\left[\boldsymbol{x}_n \varepsilon_n \right] ={}& \frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \mathbb{E}\left[\varepsilon_n\right] = \boldsymbol{0} & \textrm{(M3)}\\ \frac{1}{N} \sum_{n=1}^N\mathrm{Cov}\left(\boldsymbol{x}_n \varepsilon_n \right) ={}& \frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal\mathbb{E}\left[\varepsilon_n^2\right] \\ ={}& \sigma^2 \frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal & \textrm{(M3)}\\ ={}& \sigma^2 \left( \boldsymbol{\Sigma}_{\boldsymbol{X}}+ \oone \right). & \textrm{(R2)} \end{aligned} \]
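To see the non-IID CLT at work without normality, here is an illustrative sketch (design and error distribution are my own choices) using a fixed design and centered exponential errors: the scaled score \(\frac{1}{\sqrt{N}} \boldsymbol{X}^\intercal\boldsymbol{\varepsilon}\) still behaves approximately like \(\mathcal{N}\left(\boldsymbol{0}, \sigma^2 \boldsymbol{\Sigma}_{\boldsymbol{X}}\right)\).

```python
# An illustrative check of the CLT step in (4): fixed design, centered
# exponential errors (mean 0, variance sigma^2, but skewed, hence non-Gaussian).
import numpy as np

rng = np.random.default_rng(2)
sigma, N, reps = 1.0, 2_000, 2_000

X = np.column_stack([np.ones(N), np.linspace(-1.0, 1.0, N)])  # fixed design (R2)
Sigma_X = X.T @ X / N

eps = sigma * (rng.exponential(1.0, size=(reps, N)) - 1.0)    # non-Gaussian errors
scores = eps @ X / np.sqrt(N)        # row r is (1/sqrt(N)) X^T eps for rep r

print(np.cov(scores.T))              # should be close to sigma^2 * Sigma_X
print(sigma ** 2 * Sigma_X)
z = (scores[:, 1] - scores[:, 1].mean()) / scores[:, 1].std()
print((z ** 3).mean())               # sample skewness: near 0 if roughly normal
```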

  5. Find the limiting distribution of \(\sqrt{N}(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta})\) under M1, M3, and R3.

Combine the modifications of (3) and (4); otherwise the same as (2).

  6. Find the limiting distribution of \(\sqrt{N}(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta})\) under M1, M4, M6, and R2.

Same as (4), except we apply the non-IID CLT with

\[ \begin{aligned} \frac{1}{N} \sum_{n=1}^N\mathbb{E}\left[\boldsymbol{x}_n \varepsilon_n \right] ={}& \frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \mathbb{E}\left[\varepsilon_n \vert \boldsymbol{x}_n\right] = \boldsymbol{0} & \textrm{(M4)}\\ \frac{1}{N} \sum_{n=1}^N\mathrm{Cov}\left(\boldsymbol{x}_n \varepsilon_n\right) ={}& \frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal\mathbb{E}\left[\varepsilon_n^2 \vert \boldsymbol{x}_n\right] \\ ={}& \frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal\mathbb{E}\left[(y_n - \boldsymbol{\beta}^\intercal\boldsymbol{x}_n)^2\right] & \textrm{(M1)}\\ ={}& \boldsymbol{V}(\boldsymbol{\beta}) + \oone & \textrm{(M6)} \end{aligned} \]
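Assembling these pieces exactly as in (2), but with \(\boldsymbol{V}(\boldsymbol{\beta}) + \oone\) in place of \(\sigma^2 \left( \boldsymbol{\Sigma}_{\boldsymbol{X}}+ \oone \right)\), gives the sandwich form

\[ \sqrt{N} \left( \hat{\boldsymbol{\beta}}- \boldsymbol{\beta}\right) \rightarrow \mathcal{N}\left(\boldsymbol{0}, \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \boldsymbol{V}(\boldsymbol{\beta}) \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1}\right), \]

which no longer collapses, since \(\boldsymbol{V}(\boldsymbol{\beta})\) need not equal \(\sigma^2 \boldsymbol{\Sigma}_{\boldsymbol{X}}\) under M4.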

  7. Find the limiting distribution of \(\sqrt{N}(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta})\) under M1, M4, M6, and R3.

Same as (6), but \[ \begin{aligned} \frac{1}{N} \sum_{n=1}^N\boldsymbol{x}_n \boldsymbol{x}_n^\intercal=& \boldsymbol{\Sigma}_{\boldsymbol{X}}+ \opone. & \textrm{(R3) and the LLN} \end{aligned} \]

and

\[ \begin{aligned} \frac{1}{N} \sum_{n=1}^N\mathrm{Cov}\left(\boldsymbol{x}_n \varepsilon_n\right) ={}& \mathbb{E}\left[\boldsymbol{x}_n \boldsymbol{x}_n^\intercal(y_n - \boldsymbol{\beta}^\intercal\boldsymbol{x}_n)^2\right] & \textrm{(M1)}\\ ={}& \boldsymbol{V}(\boldsymbol{\beta}) + \oone. & \textrm{(M6)} \end{aligned} \]
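A hedged simulation sketch of this setting (the uniform regressors and the noise scale \(0.2 + |x|\) are illustrative choices): the Monte Carlo covariance of \(\sqrt{N}(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta})\) matches the sandwich \(\boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \boldsymbol{V}(\boldsymbol{\beta}) \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1}\), not \(\sigma^2 \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1}\).

```python
# A hedged sketch of (7): IID regressors (R3) with heteroskedastic errors (M4).
# All design choices below are illustrative, not part of the assignment.
import numpy as np

rng = np.random.default_rng(3)
beta, N, reps = np.array([1.0, -2.0]), 2_000, 2_000

draws = np.empty((reps, 2))
for r in range(reps):
    x1 = rng.uniform(-1.0, 1.0, N)                 # IID regressors (R3)
    X = np.column_stack([np.ones(N), x1])
    sd = 0.2 + np.abs(x1)                          # error SD depends on x_n (M4)
    y = X @ beta + sd * rng.standard_normal(N)     # M1 with heteroskedastic noise
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    draws[r] = np.sqrt(N) * (beta_hat - beta)

# Approximate the population Sigma_X and V(beta) by a large simulation.
x1 = rng.uniform(-1.0, 1.0, 1_000_000)
X = np.column_stack([np.ones_like(x1), x1])
Sigma_X = X.T @ X / x1.size
V = (X * ((0.2 + np.abs(x1)) ** 2)[:, None]).T @ X / x1.size

print(np.cov(draws.T))                                      # Monte Carlo
print(np.linalg.inv(Sigma_X) @ V @ np.linalg.inv(Sigma_X))  # sandwich limit
```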

  8. Under M5, M6, and R3, identify a \(\boldsymbol{\beta}^*\) such that \(\sqrt{N}(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta}^*)\) converges to a nondegenerate, finite random variable, and find the limiting distribution.

\[ \begin{aligned} \hat{\boldsymbol{\beta}}={}& \left(\frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1} \frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{Y} & \textrm{(definition)}\\ ={}& \left(\boldsymbol{\Sigma}_{\boldsymbol{X}}+ \opone \right)^{-1} \frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{Y} & \textrm{(M5), (R3), the LLN}\\ ={}& \left(\boldsymbol{\Sigma}_{\boldsymbol{X}}+ \opone \right)^{-1} \left(\mathbb{E}\left[\boldsymbol{x}_n y_n\right] + \opone\right) & \textrm{(M5), (M6) with }\boldsymbol{v}= \boldsymbol{0}\textrm{, the LLN}\\ \rightarrow {}& \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \mathbb{E}\left[\boldsymbol{x}_n y_n\right] =: \boldsymbol{\beta}^* \end{aligned} \]

We can then write

\[ \begin{aligned} \sqrt{N}(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta}^*) ={}& \sqrt{N} \left(\left(\frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1} \frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{Y}- \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \mathbb{E}\left[\boldsymbol{x}_n y_n\right] \right) \\={}& \left(\frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1} \sqrt{N} \left( \frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{Y}- \left(\frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{X}\right) \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \mathbb{E}\left[\boldsymbol{x}_n y_n\right] \right) \\={}& \left(\frac{1}{N} \boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1} \frac{1}{\sqrt{N}} \sum_{n=1}^N\left( \boldsymbol{x}_n y_n - \boldsymbol{x}_n \boldsymbol{x}_n^\intercal\boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \mathbb{E}\left[\boldsymbol{x}_n y_n\right] \right) \\={}& \left(\boldsymbol{\Sigma}_{\boldsymbol{X}}+ o_p(1) \right)^{-1} \frac{1}{\sqrt{N}} \sum_{n=1}^N\left( \boldsymbol{x}_n y_n - \boldsymbol{x}_n \boldsymbol{x}_n^\intercal\boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \mathbb{E}\left[\boldsymbol{x}_n y_n\right] \right) & \textrm{(R3) and the LLN} \end{aligned} \]

Note that

\[ \mathbb{E}\left[\boldsymbol{x}_n y_n - \boldsymbol{x}_n \boldsymbol{x}_n^\intercal\boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \mathbb{E}\left[\boldsymbol{x}_n y_n\right]\right] = \mathbb{E}\left[\boldsymbol{x}_n y_n\right] - \mathbb{E}\left[\boldsymbol{x}_n \boldsymbol{x}_n^\intercal\right] \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \mathbb{E}\left[\boldsymbol{x}_n y_n\right] = \mathbb{E}\left[\boldsymbol{x}_n y_n\right] - \boldsymbol{\Sigma}_{\boldsymbol{X}}\boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \mathbb{E}\left[\boldsymbol{x}_n y_n\right] = \boldsymbol{0}, \]

so \[ \begin{aligned} \frac{1}{\sqrt{N}} \sum_{n=1}^N\left( \boldsymbol{x}_n y_n - \boldsymbol{x}_n \boldsymbol{x}_n^\intercal\boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \mathbb{E}\left[\boldsymbol{x}_n y_n\right] \right) ={}& \frac{1}{\sqrt{N}} \sum_{n=1}^N\boldsymbol{x}_n \left( y_n - \boldsymbol{x}_n^\intercal\boldsymbol{\beta}^* \right) \\ \rightarrow {}& \mathcal{N}\left(\boldsymbol{0}, \mathbb{E}\left[ \boldsymbol{x}_n \boldsymbol{x}_n^\intercal\left( y_n - \boldsymbol{x}_n^\intercal\boldsymbol{\beta}^* \right)^2\right]\right) \end{aligned} \] by the IID CLT (M5 makes the summands IID, and M6 with \(\boldsymbol{v}= \boldsymbol{\beta}^*\) makes the covariance finite). The rest is like (7).
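Explicitly, combining the pieces as in (7),

\[ \sqrt{N} \left( \hat{\boldsymbol{\beta}}- \boldsymbol{\beta}^* \right) \rightarrow \mathcal{N}\left(\boldsymbol{0}, \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1} \mathbb{E}\left[ \boldsymbol{x}_n \boldsymbol{x}_n^\intercal\left( y_n - \boldsymbol{x}_n^\intercal\boldsymbol{\beta}^* \right)^2\right] \boldsymbol{\Sigma}_{\boldsymbol{X}}^{-1}\right). \]

This is the population version of the covariance estimated by heteroskedasticity-robust (“sandwich”) standard errors.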

  9. In any of the above settings, what is the limiting distribution of \((\hat{\boldsymbol{\beta}}- \boldsymbol{\beta})\)? (The answer is the same no matter which setting you choose.)

They all converge to a constant: \(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta}\) (or \(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta}^*\) in setting (8)) is \(\opone\), so the limiting distribution is a point mass at \(\boldsymbol{0}\), a degenerate distribution.