STAT151A Homework 5

Author

Your name here

1 Correlated variables in regressions

This problem is essentially just a special case of the “Eigendecomposition and covariances” problem from HW 3.

Suppose that \(\mathrm{Var}\left(x\right) = 1\), \(\mathrm{Var}\left(x'\right) = 1\), and \(\mathrm{Cov}\left(x, x'\right) = \rho\). Let \(\boldsymbol{x}= (x, x')^\intercal\).

(a) Show that \[ \mathrm{Cov}\left(\begin{pmatrix} x\\ x' \end{pmatrix} \right) = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}. \] Verify by direct calculation that the covariance matrix has eigenvectors \((1,1)^\intercal\) and \((1, -1)^\intercal\), with eigenvalues \(1 + \rho\) and \(1 - \rho\), respectively.
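
This is a pencil-and-paper check, but if you would like to sanity-check it numerically, here is a minimal Python sketch (assuming numpy; \(\rho = 0.6\) is an arbitrary illustrative value):

```python
import numpy as np

rho = 0.6  # arbitrary illustrative value with |rho| < 1
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# Check the claimed (unnormalized) eigenvector/eigenvalue pairs directly.
v_sum, v_diff = np.array([1.0, 1.0]), np.array([1.0, -1.0])
print(np.allclose(Sigma @ v_sum, (1 + rho) * v_sum))    # True
print(np.allclose(Sigma @ v_diff, (1 - rho) * v_diff))  # True

# Compare with a generic eigendecomposition (eigenvalues in ascending order).
evals, evecs = np.linalg.eigh(Sigma)
print(evals)  # [1 - rho, 1 + rho]
```

Note that numpy normalizes eigenvectors to unit length, so the columns of `evecs` will be scalar multiples of \((1,1)^\intercal\) and \((1,-1)^\intercal\).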

(b) Suppose we run the regression \(y\sim \beta_1 x+ \beta_2 x'\). Using the eigenvectors from (a), define the new variables \[ \boldsymbol{z}:= \begin{pmatrix} s\\ d \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \\ \end{pmatrix} \begin{pmatrix} x\\ x' \end{pmatrix}. \] Here, \(s\) stands for “sum” and \(d\) for “difference.” Consider the regression \(y\sim \gamma_s s+ \gamma_d d\), with \(\hat{\boldsymbol{\gamma}} = (\hat{\gamma}_s, \hat{\gamma}_d)^\intercal\). Using results from earlier in class, find an expression for \(\hat{\boldsymbol{\gamma}}\) in terms of \(\hat{\boldsymbol{\beta}}\), assuming \(\left|\rho\right| < 1\).
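
If you want to check whatever expression you derive, here is a minimal simulation sketch (assuming numpy, mean-zero regressors, no intercept, and arbitrary illustrative values for \(\rho\), \(N\), and the true coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
N, rho = 1000, 0.6                                        # arbitrary illustrative values
Sigma = np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=N)    # columns are x, x'
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=N)    # arbitrary true betas

# Fit both parameterizations: regressors (x, x') and (s, d) = (x + x', x - x').
Z = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 1]])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
gamma_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)

# The two fits are the same model in different coordinates, so the fitted
# values agree; compare gamma_hat with the expression you derive from beta_hat.
print("fits agree:", np.allclose(X @ beta_hat, Z @ gamma_hat))
print("beta_hat: ", beta_hat)
print("gamma_hat:", gamma_hat)
```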

(c) Derive (b) a different way by writing

\[ \begin{aligned} \beta_1 x+ \beta_2 x' ={}& \beta_1 \left(\frac{1}{2}x+ \frac{1}{2}x+ \frac{1}{2}x' - \frac{1}{2} x' \right) + \beta_2 \left(\frac{1}{2}x' + \frac{1}{2}x' + \frac{1}{2}x- \frac{1}{2}x\right), \end{aligned} \] and grouping terms.

(d) Suppose that \(\mathrm{Cov}\left(\boldsymbol{x}\right) = \boldsymbol{U}\Lambda \boldsymbol{U}^\intercal\) is the eigendecomposition of a general covariance matrix. Show that \(\mathrm{Cov}\left(\boldsymbol{U}^\intercal \boldsymbol{x}\right)\) is diagonal. This implies that \(\mathrm{Cov}\left(s, d\right) = 0\).

Verify this directly: evaluate \(\mathrm{Var}\left(s\right)\), \(\mathrm{Var}\left(d\right)\), and \(\mathrm{Cov}\left(s, d\right)\) in terms of \(\rho\).
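
Again optional: a minimal Monte Carlo sketch for checking your expressions for \(\mathrm{Var}\left(s\right)\), \(\mathrm{Var}\left(d\right)\), and \(\mathrm{Cov}\left(s, d\right)\) (assuming numpy; \(\rho = 0.6\) and the sample size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.6                                  # arbitrary illustrative value
Sigma = np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)
s, d = X[:, 0] + X[:, 1], X[:, 0] - X[:, 1]

# Empirical covariance matrix of (s, d): the diagonal entries estimate Var(s)
# and Var(d), and the off-diagonal entry estimates Cov(s, d).  Compare with
# your formulas in terms of rho.
print(np.cov(np.column_stack([s, d]), rowvar=False))
```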

(e) Find the limiting value of the regressor second moment matrix \(\frac{1}{N} \boldsymbol{Z}^\intercal\boldsymbol{Z}\). Use this to argue that, for large \(N\), under homoskedasticity,

\[ \sqrt{N} \left( \begin{pmatrix} \hat{\gamma}_s\\ \hat{\gamma}_d \end{pmatrix} - \begin{pmatrix} \gamma^*_s\\ \gamma^*_d \end{pmatrix} \right) \sim \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \frac{\sigma^2}{2} \begin{pmatrix} \frac{1}{1 + \rho} & 0 \\ 0 & \frac{1}{1 - \rho} \end{pmatrix} \right) \]
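
The displayed covariance can be checked by simulation; a minimal sketch (assuming numpy and homoskedastic normal errors, with arbitrary illustrative values of \(\rho\), \(N\), \(\sigma\), and the true \(\boldsymbol{\gamma}^*\)):

```python
import numpy as np

rng = np.random.default_rng(2)
rho, N, sigma, n_rep = 0.6, 500, 1.0, 2000       # arbitrary illustrative values
Sigma = np.array([[1.0, rho], [rho, 1.0]])
gamma_star = np.array([1.0, 0.25])               # arbitrary true (gamma_s, gamma_d)

draws = np.empty((n_rep, 2))
for r in range(n_rep):
    X = rng.multivariate_normal([0.0, 0.0], Sigma, size=N)
    Z = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 1]])
    y = Z @ gamma_star + sigma * rng.normal(size=N)
    draws[r], *_ = np.linalg.lstsq(Z, y, rcond=None)

# Covariance of sqrt(N) * (gamma_hat - gamma_star) across replications,
# versus the claimed limit (sigma^2 / 2) diag(1 / (1 + rho), 1 / (1 - rho)).
scaled = np.sqrt(N) * (draws - gamma_star)
print(np.cov(scaled, rowvar=False))
print(0.5 * sigma**2 * np.diag([1 / (1 + rho), 1 / (1 - rho)]))
```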

(f) Now, suppose that we have two highly positively correlated regressors, so that \(\rho \approx 1\). Using (e), argue that, for large \(N\),

  • \(\mathrm{Var}\left(\frac{1}{2} (\hat{\beta}_1 + \hat{\beta}_2)\right) \approx \sigma^2 / (4N)\), so the average of the two correlated regressors’ coefficients is well-estimated.
  • \(\mathrm{Var}\left(\frac{1}{2} (\hat{\beta}_1 - \hat{\beta}_2)\right) \approx \sigma^2 / (2N ( 1 - \rho))\), and \(1 - \rho \approx 0\), so the difference of the two correlated regressors’ coefficients is very poorly estimated.
  • The estimates \(\hat{\beta}_1, \hat{\beta}_2\) are highly negatively correlated when their regressors are highly positively correlated.

Finally, argue that all this is reversed if \(x\) and \(x'\) are almost perfectly negatively correlated.
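
A minimal simulation sketch of the three claims above, with \(\rho = 0.98\) standing in for \(\rho \approx 1\) (assuming numpy; all other values are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
rho, N, sigma, n_rep = 0.98, 500, 1.0, 2000        # rho close to 1; others illustrative
Sigma = np.array([[1.0, rho], [rho, 1.0]])
beta_star = np.array([1.0, 1.0])                   # arbitrary true coefficients

betas = np.empty((n_rep, 2))
for r in range(n_rep):
    X = rng.multivariate_normal([0.0, 0.0], Sigma, size=N)
    y = X @ beta_star + sigma * rng.normal(size=N)
    betas[r], *_ = np.linalg.lstsq(X, y, rcond=None)

avg, diff = 0.5 * (betas[:, 0] + betas[:, 1]), 0.5 * (betas[:, 0] - betas[:, 1])
print(avg.var(), sigma**2 / (4 * N))               # small: the average is well-estimated
print(diff.var(), sigma**2 / (2 * N * (1 - rho)))  # large: the difference is not
print(np.corrcoef(betas, rowvar=False)[0, 1])      # strongly negative
```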

(g) Suppose you are now given a totally generic regression, \(y\sim \boldsymbol{\beta}^\intercal\boldsymbol{r}\) for some regressors \(\boldsymbol{r}\). Select two of these regressors, which we assume are \(r_1\) and \(r_2\) without loss of generality, and assume the regression has a constant. Suppose you are interested in the joint distribution of \(\hat{\beta}_1\) and \(\hat{\beta}_2\). Convert this problem to an instance of our simple problem for large \(N\) above by

  1. Applying the FWL theorem to remove the effect of the other regressors, and
  2. Re-scaling the regressors so they have approximately unit variance.

In this sense, the above toy problem, together with the FWL theorem, gives a fairly complete intuition of the effect of correlated regressors on pairs of coefficient estimates.
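
As a concrete illustration of the two-step reduction, here is a minimal sketch (the function name, and the convention that \(r_1\) and \(r_2\) sit in the first two columns of a hypothetical design matrix `R`, are assumptions made only for illustration):

```python
import numpy as np

def reduce_to_two_variable_problem(R, y):
    """FWL reduction: residualize r1, r2, and y on a constant plus the other
    regressors, then rescale the residualized r1 and r2 to unit standard
    deviation.  R is an N x p design matrix with r1 and r2 in its first two
    columns (a convention assumed here for illustration)."""
    N = R.shape[0]
    others = np.column_stack([np.ones(N), R[:, 2:]])  # constant + other regressors
    targets = np.column_stack([R[:, :2], y])
    coefs, *_ = np.linalg.lstsq(others, targets, rcond=None)
    resid = targets - others @ coefs                  # FWL residuals
    x_tilde, xp_tilde, y_tilde = resid[:, 0], resid[:, 1], resid[:, 2]
    return x_tilde / x_tilde.std(), xp_tilde / xp_tilde.std(), y_tilde
```

Keep in mind that dividing a regressor by its standard deviation multiplies the corresponding coefficient by that standard deviation, so track the scaling when translating conclusions back to \(\hat{\beta}_1\) and \(\hat{\beta}_2\).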

2 Correlated variables in regressions for tests

Following the previous problem, with a change in notation, suppose that we have a two-variable regression (possibly after the application of the FWL theorem and normalization), \(y\sim \beta_1 x+ \beta_2 x'\). For this problem, we will assume that \(\rho \approx 1\), so that we can approximately substitute \(x\) and \(x'\) for one another.

Suppose that \(y_n = {\beta^{*}}x_n + \varepsilon_n \approx {\beta^{*}}x'_n + \varepsilon_n\), for mean-zero \(\varepsilon_n\), and that \(N\) is large. Suppose in this problem that \({\beta^{*}}= 1\); all that really matters is that it is not zero.

As shown above, we know that \[ \begin{aligned} y_n ={}& \hat{\beta}_1 x_n + \hat{\beta}_2 x'_n + \hat{\varepsilon}_n \\={}& \frac{1}{2} (\hat{\beta}_1 + \hat{\beta}_2) (x_n + x'_n) + \frac{1}{2} (\hat{\beta}_1 - \hat{\beta}_2) (x_n - x'_n) + \hat{\varepsilon}_n. \end{aligned} \]

We showed above that \(\frac{1}{2} (\hat{\beta}_1 - \hat{\beta}_2)\) is very poorly estimated, that \(\frac{1}{2} (\hat{\beta}_1 + \hat{\beta}_2) \approx {\beta^{*}}\) and is very well estimated, and that \(\frac{1}{2} (\hat{\beta}_1 + \hat{\beta}_2)\) and \(\frac{1}{2} (\hat{\beta}_1 - \hat{\beta}_2)\) are approximately independent. This means that, for any random choice of dataset,

  • \(\frac{1}{2} (\hat{\beta}_1 + \hat{\beta}_2) \approx {\beta^{*}}\)
  • \(\frac{1}{2} (\hat{\beta}_1 - \hat{\beta}_2)\) could be essentially anything.

For all the problems below, plot the answers on a 2D plot with \(\hat{\beta}_1\) on the x-axis and \(\hat{\beta}_2\) on the y-axis.

(a) Plot the 2D sampling distribution of \(\hat{\beta}_1\) and \(\hat{\beta}_2\).
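
You can check your sketch against a simulation if you like; a minimal Python sketch (assuming numpy and matplotlib, with \(\rho = 0.98\), \({\beta^{*}}= 1\), and the other values as arbitrary illustrative choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
rho, N, n_rep, beta_star = 0.98, 500, 2000, 1.0    # illustrative values
Sigma = np.array([[1.0, rho], [rho, 1.0]])

betas = np.empty((n_rep, 2))
for r in range(n_rep):
    X = rng.multivariate_normal([0.0, 0.0], Sigma, size=N)
    y = beta_star * X[:, 0] + rng.normal(size=N)   # y ~ beta* x + noise, with x ~ x'
    betas[r], *_ = np.linalg.lstsq(X, y, rcond=None)

plt.scatter(betas[:, 0], betas[:, 1], s=4, alpha=0.3)
plt.xlabel(r"$\hat{\beta}_1$")
plt.ylabel(r"$\hat{\beta}_2$")
plt.title("Simulated sampling distribution")
plt.axis("equal")
plt.show()
```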

(b) Note that \(\hat{\beta}_1 = \frac{1}{2} (\hat{\beta}_1 + \hat{\beta}_2) + \frac{1}{2} (\hat{\beta}_1 - \hat{\beta}_2)\). Is the marginal variance of \(\hat{\beta}_1\) very high or very small? Same question for \(\hat{\beta}_2\).

(c) Taking your answer for (b) into account, plot the acceptance region for a t-test of the null \(H_0: \beta_1 = 0\). How well-powered is your test against \({\beta^{*}}= 1\)? Same for \(H_0: \beta_2 = 0\).

(d) Plot the acceptance region for a t-test of the null \(H_0: \frac{1}{2} (\beta_1 + \beta_2) = 0\).
Hint: re-do (a), but as if \({\beta^{*}}\) were \(0\) rather than \(1\). How well-powered is your test against \(\frac{1}{2} (\beta_1 + \beta_2) = {\beta^{*}}\)?

(e) Plot the acceptance region for a t-test of the null \(H_0: \frac{1}{2} (\beta_1 - \beta_2) = 0\). Hint: How would your answer in (a) have changed if we added something of the form \({\beta^{*}}_d (x_n - x'_n)\) to \(y_n\)? In our setup, how well-powered is your test?

(f) Plot the acceptance region for an F-test of the null that \(\beta_1 = 0\) and \(\beta_2 = 0\). In our setup, how well-powered is this test?
