STAT151A Homework 2 (with solutions)
This homework is due on Gradescope on Monday September 30th at 9pm.
1 Ordinary least squares in matrix form
Consider simple least squares regression \(y_n = \beta_1 + \beta_2 x_n + \varepsilon_n\), where \(x_n\) is a scalar. Assume that we have \(N\) datapoints. We showed directly that the least–squares solution is given by
\[ \hat{\beta}_1 = \overline{y} - \hat{\beta}_2 \overline{x} \quad\quad\textrm{and}\quad\quad \hat{\beta}_2 = \frac{\overline{xy} - \overline{x} \, \overline{y}} {\overline{xx} - \overline{x}^2}. \]
Let us re–derive this using matrix notation.
(a)
Write simple linear regression in the form \(\boldsymbol{Y}= \boldsymbol{X}\boldsymbol{\beta}+ \boldsymbol{\varepsilon}\). Be precise about what goes into each entry of \(\boldsymbol{Y}\), \(\boldsymbol{X}\), \(\boldsymbol{\beta}\), and \(\boldsymbol{\varepsilon}\). What are the dimensions of each?
(b)
We proved that the optimal \(\hat{\boldsymbol{\beta}}\) satisfies \(\boldsymbol{X}^\intercal\boldsymbol{X}\hat{\boldsymbol{\beta}}= \boldsymbol{X}^\intercal\boldsymbol{Y}\). Define the “barred quantities” \[ \begin{aligned} \overline{y} ={}& \frac{1}{N} \sum_{n=1}^Ny_n \\ \overline{x} ={}& \frac{1}{N} \sum_{n=1}^Nx_n \\ \overline{xy} ={}& \frac{1}{N} \sum_{n=1}^Nx_n y_n \\ \overline{xx} ={}& \frac{1}{N} \sum_{n=1}^Nx_n^2. \end{aligned} \]
In terms of the barred quantities and the number of datapoints \(N\), write expressions for \(\boldsymbol{X}^\intercal\boldsymbol{X}\) and \(\boldsymbol{X}^\intercal\boldsymbol{Y}\).
(c)
When is \(\boldsymbol{X}^\intercal\boldsymbol{X}\) invertible? Write a formal expression in terms of the barred quantities. Interpret this condition intuitively in terms of the distribution of the regressors \(x_n\).
(d)
Using the formula for the inverse of a \(2\times 2\) matrix, find an expression for \(\hat{\boldsymbol{\beta}}\), and confirm that we get the same answer that we got by solving directly.
(e)
In the case where \(\boldsymbol{X}^\intercal\boldsymbol{X}\) is not invertible, find three distinct values of \(\boldsymbol{\beta}\) that all achieve the same sum of squared residuals \(\boldsymbol{\varepsilon}^\intercal\boldsymbol{\varepsilon}\).
Solutions
(a)
\[ \boldsymbol{X}= \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix} \quad\quad \boldsymbol{Y}= \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix} \quad\quad \boldsymbol{\varepsilon}= \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{pmatrix} \quad\quad \boldsymbol{\beta}= \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} \]
These are \(N \times 2\), \(N \times 1\), \(N \times 1\), and \(2 \times 1\) respectively.
(b)
\[ \boldsymbol{X}^\intercal\boldsymbol{X}= \begin{pmatrix} \sum_{n=1}^N1 & \sum_{n=1}^Nx_n \\ \sum_{n=1}^Nx_n & \sum_{n=1}^Nx_n^2 \end{pmatrix} = N \begin{pmatrix} 1 & \bar{x}\\ \bar{x}& \overline{xx} \end{pmatrix} \quad\quad \boldsymbol{X}^\intercal\boldsymbol{Y}= \begin{pmatrix} \sum_{n=1}^Ny_n \\ \sum_{n=1}^Nx_n y_n \end{pmatrix} = N \begin{pmatrix} \bar{y}\\ \overline{xy} \end{pmatrix} \]
(c)
\(\boldsymbol{X}^\intercal\boldsymbol{X}\) is invertible if and only if its determinant, \(N^2 (\overline{xx} - \bar{x}\, \bar{x})\), is nonzero. This occurs exactly when the sample variance of \(x_n\) is greater than zero, i.e. when the \(x_n\) do not all take the same value.
(d)
\[ \begin{aligned} (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\boldsymbol{Y}={}& \frac{1}{N (\overline{xx} - \bar{x}\, \bar{x})} \begin{pmatrix} \overline{xx} & -\bar{x}\\ -\bar{x}& 1 \end{pmatrix} N \begin{pmatrix} \bar{y}\\ \overline{xy} \end{pmatrix} \\={}& \frac{1}{\overline{xx} - \bar{x}\, \bar{x}} \begin{pmatrix} \bar{y}\, \overline{xx} - \bar{x}\, \overline{xy} \\ -\bar{y}\, \bar{x}+ \overline{xy} \end{pmatrix} \end{aligned}. \]
We already have \(\hat{\beta}_2 = (\overline{xy} - \bar{y}\, \bar{x}) / (\overline{xx} - \bar{x}\, \bar{x})\) as expected. To see that \(\hat{\beta}_1\) is correct, write
\[ \begin{aligned} \bar{y}\, \overline{xx} - \bar{x}\, \overline{xy} ={}& \bar{y}\, \overline{xx} - \bar{y}\, \bar{x}\, \bar{x}+ \bar{y}\, \bar{x}\, \bar{x}- \bar{x}\, \overline{xy} \\={}& \bar{y}(\overline{xx} - \bar{x}\, \bar{x}) - \bar{x}(\overline{xy} - \bar{y}\, \bar{x}). \end{aligned} \]
Plugging this in gives
\[ \begin{aligned} \frac{\bar{y}\, \overline{xx} - \bar{x}\, \overline{xy}}{\overline{xx} - \bar{x}\, \bar{x}} ={}& \frac{\bar{y}(\overline{xx} - \bar{x}\, \bar{x}) - \bar{x}(\overline{xy} - \bar{y}\, \bar{x})} {\overline{xx} - \bar{x}\, \bar{x}} \\={}& \bar{y}- \bar{x}\hat{\beta}_2, \end{aligned} \]
as expected.
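(e)
If \(\boldsymbol{X}^\intercal\boldsymbol{X}\) is not invertible, then \(\overline{xx} = \bar{x}^2\), i.e. the sample variance of \(x_n\) is zero, so every \(x_n\) equals some common value \(c\). The fit \(\beta_1 + \beta_2 x_n = \beta_1 + \beta_2 c\) then depends on \(\boldsymbol{\beta}\) only through the sum \(\beta_1 + \beta_2 c\), so any coefficient vectors with the same value of \(\beta_1 + \beta_2 c\) produce identical residuals. For example,
\[ \boldsymbol{\beta}= \begin{pmatrix} \bar{y}\\ 0 \end{pmatrix}, \quad \begin{pmatrix} \bar{y}- c \\ 1 \end{pmatrix}, \quad\textrm{and}\quad \begin{pmatrix} \bar{y}- 2c \\ 2 \end{pmatrix} \]
all give the same sum of squared residuals \(\boldsymbol{\varepsilon}^\intercal\boldsymbol{\varepsilon}\).

As a quick numerical sanity check of the closed form in (d) (not part of the assignment), here is a minimal Python sketch, assuming NumPy is available; the simulated data and variable names are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
x = rng.normal(size=N)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=N)

# Matrix solution: solve (X^T X) beta = X^T y for the intercept-plus-slope design.
X = np.column_stack([np.ones(N), x])
beta_matrix = np.linalg.solve(X.T @ X, X.T @ y)

# Barred-quantity solution from the closed form derived above.
beta2 = (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.mean(x**2) - np.mean(x)**2)
beta1 = np.mean(y) - beta2 * np.mean(x)

print(beta_matrix, [beta1, beta2])  # the two solutions should agree
```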
2 Probability and matrices
For this problem, assume that \(y_n = \beta_0 + x_n \beta_1 + \varepsilon_n\) for scalar \(x_n\) and some fixed \(\beta_0\) and \(\beta_1\). Assume that
- The residuals \(\varepsilon_n\) are IID with \(\mathbb{E}\left[\varepsilon_n\right] = 0\) and \(\mathrm{Var}\left(\varepsilon_n\right) = \sigma^2\).
- The regressors \(x_n\) are IID with \(\mathbb{E}\left[x_n\right] = \mu\) and \(\mathrm{Var}\left(x_n\right) = \nu^2\).
- The residuals are independent of the regressors.
(a)
Evaluate the following expressions. (You may need to remind yourself of the definition of conditional expectation and variance.)
- \(\mathbb{E}\left[y_n\right]\)
- \(\mathrm{Var}\left(y_n\right)\)
- \(\mathbb{E}\left[y_n \vert x_n\right]\)
- \(\mathrm{Var}\left(y_n \vert x_n\right)\)
(b)
Compute the following limits using the LLN, or say that the limit does not exist or is infinite.
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^N\varepsilon_n\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^N\varepsilon_n^2\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^Nx_n\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^Nx_n^2\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^Nx_n \varepsilon_n\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^Nx_n y_n\)
(c)
Compute the following limits using the CLT, or say that the limit does not exist or is infinite.
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} \sum_{n=1}^N\varepsilon_n\)
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} \sum_{n=1}^Ny_n\)
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} \sum_{n=1}^N(y_n - (\beta_0 + x_n \beta_1))\)
(d)
Noting that this is simple linear regression, let \(\boldsymbol{X}\), \(\boldsymbol{Y}\), and \(\boldsymbol{\varepsilon}\) be as in the solution to question one above. Evaluate the following limits, or say that the limit does not exist or is infinite.
Here, \((\boldsymbol{A})_{ij}\) denotes the \(i,j\)–th entry of the matrix \(\boldsymbol{A}\). Let the regressor \(x_n\) be in the second column of \(\boldsymbol{X}\).
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \boldsymbol{1}^\intercal\boldsymbol{\varepsilon}\)
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} \boldsymbol{1}^\intercal\boldsymbol{\varepsilon}\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} (\boldsymbol{X}^\intercal\boldsymbol{X})_{11}\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} (\boldsymbol{X}^\intercal\boldsymbol{X})_{12}\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} (\boldsymbol{X}^\intercal\boldsymbol{X})_{22}\)
- \(\lim_{N \rightarrow \infty } (\boldsymbol{X}^\intercal\boldsymbol{X})_{11}\)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} (\boldsymbol{Y}- \boldsymbol{X}\boldsymbol{\beta})^\intercal(\boldsymbol{Y}- \boldsymbol{X}\boldsymbol{\beta})\)
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} (\boldsymbol{Y}- \boldsymbol{X}\boldsymbol{\beta})^\intercal(\boldsymbol{Y}- \boldsymbol{X}\boldsymbol{\beta})\)
Hint: Write the matrix expressions as sums over \(n=1\) to \(N\).
Solutions
(a)
- \(\mathbb{E}\left[y_n\right] = \beta_0 + \mu \beta_1\)
- \(\mathrm{Var}\left(y_n\right) = \beta_1^2 \nu^2 + \sigma^2\)
- \(\mathbb{E}\left[y_n \vert x_n\right] = \beta_0 + x_n \beta_1\)
- \(\mathrm{Var}\left(y_n \vert x_n\right) = \sigma^2\)
(b)
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^N\varepsilon_n = 0\) by the LLN
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^N\varepsilon_n^2 = \sigma^2\) by the LLN
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^Nx_n = \mu\) by the LLN
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^Nx_n^2 = \mu^2 + \nu^2\) by the LLN
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^Nx_n \varepsilon_n = 0\) by the LLN
- \(\lim_{N \rightarrow \infty } \frac{1}{N} \sum_{n=1}^Nx_n y_n = \beta_0 \mu + \beta_1 (\nu^2 + \mu^2)\) by the LLN
(c)
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} \sum_{n=1}^N\varepsilon_n = \mathcal{N}\left(0,\sigma^2\right)\) by the CLT
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} \sum_{n=1}^Ny_n\) diverges if \(\beta_0 + \beta_1 \mu \ne 0\), and otherwise converges to \(\mathcal{N}\left(0, \beta_1^2 \nu^2 + \sigma^2\right)\).
- \(\lim_{N \rightarrow \infty } \frac{1}{\sqrt{N}} \sum_{n=1}^N(y_n - (\beta_0 + x_n \beta_1)) = \mathcal{N}\left(0, \sigma^2\right)\) by the CLT because \(y_n - (\beta_0 + x_n \beta_1) = \varepsilon_n\).
(d)
The key is to write these expressions as limits of sums, and then use the techniques given above.
- \(\frac{1}{N} \boldsymbol{1}^\intercal\boldsymbol{\varepsilon}= \frac{1}{N} \sum_{n=1}^N\varepsilon_n \rightarrow 0\)
- \(\frac{1}{\sqrt{N}} \boldsymbol{1}^\intercal\boldsymbol{\varepsilon}= \frac{1}{\sqrt{N}} \sum_{n=1}^N\varepsilon_n \rightarrow \mathcal{N}\left(0, \sigma^2\right)\)
- \(\frac{1}{N} (\boldsymbol{X}^\intercal\boldsymbol{X})_{11} = \frac{1}{N} \sum_{n=1}^N1 = 1\)
- \(\frac{1}{N} (\boldsymbol{X}^\intercal\boldsymbol{X})_{12} = \frac{1}{N} \sum_{n=1}^Nx_n \rightarrow \mu\)
- \(\frac{1}{N} (\boldsymbol{X}^\intercal\boldsymbol{X})_{22} = \frac{1}{N} \sum_{n=1}^Nx_n^2 \rightarrow \mu^2 + \nu^2\)
- \((\boldsymbol{X}^\intercal\boldsymbol{X})_{11} = \sum_{n=1}^N1 = N \rightarrow \infty\), so without the \(1/N\) scaling the limit is infinite
- \(\frac{1}{N} (\boldsymbol{Y}- \boldsymbol{X}\boldsymbol{\beta})^\intercal(\boldsymbol{Y}- \boldsymbol{X}\boldsymbol{\beta}) = \frac{1}{N} \boldsymbol{\varepsilon}^\intercal\boldsymbol{\varepsilon}= \frac{1}{N} \sum_{n=1}^N\varepsilon_n^2 \rightarrow \sigma^2\)
- \(\frac{1}{\sqrt{N}} (\boldsymbol{Y}- \boldsymbol{X}\boldsymbol{\beta})^\intercal(\boldsymbol{Y}- \boldsymbol{X}\boldsymbol{\beta}) = \frac{1}{\sqrt{N}} \sum_{n=1}^N\varepsilon_n^2 = \sqrt{N} \frac{1}{N} \sum_{n=1}^N\varepsilon_n^2 \rightarrow \infty\)
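To see these limits in action (not part of the assignment), here is a minimal simulation sketch in Python, assuming NumPy is available; the parameter values are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, mu, nu, sigma = 1.0, 2.0, 0.5, 1.5, 0.8
N = 200_000

x = rng.normal(loc=mu, scale=nu, size=N)
eps = rng.normal(loc=0.0, scale=sigma, size=N)
y = beta0 + beta1 * x + eps
X = np.column_stack([np.ones(N), x])

# LLN-style averages: for large N these should be close to their limits.
print(np.mean(eps), "~ 0")
print((X.T @ X)[0, 1] / N, "~ mu =", mu)
print((X.T @ X)[1, 1] / N, "~ mu^2 + nu^2 =", mu**2 + nu**2)
print(np.mean((y - X @ np.array([beta0, beta1]))**2), "~ sigma^2 =", sigma**2)

# CLT-style scaling: (1/sqrt(N)) * sum(eps) has variance close to sigma^2
# across repeated simulations (here we just look at one draw).
print(eps.sum() / np.sqrt(N), "is one draw from roughly N(0, sigma^2)")
```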
3 One-hot encoding
Consider a one–hot encoding of a variable \(z_n\) that takes three distinct values, “a”, “b”, and “c”. That is, let
\[ \boldsymbol{x}_n = \begin{cases} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} & \textrm{ when }z_n = a \\ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} & \textrm{ when }z_n = b \\ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} & \textrm{ when }z_n = c \\ \end{cases} \]
Let \(\boldsymbol{X}\) be the regressor matrix with \(\boldsymbol{x}_n^\intercal\) in row \(n\).
(a)
Let \(N_a\) be the number of observations with \(z_n\) = a, and let \(\sum_{n:z_n = a}\) denote a sum over rows with \(z_n\) = a, with analogous definitions for b and c. In terms of these quantities, write expressions for \(\boldsymbol{X}^\intercal\boldsymbol{X}\) and \(\boldsymbol{X}^\intercal\boldsymbol{Y}\).
(b)
When is \(\boldsymbol{X}^\intercal\boldsymbol{X}\) invertible? Explain intuitively why the regression problem cannot be solved when \(\boldsymbol{X}^\intercal\boldsymbol{X}\) is not invertible. Write an explicit expression for \((\boldsymbol{X}^\intercal\boldsymbol{X})^{-1}\) when it is invertible.
(c)
Using your previous answer, show that the entries of the least squares vector \(\hat{\boldsymbol{\beta}}\) are the means of \(y_n\) within the distinct values of \(z_n\).
(d)
Suppose now you include a constant in the regression, so that
\[ y_n = \alpha + \boldsymbol{\beta}^\intercal\boldsymbol{x}_n + \varepsilon_n, \]
and let \(\boldsymbol{X}'\) denote the regressor matrix for this regression with coefficient vector \((\alpha, \boldsymbol{\beta}^\intercal)^\intercal\). Write an expression for \(\boldsymbol{X}'^\intercal\boldsymbol{X}'\) and show that it is not invertible.
(e)
Find three distinct values of \((\alpha, \boldsymbol{\beta}^\intercal)\) that all give the exact same fit \(\alpha + \boldsymbol{\beta}^\intercal\boldsymbol{x}_n\).
Solutions
(a)
\[ \boldsymbol{X}^\intercal\boldsymbol{X}= \begin{pmatrix} N_a & 0 & 0 \\ 0 & N_b & 0 \\ 0 & 0 & N_c \\ \end{pmatrix} \quad\quad \boldsymbol{X}^\intercal\boldsymbol{Y}= \begin{pmatrix} \sum_{n:z_n = a} y_n \\ \sum_{n:z_n = b} y_n \\ \sum_{n:z_n = c} y_n \\ \end{pmatrix}. \]
(b)
It is invertible as long as each of \(N_a\), \(N_b\), and \(N_c\) is nonzero. If there are no observations for a particular level, you of course cannot estimate its relationship with \(y_n\). When \(\boldsymbol{X}^\intercal\boldsymbol{X}\) is invertible, then
\[ (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} = \begin{pmatrix} 1/N_a & 0 & 0 \\ 0 & 1/N_b & 0 \\ 0 & 0 & 1/N_c \\ \end{pmatrix} \]
(c)
By direct multiplication,
\[ \hat{\boldsymbol{\beta}}= (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\boldsymbol{Y}= \begin{pmatrix} \frac{1}{N_a} \sum_{n:z_n = a} y_n \\ \frac{1}{N_b} \sum_{n:z_n = b} y_n \\ \frac{1}{N_c} \sum_{n:z_n = c} y_n \\ \end{pmatrix}. \]
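As a brief numerical illustration of this group–means result (not part of the assignment), here is a minimal Python sketch assuming NumPy is available; the group sizes and data below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
z = np.repeat(["a", "b", "c"], [5, 7, 8])          # 20 observations in three groups
y = rng.normal(size=20) + np.where(z == "a", 0.0, np.where(z == "b", 1.0, 2.0))

# One-hot design matrix: one indicator column per level of z.
X = np.column_stack([(z == level).astype(float) for level in ["a", "b", "c"]])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

group_means = [y[z == level].mean() for level in ["a", "b", "c"]]
print(beta_hat, group_means)  # the entries of beta_hat equal the group means
```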
(d)
\[ (\boldsymbol{X}')^\intercal\boldsymbol{X}' = \begin{pmatrix} N & N_a & N_b & N_c \\ N_a & N_a & 0 & 0 \\ N_b & 0 & N_b & 0 \\ N_c & 0 & 0 & N_c \\ \end{pmatrix}. \]
This is not invertible because the first column is the sum of the other three. Equivalently, \((\boldsymbol{X}')^\intercal\boldsymbol{X}' \boldsymbol{v}= \boldsymbol{0}\) where \(\boldsymbol{v}= (1, -1, -1, -1)^\intercal\).
(e)
Writing \(x_{na}\), \(x_{nb}\), and \(x_{nc}\) for the three indicator entries of \(\boldsymbol{x}_n\), the fit is
\[ \alpha + \boldsymbol{\beta}^\intercal\boldsymbol{x}_n = (\alpha + \beta_1) x_{na} + (\alpha + \beta_2) x_{nb} + (\alpha + \beta_3) x_{nc}, \]
so it depends on \((\alpha, \boldsymbol{\beta}^\intercal)\) only through the three sums \(\alpha + \beta_1\), \(\alpha + \beta_2\), and \(\alpha + \beta_3\). Three distinct choices that also happen to solve the least squares problem are
\[ \begin{aligned} (\alpha, \beta_1, \beta_2, \beta_3)^\intercal ={}& (0, \hat{\boldsymbol{\beta}}^\intercal)^\intercal + 0 \cdot \boldsymbol{v}\\ (\alpha, \beta_1, \beta_2, \beta_3)^\intercal ={}& (0, \hat{\boldsymbol{\beta}}^\intercal)^\intercal + 1 \cdot \boldsymbol{v}\\ (\alpha, \beta_1, \beta_2, \beta_3)^\intercal ={}& (0, \hat{\boldsymbol{\beta}}^\intercal)^\intercal + 2 \cdot \boldsymbol{v}, \end{aligned} \]
where \(\hat{\boldsymbol{\beta}}\) is the vector of within-level means from (c) and \(\boldsymbol{v}= (1, -1, -1, -1)^\intercal\) is the null vector from (d). All three are of the form \((0, \hat{\boldsymbol{\beta}}^\intercal)^\intercal + C \boldsymbol{v}\) for \(C = 0\), \(1\), and \(2\), as they must be.
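To check (d) and (e) numerically, another minimal sketch (again assuming NumPy; the data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
z = np.repeat(["a", "b", "c"], [5, 7, 8])
y = rng.normal(size=20)

X = np.column_stack([(z == level).astype(float) for level in ["a", "b", "c"]])
X_prime = np.column_stack([np.ones(len(z)), X])    # add an intercept column

# The 4x4 Gram matrix has rank 3, so it is not invertible.
print(np.linalg.matrix_rank(X_prime.T @ X_prime))  # 3

# Shifting the coefficients along v = (1, -1, -1, -1) leaves the fit unchanged.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # group means from (c)
v = np.array([1.0, -1.0, -1.0, -1.0])
gamma = np.concatenate([[0.0], beta_hat])
for C in [0.0, 1.0, 2.0]:
    print(np.allclose(X_prime @ (gamma + C * v), X @ beta_hat))  # True
```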
5 Matrix square roots
In the last homework, we proved that if \(\boldsymbol{A}\) is a square symmetric matrix with eigenvectors \(\boldsymbol{u}_p\) and eigenvalues \(\lambda_p\), then we can write \(\boldsymbol{A}= \boldsymbol{U}\Lambda \boldsymbol{U}^\intercal\), where \(\boldsymbol{U}= (\boldsymbol{u}_1 \ldots \boldsymbol{u}_p)\) has \(\boldsymbol{u}_p\) in its \(p\)–th column, and \(\Lambda\) is diagonal with \(\lambda_p\) in the \(p\)–th diagonal entry. We also have that the eigenvectors can be taken to be orthonormal without loss of generality.
We will additionally assume that \(\boldsymbol{A}= \boldsymbol{X}^\intercal\boldsymbol{X}\) for some (possibly non–square) matrix \(\boldsymbol{X}\).
Define \(\Lambda^{1/2}\) to be the diagonal matrix with \(\sqrt{\lambda_p}\) on the \(p\)–th diagonal.
- Prove that, since \(\boldsymbol{A}= \boldsymbol{X}^\intercal\boldsymbol{X}\), \(\lambda_p \ge 0\), and so \(\Lambda^{1/2}\) is always real–valued. When the eigenvalues are non–negative, we say that \(\boldsymbol{A}\) is “positive semi–definite.” (Hint: using the fact that \(\lambda_p = \boldsymbol{u}_p^\intercal\boldsymbol{A}\boldsymbol{u}_p\), show that \(\lambda_p\) is the square of something.)
- Show that if we take \(\boldsymbol{Q}= \boldsymbol{U}\Lambda^{1/2} \boldsymbol{U}^\intercal\) then \(\boldsymbol{A}= \boldsymbol{Q}\boldsymbol{Q}^\intercal\). We say that \(\boldsymbol{Q}\) is a “matrix square root” of \(\boldsymbol{A}\).
- Show that we also have \(\boldsymbol{A}= \boldsymbol{Q}\boldsymbol{Q}\) (without the second transpose).
- Show that, if \(\boldsymbol{V}\) is any orthonormal matrix (a square matrix with orthonormal columns), then \(\boldsymbol{Q}' = \boldsymbol{Q}\boldsymbol{V}\ne \boldsymbol{Q}\) also satisfies \(\boldsymbol{A}= \boldsymbol{Q}' \boldsymbol{Q}'^\intercal\). This shows that the matrix square root is not unique. (This fact can be thought of as the matrix analogue of the fact that \(4 = 2 \cdot 2\) but also \(4 = (-2) \cdot (-2)\)).
- Show that if \(\lambda_p > 0\) then \(\boldsymbol{Q}\) is invertible.
- Show that, if \(\boldsymbol{Q}\) is invertible, then the columns of \(\boldsymbol{X}\boldsymbol{Q}^{-1}\) are orthonormal. (Hint: show that \((\boldsymbol{X}\boldsymbol{Q}^{-1})^\intercal(\boldsymbol{X}\boldsymbol{Q}^{-1})\) is the identity matrix.)
Solutions
(1)
Suppose that \(\boldsymbol{v}\) is an eigenvector with eigenvalue \(\lambda\). Then
\[ \lambda \left\Vert\boldsymbol{v}\right\Vert^2 = \boldsymbol{v}^\intercal\boldsymbol{A}\boldsymbol{v}= \boldsymbol{v}^\intercal\boldsymbol{X}^\intercal\boldsymbol{X}\boldsymbol{v}= \left\Vert\boldsymbol{X}\boldsymbol{v}\right\Vert^2 \ge 0. \]
Since \(\boldsymbol{v}\) is an eigenvector, \(\left\Vert\boldsymbol{v}\right\Vert^2 > 0\), so we must have \(\lambda \ge 0\) as well.
(2)
We already know that \(\Lambda^{1/2} \Lambda^{1/2} = \Lambda\) because the matrices are diagonal. Because the eigenvector matrices are orthonormal,
\[ \boldsymbol{Q}\boldsymbol{Q}^\intercal= \boldsymbol{U}\Lambda^{1/2} \boldsymbol{U}^\intercal\boldsymbol{U}\Lambda^{1/2} \boldsymbol{U}^\intercal= \boldsymbol{U}\Lambda^{1/2} \Lambda^{1/2} \boldsymbol{U}^\intercal= \boldsymbol{U}\Lambda \boldsymbol{U}^\intercal= \boldsymbol{A}. \]
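As a numerical sanity check of this construction (not part of the assignment), here is a minimal Python sketch assuming NumPy is available; the matrix \(\boldsymbol{X}\) below is random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
A = X.T @ X                      # symmetric and positive semi-definite

# Eigendecomposition A = U diag(lam) U^T; eigh returns orthonormal eigenvectors.
lam, U = np.linalg.eigh(A)
Q = U @ np.diag(np.sqrt(lam)) @ U.T

print(np.allclose(Q @ Q.T, A))   # True: Q is a matrix square root of A
print(np.allclose(Q @ Q, A))     # also True, since Q is symmetric
```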
(3)
Since \(\Lambda^{1/2}\) is diagonal and hence symmetric, \(\boldsymbol{Q}^\intercal= (\boldsymbol{U}\Lambda^{1/2} \boldsymbol{U}^\intercal)^\intercal= \boldsymbol{U}\Lambda^{1/2} \boldsymbol{U}^\intercal= \boldsymbol{Q}\), so \(\boldsymbol{Q}\) is symmetric. It follows that \(\boldsymbol{Q}\boldsymbol{Q}= \boldsymbol{Q}\boldsymbol{Q}^\intercal= \boldsymbol{A}\).
(4)
\[ \boldsymbol{Q}' \boldsymbol{Q}'^\intercal= \boldsymbol{Q}\boldsymbol{V}\boldsymbol{V}^\intercal\boldsymbol{Q}^\intercal= \boldsymbol{Q}\boldsymbol{Q}^\intercal= \boldsymbol{A}, \]
since \(\boldsymbol{V}^\intercal= \boldsymbol{V}^{-1}\) and left and right inverses are the same.
(5)
The inverse is \(\boldsymbol{Q}^{-1} = \boldsymbol{U}\Lambda^{-1/2} \boldsymbol{U}^\intercal\), where \(\Lambda^{-1/2}\) is the diagonal matrix with \(1/\sqrt{\lambda_p}\) on the \(p\)–th diagonal. This is well–defined because every \(\lambda_p > 0\), and it can be verified to be the inverse by direct multiplication.
(6)
\[ \begin{aligned} (\boldsymbol{X}\boldsymbol{Q}^{-1})^\intercal(\boldsymbol{X}\boldsymbol{Q}^{-1}) ={}& (\boldsymbol{X}(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1/2})^\intercal\boldsymbol{X}(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1/2} \\={}& (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1/2} \boldsymbol{X}^\intercal\boldsymbol{X}(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1/2} \\={}& (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1/2} (\boldsymbol{X}^\intercal\boldsymbol{X})^{1/2} (\boldsymbol{X}^\intercal\boldsymbol{X})^{1/2} (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1/2} \\={}& \boldsymbol{I}. \end{aligned} \]
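Finally, a short sketch under the same assumptions (NumPy, a random illustrative \(\boldsymbol{X}\)) checking that the columns of \(\boldsymbol{X}\boldsymbol{Q}^{-1}\) come out orthonormal:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
A = X.T @ X

lam, U = np.linalg.eigh(A)
Q = U @ np.diag(np.sqrt(lam)) @ U.T            # matrix square root of A
Q_inv = U @ np.diag(1.0 / np.sqrt(lam)) @ U.T  # its inverse, since all lam > 0

Z = X @ Q_inv
print(np.allclose(Z.T @ Z, np.eye(3)))         # True: the columns are orthonormal
```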