STAT151A Homework 0 (prerequisites review).
1 Linear systems
Write the following systems of equations in matrix form. For each system, say whether it has no solutions, a single solution, or an infinite number of solutions, and explain how you know.
The purpose of this exercise, in addition to reminding yourself how to write out equations in matrix form, is to remember that if there are \(P\) unknowns and \(Q\) equations:
- If \(P > Q\) there is never exactly one solution: there are either no solutions or an infinite number
- If \(P = Q\) there may be no solutions, exactly one solution, or an infinite number
- If \(P < Q\) there may likewise be no solutions, exactly one solution, or an infinite number.
The key point is that \(Q\) may be made “artificially large” by repeating equations or including redundant equations, which is why a system with \(P < Q\) can still have an infinite number of solutions.
Formally, for a system of equations of the form \(\boldsymbol{A}\boldsymbol{b}= \boldsymbol{c}\), the singular value decomposition of \(\boldsymbol{A}\) is the easiest way to characterize the number of solutions, but that’s beyond the scope of this class.
- \[ \begin{aligned} a_1 + 2 a_2 ={}& 1 \\ a_1 + 3 a_2 ={}& 0 \\ \end{aligned} \]
One solution. (Subtract the first equation from the second to get \(a_2 = -1\), then substitute back to find \(a_1 = 3\).)
In matrix form, this is
\[ \begin{pmatrix} 1 & 2 \\ 1 & 3 \\ \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ \end{pmatrix}. \]
I won’t write out the matrix form for every system, since the pattern is always the same: the coefficients go into the first matrix, which multiplies a vector of the unknowns, and the right-hand-side terms go into a vector on the right.
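As a quick numerical check (a sketch, not part of the original solution), the same system can be solved in R:
a <- matrix(c(1, 1, 2, 3), ncol=2)  # Coefficient matrix, filled column by column
solve(a, c(1, 0))                   # Returns (3, -1), matching the hand calculation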
- \[ \begin{aligned} a_1 + 2 a_2 + 3 a_3 ={}& 1 \\ a_1 + 3 a_2 + 3 a_3 ={}& 0 \\ 2 a_1 + 4 a_2 + 3 a_3 ={}& 5 \\ \end{aligned} \]
This R code shows that there is exactly one solution, and that it has \(a_3 \ne 0\).
a <- matrix(c(1, 1, 2, 2, 3, 4, 3, 3, 3), ncol=3)  # Coefficient matrix, filled column by column
det(a)                # -3, nonzero, so there is exactly one solution
solve(a, c(1, 0, 5))  # The solution is (6, -1, -1), so a_3 != 0
- \[ \begin{aligned} a_1 + 2 a_2 + 3 a_3 ={}& 1 \\ a_1 + 3 a_2 + 3 a_3 ={}& 0 \\ \end{aligned} \]
There are two equations and three unknowns, and the system is consistent (the solution to the previous system satisfies both equations), so there are an infinite number of solutions. (E.g., starting from that solution \((6, -1, -1)\), add \(x\) to \(a_1\) and \(-x/3\) to \(a_3\) for any \(x\).)
- \[ \begin{aligned} a_1 + 2 a_2 ={}& 1 \\ a_1 + 3 a_2 ={}& 0 \\ 2 a_1 + 4 a_2 ={}& 5 \\ \end{aligned} \]
There are three equations and two unknowns, and there are no solutions. This is the second system with the \(a_3\) terms removed: if it had a solution \((a_1, a_2)\), then \((a_1, a_2, 0)\) would solve the second system, contradicting the fact that its unique solution has \(a_3 \ne 0\).
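One way to verify this numerically (a sketch, not part of the original solution) is to compare the rank of the coefficient matrix to the rank of the augmented matrix:
a <- matrix(c(1, 1, 2, 2, 3, 4), ncol=2)  # Coefficient matrix
b <- c(1, 0, 5)
qr(a)$rank            # 2
qr(cbind(a, b))$rank  # 3: the augmented matrix has higher rank, so the system is inconsistent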
- \[ \begin{aligned} a_1 + 2 a_2 ={}& 1 \\ \end{aligned} \]
There is one equation and two unknowns, so there are an infinite number of solutions (any \(a_2\) works, with \(a_1 = 1 - 2 a_2\)).
- \[ \begin{aligned} a_1 + 2 a_2 ={}& 1 \\ a_1 + 2 a_2 ={}& 1 \\ a_1 + 2 a_2 ={}& 1 \\ a_1 + 2 a_2 ={}& 1 \\ a_1 + 2 a_2 ={}& 1 \\ \end{aligned} \]
Although there are five equations and two unknowns, they are all the same equation repeated, so there is still an infinite number of solutions.
- \[ \begin{aligned} a_1 + 2 a_2 ={}& 1 \\ a_1 + 2 a_2 ={}& 2 \\ a_1 + 2 a_2 ={}& 3 \\ a_1 + 2 a_2 ={}& 4 \\ a_1 + 2 a_2 ={}& 5 \\ \end{aligned} \]
There are no solutions: the same left-hand side \(a_1 + 2 a_2\) cannot equal two different right-hand-side values (e.g., both \(1\) and \(2\)). This is not the same as the previous question because here the right-hand sides differ.
- \[ \begin{aligned} a_1 + 2 a_2 ={}& 1 \\ a_1 + 3 a_2 ={}& 1 \\ a_1 + 4 a_2 ={}& 1 \\ a_1 + 5 a_2 ={}& 1 \\ a_1 + 6 a_2 ={}& 5 \\ \end{aligned} \]
There are no solutions. For the first four, we’d have to have \(a_2 = 0\), but then we’d have to have \(a_1 = 5\) and \(a_1 = 1\), which is impossible. It’s interesting to see that if the last right-hand-side term were \(1\) then there would in fact be a solution.
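As a numerical sanity check (a sketch; qr.solve is used here only because the coefficient matrix is not square):
a4 <- matrix(c(rep(1, 4), 2:5), ncol=2)  # First four equations
qr.solve(a4, c(1, 1, 1, 1))              # The unique solution (1, 0) of the first four equations
1 + 6 * 0                                # Plugging (1, 0) into the fifth equation gives 1, not 5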
2 Dimensions of linear algebra expressions
For this problem, I will use the following definitions.
- \(\boldsymbol{X}\) denotes an \(N \times P\) matrix
- \(\boldsymbol{y}\) denotes an \(N\)–vector (i.e. an \(N \times 1\) matrix)
- \(\boldsymbol{1}\) denotes an \(N\)–vector containing all ones
- \(\boldsymbol{\beta}\) denotes a \(P\)–vector
I will take \(N > P > 1\). A transpose is denoted with a superscript \(\intercal\), and an inverse by a superscript \(-1\). A matrix trace is denoted \(\mathrm{trace}\left(\right)\). You may assume that each matrix is full column rank.
For each expression, write the dimension of the result, or write “badly formed” if the expression is not a valid matrix expression.
Tip: For this assignment, and throughout the class, it can be very helpful to write the dimensions of a matrix or vector underneath to help make sure your matrix expressions are valid. For example, we can write \(\underset{PN}{\boldsymbol{X}^\intercal} \underset{NP}{\boldsymbol{X}}\), which we can see is valid because the \(N\)’s are next to one another. Similarly, we can see immediately that the expression \(\underset{NP}{\boldsymbol{X}} \underset{NP}{\boldsymbol{X}}\) is invalid because \(P\) is next to \(N\) in the matrix multiplication, which is not allowed.
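For example, in R (a quick illustration with arbitrarily chosen small dimensions, not part of the assignment):
N <- 5; P <- 2
X <- matrix(rnorm(N * P), nrow=N, ncol=P)
y <- rnorm(N)
dim(t(X) %*% X)  # 2 x 2, i.e., P x P
try(X %*% X)     # Error (non-conformable arguments), matching the invalid expression above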
- \(\boldsymbol{X}^\intercal\boldsymbol{y}\)
- \(\boldsymbol{X}^\intercal\boldsymbol{X}\)
- \(\boldsymbol{X}+ \boldsymbol{y}\)
- \(\boldsymbol{\beta}\boldsymbol{\beta}^\intercal\)
- \(\boldsymbol{\beta}^\intercal\boldsymbol{\beta}\)
- \(\boldsymbol{y}^\intercal\boldsymbol{y}\)
- \(\boldsymbol{X}^\intercal\boldsymbol{y}\) (Duplicate — that’s okay, just repeat the other answer)
- \(\left(\boldsymbol{X}^\intercal\boldsymbol{X}\right)^\intercal\)
- \(\left(\boldsymbol{X}^\intercal\boldsymbol{y}\right)^\intercal\)
- \(\left(\boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1}\)
- \(\left(\boldsymbol{X}^\intercal\boldsymbol{y}\right)^{-1}\)
- \(\boldsymbol{X}^{-1}\)
- \(\boldsymbol{y}^{-1} \boldsymbol{y}\)
- \((\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\)
- \((\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\boldsymbol{y}\)
- \(\left( (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\boldsymbol{y}\right)^\intercal\)
- \(\boldsymbol{y}^\intercal\boldsymbol{X}(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1}\)
- \(\boldsymbol{X}\beta\)
- \(\boldsymbol{y}- \boldsymbol{X}\beta\)
- \(\boldsymbol{y}- \boldsymbol{X}^\intercal\beta\)
- \(\boldsymbol{y}^\intercal- \beta^\intercal\boldsymbol{X}^\intercal\)
- \(\boldsymbol{\beta}- (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\boldsymbol{y}\)
- \(\boldsymbol{X}\left( \boldsymbol{\beta}- (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\boldsymbol{y}\right)\)
- \(\boldsymbol{1}^\intercal\boldsymbol{y}\)
- \(\boldsymbol{y}- (\boldsymbol{1}^\intercal\boldsymbol{y}) \boldsymbol{y}\)
- \(\boldsymbol{X}(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\)
- \(\boldsymbol{X}^\intercal(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}\)
- \(\boldsymbol{X}^\intercal(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}\) (Duplicate — that’s okay, just repeat the other answer)
- \(\boldsymbol{\beta}^\intercal(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1}\boldsymbol{\beta}\)
- \(\left( \boldsymbol{\beta}^\intercal(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1}\boldsymbol{\beta}\right)^{-1}\)
- \(\mathrm{trace}\left( \boldsymbol{\beta}^\intercal(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1}\boldsymbol{\beta}\right)\)
- \(\mathrm{trace}\left( (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1}\boldsymbol{\beta}\boldsymbol{\beta}^\intercal\right)\)
- \(\boldsymbol{X}^\intercal\boldsymbol{y}: P \times 1\)
- \(\boldsymbol{X}^\intercal\boldsymbol{X}: P \times P\)
- \(\boldsymbol{X}+ \boldsymbol{y}: \textrm{Invalid}\)
- \(\boldsymbol{\beta}\boldsymbol{\beta}^\intercal: P \times P\)
- \(\boldsymbol{\beta}^\intercal\boldsymbol{\beta}: 1 \times 1\)
- \(\boldsymbol{y}^\intercal\boldsymbol{y}: 1 \times 1\)
- \(\boldsymbol{X}^\intercal\boldsymbol{y}\) (Duplicate)
- \(\left(\boldsymbol{X}^\intercal\boldsymbol{X}\right)^\intercal: P \times P\)
- \(\left(\boldsymbol{X}^\intercal\boldsymbol{y}\right)^\intercal: 1 \times P\)
- \(\left(\boldsymbol{X}^\intercal\boldsymbol{X}\right)^{-1}: P \times P\)
- \(\left(\boldsymbol{X}^\intercal\boldsymbol{y}\right)^{-1}: \textrm{Invalid (only square matrices are invertible)}\)
- \(\boldsymbol{X}^{-1}: \textrm{Invalid (only square matrices are invertible)}\)
- \(\boldsymbol{y}^{-1} \boldsymbol{y}: \textrm{Invalid (only square matrices are invertible)}\)
- \((\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal: P \times N\)
- \((\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\boldsymbol{y}: P \times 1\)
- \(\left( (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\boldsymbol{y}\right)^\intercal: 1 \times P\)
- \(\boldsymbol{y}^\intercal\boldsymbol{X}(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1}: 1 \times P\)
- \(\boldsymbol{X}\beta: N \times 1\)
- \(\boldsymbol{y}- \boldsymbol{X}\beta: N \times 1\)
- \(\boldsymbol{y}- \boldsymbol{X}^\intercal\beta: \textrm{Invalid}\)
- \(\boldsymbol{y}^\intercal- \beta^\intercal\boldsymbol{X}^\intercal: 1 \times N\)
- \(\boldsymbol{\beta}- (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\boldsymbol{y}: P \times 1\)
- \(\boldsymbol{X}\left( \boldsymbol{\beta}- (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal\boldsymbol{y}\right): N \times 1\)
- \(\boldsymbol{1}^\intercal\boldsymbol{y}: 1 \times 1\)
- \(\boldsymbol{y}- (\boldsymbol{1}^\intercal\boldsymbol{y}) \boldsymbol{y}: N \times 1\)
- \(\boldsymbol{X}(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}^\intercal: N \times N\)
- \(\boldsymbol{X}^\intercal(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}: \textrm{Invalid}\)
- \(\boldsymbol{X}^\intercal(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1} \boldsymbol{X}\) (Duplicate)
- \(\boldsymbol{\beta}^\intercal(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1}\boldsymbol{\beta}: 1 \times 1\)
- \(\left( \boldsymbol{\beta}^\intercal(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1}\boldsymbol{\beta}\right)^{-1}: 1 \times 1\)
- \(\mathrm{trace}\left( \boldsymbol{\beta}^\intercal(\boldsymbol{X}^\intercal\boldsymbol{X})^{-1}\boldsymbol{\beta}\right): 1 \times 1\)
- \(\mathrm{trace}\left( (\boldsymbol{X}^\intercal\boldsymbol{X})^{-1}\boldsymbol{\beta}\boldsymbol{\beta}^\intercal\right): 1 \times 1\)
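If you want to check any of these answers yourself, a quick sketch with random matrices works well (the dimensions below are arbitrary choices with \(N > P > 1\)):
N <- 6; P <- 3
X <- matrix(rnorm(N * P), N, P)
y <- matrix(rnorm(N), N, 1)
beta <- matrix(rnorm(P), P, 1)
dim(solve(t(X) %*% X) %*% t(X) %*% y)        # P x 1
dim(X %*% solve(t(X) %*% X) %*% t(X))        # N x N
dim(t(beta) %*% solve(t(X) %*% X) %*% beta)  # 1 x 1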
3 Orthonormal vectors and bases
For this problem, assume that \(\boldsymbol{u}= (u_1, u_2)^\intercal\) and \(\boldsymbol{v}= (v_1, v_2)^\intercal\) are orthonormal. That is, \(\boldsymbol{u}^\intercal\boldsymbol{u}= \boldsymbol{v}^\intercal\boldsymbol{v}= 1\), and \(\boldsymbol{u}^\intercal\boldsymbol{v}= 0\). Let \(\boldsymbol{a}= (a_1, a_2)^\intercal\) denote a generic \(2\)–dimensional vector.
- Write an expression for the length of \(\boldsymbol{a}\) in terms of its entries \(a_1\) and \(a_2\).
- Write an expression for the length of \(\boldsymbol{a}\), using only matrix operations (i.e., without explicit reference to the entries of \(\boldsymbol{a}\)).
- Write an explicit expression for a vector pointing in the same direction as \(\boldsymbol{a}\) but with unit length. (Hint: show that, for a scalar \(\alpha\), the length of \(\alpha \boldsymbol{a}\) is \(\alpha\) times the length of \(\boldsymbol{a}\), and then make a clever choice for \(\alpha\).)
- Write an explicit expression for a vector pointing in the same direction as \(\boldsymbol{v}\) but with the same length as \(\boldsymbol{a}\).
- Suppose that I tell you that \(\boldsymbol{a}= \alpha \boldsymbol{u}+ \gamma \boldsymbol{v}\). Find an explicit expression for \(\alpha\) in terms of \(\boldsymbol{a}\) and \(\boldsymbol{u}\) alone.
- Find explicit expressions for scalars \(\alpha\) and \(\gamma\) such that \(\boldsymbol{a}= \alpha \boldsymbol{u}+ \gamma \boldsymbol{v}\).
- Let \(\begin{pmatrix} \boldsymbol{u}& \boldsymbol{v}\end{pmatrix} := \begin{pmatrix} u_1 & v_1 \\ u_2 & v_2\end{pmatrix}\) denote the \(2 \times 2\) matrix with \(\boldsymbol{u}\) in the first column and \(\boldsymbol{v}\) in the second column. Show that \(\begin{pmatrix} \boldsymbol{u}& \boldsymbol{v}\end{pmatrix}^\intercal= \begin{pmatrix} \boldsymbol{u}& \boldsymbol{v}\end{pmatrix}^{-1}\).
- Observe that \(\alpha \boldsymbol{u}+ \gamma \boldsymbol{v}= \begin{pmatrix} \boldsymbol{u}& \boldsymbol{v} \end{pmatrix} \begin{pmatrix} \alpha \\ \gamma \end{pmatrix}.\) Using this, write an explicit expression for the coefficient vector \((\alpha, \gamma)^\intercal\) in terms of \(\begin{pmatrix} \boldsymbol{u}& \boldsymbol{v}\end{pmatrix}\) and \(\boldsymbol{a}\).
- Show that \(\begin{pmatrix} \boldsymbol{u}& \boldsymbol{v}\end{pmatrix}^{-1} = \begin{pmatrix} \boldsymbol{u}& \boldsymbol{v}\end{pmatrix}^\intercal\).
- Suppose I tell you that \(\boldsymbol{u}= (1, 0)^\intercal\). In terms of \(a_1\) and \(a_2\), what is \(\alpha\) in the decomposition \(\boldsymbol{a}= \alpha \boldsymbol{u}+ \gamma \boldsymbol{v}\)?
- \(\sqrt{a_1^2 + a_2^2}\)
- \(\sqrt{\boldsymbol{a}^\intercal\boldsymbol{a}}\)
- \(\boldsymbol{a}/ \sqrt{\boldsymbol{a}^\intercal\boldsymbol{a}}\). From now on, \(\left\Vert\boldsymbol{a}\right\Vert := \sqrt{\boldsymbol{a}^\intercal\boldsymbol{a}}\)
- \(\left\Vert\boldsymbol{a}\right\Vert \boldsymbol{v}\) (since \(\boldsymbol{v}\) has unit length already)
- \(\alpha = \boldsymbol{a}^\intercal\boldsymbol{u}\)
- \(\gamma = \boldsymbol{a}^\intercal\boldsymbol{v}\) and \(\alpha = \boldsymbol{a}^\intercal\boldsymbol{u}\)
- \(\begin{pmatrix} \boldsymbol{u}& \boldsymbol{v}\end{pmatrix}^\intercal\begin{pmatrix} \boldsymbol{u}& \boldsymbol{v}\end{pmatrix} = \begin{pmatrix} \boldsymbol{u}^\intercal\boldsymbol{u}& \boldsymbol{u}^\intercal\boldsymbol{v}\\ \boldsymbol{v}^\intercal\boldsymbol{u}& \boldsymbol{v}^\intercal\boldsymbol{v}\end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\). Since inverses are unique, the transpose must be the inverse.
- \(\begin{pmatrix} \boldsymbol{u}& \boldsymbol{v}\end{pmatrix}^\intercal\boldsymbol{a}= \begin{pmatrix}\alpha \\ \gamma \end{pmatrix}\)
- Repeated question (sorry)
- \(\alpha = a_1\)
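As a numerical illustration (a sketch; the rotation angle and the vector \(\boldsymbol{a}\) below are arbitrary choices), R confirms the decomposition:
theta <- 0.7
u <- c(cos(theta), sin(theta))   # Unit length
v <- c(-sin(theta), cos(theta))  # Unit length and orthogonal to u
a <- c(2, -1)
alpha <- sum(u * a)              # u' a
gamma <- sum(v * a)              # v' a
alpha * u + gamma * v            # Recovers a = (2, -1)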
4 Eigenvalues and eigenvectors of square symmetric matrices
Let \(\boldsymbol{A}\) denote a \(P \times P\) symmetric, square matrix. Recall that an “eigenvalue” \(\lambda_k\) of \(\boldsymbol{A}\) and its associated “eigenvector” \(\boldsymbol{u}_k\) satisfy \(\boldsymbol{A}\boldsymbol{u}_k = \lambda_k \boldsymbol{u}_k\). (In this definition, \(\boldsymbol{u}_k\) must be non–degenerate, i.e., it must have at least one nonzero entry.)
Let \(\boldsymbol{U}= (\boldsymbol{u}_1 \ldots \boldsymbol{u}_P)\) denote the \(P \times P\) matrix with eigenvector \(\boldsymbol{u}_k\) in the \(k\)–th column, and let \(\Lambda\) denote the \(P \times P\) diagonal matrix with \(\lambda_k\) in the \(k\)–th diagonal entry. Let \(\boldsymbol{a}\) denote a generic \(P\)–vector.
- If \(\boldsymbol{A}\) is the identity matrix (i.e., the matrix with ones on the diagonal and zeros elsewhere), what are its eigenvalues?
They are all \(1\).
- If \(\boldsymbol{A}\) is the zero matrix (i.e., the matrix containing only zeros), what are its eigenvalues?
They are all \(0\).
- If \(\boldsymbol{A}\) is a diagonal matrix with the entries \(a_1, \ldots, a_P\) on the diagonal, what are its eigenvalues?
They are the values \(a_p\). (The eigenvectors are vectors with exactly one non-zero element.)
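For example, in R (an illustrative check with arbitrary diagonal entries):
eigen(diag(c(3, 1, 2)))$values  # 3 2 1: the diagonal entries, sorted in decreasing order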
- Let us prove that the eigenvectors can be taken to be unit vectors without loss of generality.
- Show that, if \(\boldsymbol{u}_k\) is an eigenvector, then \(\alpha \boldsymbol{u}_k\) is also an eigenvector with the same eigenvalue as \(\boldsymbol{u}_k\) for any scalar \(\alpha \ne 0\).
- In particular, show that \(\boldsymbol{u}_k' = \boldsymbol{u}_k / \sqrt{\boldsymbol{u}_k^\intercal\boldsymbol{u}_k}\) is also an eigenvector with eigenvalue \(\lambda_k\), and that \((\boldsymbol{u}_k')^\intercal\boldsymbol{u}'_k = 1\).
\(\boldsymbol{A}(\alpha \boldsymbol{u}_k) = \alpha \boldsymbol{A}\boldsymbol{u}_k = \alpha \lambda_k \boldsymbol{u}_k = \lambda_k (\alpha \boldsymbol{u}_k)\). Then take \(\alpha = 1 / \left\Vert\boldsymbol{u}_k\right\Vert\).
- Let us prove that, for a general symmetric matrix, eigenvectors with distinct eigenvalues are orthogonal. That is, if \(\lambda_k \ne \lambda_j\) for some \(k \ne j\), we will show that \(\boldsymbol{u}_k^\intercal\boldsymbol{u}_j = 0\). Carefully justify each step in the following proof.
- We have \(\boldsymbol{u}_j^\intercal\boldsymbol{A}\boldsymbol{u}_k = (\boldsymbol{u}_j^\intercal\boldsymbol{A}\boldsymbol{u}_k)^\intercal\). (Why?)
- We also have \((\boldsymbol{u}_j^\intercal\boldsymbol{A}\boldsymbol{u}_k)^\intercal= \boldsymbol{u}_k^\intercal\boldsymbol{A}^\intercal\boldsymbol{u}_j\). (Why?)
- We then have \(\boldsymbol{u}_j^\intercal\boldsymbol{A}\boldsymbol{u}_k = \boldsymbol{u}_k^\intercal\boldsymbol{A}\boldsymbol{u}_j\). (Why?)
- We then have \(\lambda_k \boldsymbol{u}_j^\intercal\boldsymbol{u}_k = \lambda_j \boldsymbol{u}_k^\intercal\boldsymbol{u}_j\). (Why?)
- We then have \(\boldsymbol{u}_k^\intercal\boldsymbol{u}_j = 0\). (Why?)
- Bonus question: If the eigenvalues are not distinct, then the corresponding eigenvectors may not be orthogonal. However, one can always find an orthogonal set of eigenvectors for each repeated eigenvalue, though we won’t prove this here. See a linear algebra text for the proof, or try yourself! Hint: if \(\lambda_k = \lambda_j\) for some \(k \ne j\), then \(\alpha \boldsymbol{u}_k + \gamma \boldsymbol{u}_j\) is also an eigenvector with eigenvalue \(\lambda_k\) for any \(\alpha\) and \(\gamma\) that are not both zero.
- The transpose of a scalar is the same scalar
- In general, \((\boldsymbol{R}\boldsymbol{S})^\intercal= \boldsymbol{S}^\intercal\boldsymbol{R}^\intercal\) (transposes reverse the order of matrix multiplication). Apply this rule twice.
- Combine the first two results and use the symmetry of \(\boldsymbol{A}\), i.e., \(\boldsymbol{A}^\intercal= \boldsymbol{A}\)
- \(\boldsymbol{u}_k\) and \(\boldsymbol{u}_j\) are eigenvectors
- Because \(\lambda_k \ne \lambda_j\), and so the only way the preceding equation can be true is if \(\boldsymbol{u}_k^\intercal\boldsymbol{u}_j = 0\).
- Suppose that each \(\lambda_k \ne 0\). Show that the inverse of \(\Lambda\) is given by the diagonal matrix with \(1 / \lambda_k\) in the \(k\)–th diagonal entry and zero elsewhere.
\[ \begin{pmatrix} 1 / \lambda_1 & 0 & \cdots & 0 \\ 0 & 1 / \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 / \lambda_P \\ \end{pmatrix} \Lambda = \begin{pmatrix} \lambda_1 / \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 / \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_P / \lambda_P \\ \end{pmatrix} = \boldsymbol{I}\]
- Suppose that \(\lambda_k = 0\) for some \(k\). Show that \(\Lambda\) is not invertible. Hint: find a vector \(\boldsymbol{b}\) such that \(\Lambda \boldsymbol{a}= \boldsymbol{b}\) has no solution. It then follows by contradiction that \(\Lambda\) is not invertible, for if \(\Lambda\) were invertible, \(\boldsymbol{a}= \Lambda^{-1} \boldsymbol{b}\) would be a solution.
Take \(\boldsymbol{b}\) to be zero everywhere except for a \(1\) in the \(k\)–th entry. The \(k\)–th entry of \(\Lambda \boldsymbol{a}\) is \(\lambda_k a_k = 0\) no matter what \(\boldsymbol{a}\) is, so \(\Lambda \boldsymbol{a}\) can never equal \(\boldsymbol{b}\), and \(\Lambda\) cannot be invertible.
- Assume that \(\boldsymbol{A}\) has \(P\) orthonormal eigenvectors (it always does; we have proved some parts of this assertion above). We will prove that we then have \(\boldsymbol{A}= \boldsymbol{U}\Lambda \boldsymbol{U}^\intercal\). Carefully justify each step in the following proof.
- For any \(\boldsymbol{a}\), we can write \(\boldsymbol{a}= \sum_{p=1}^P \alpha_p \boldsymbol{u}_p\) for some scalars \(\alpha_p\). (Why?)
- Using the previous expansion, we thus have \(\boldsymbol{A}\boldsymbol{a}= \sum_{p=1}^P \lambda_p \alpha_p \boldsymbol{u}_p\). (Why?)
- Let \(\boldsymbol{\alpha}= (\alpha_1, \ldots, \alpha_P)^\intercal\). Then \(\boldsymbol{a}= \boldsymbol{U}\boldsymbol{\alpha}\). (Why?)
- Similarly, we have \(\boldsymbol{A}\boldsymbol{a}= \boldsymbol{U}\Lambda \boldsymbol{\alpha}\). (Why?)
- We also have \(\boldsymbol{u}_p^\intercal\boldsymbol{a}= \alpha_p\). (Why?)
- We then have \(\boldsymbol{\alpha}= \boldsymbol{U}^\intercal\boldsymbol{a}\). (Why?)
- Combining, we have \(\boldsymbol{A}\boldsymbol{a}= \boldsymbol{U}\Lambda \boldsymbol{U}^\intercal\boldsymbol{a}\). (Why?)
- Since the preceding expression is true for any \(\boldsymbol{a}\), we must have \(\boldsymbol{A}= \boldsymbol{U}\Lambda \boldsymbol{U}^\intercal\). (Why? Hint: if you take \(\boldsymbol{a}\) to be \(1\) in the first entry and zeros elsewhere, then we have that the first columns match. Continue in this fashion to show that each entry of the matrices are the same.)
- The \(P\) eigenvectors span \(P\) dimensional space, so every \(P\)–vector can be written as a linear combination of them.
- \(\boldsymbol{A}\boldsymbol{a}= \boldsymbol{A}\sum_{p=1}^P \alpha_p \boldsymbol{u}_p = \sum_{p=1}^P \alpha_p \boldsymbol{A}\boldsymbol{u}_p = \sum_{p=1}^P \alpha_p \lambda_p \boldsymbol{u}_p\) because the \(\boldsymbol{u}_p\) are eigenvectors.
- This is just a matrix way of writing the first result, \(\sum_{p=1}^P \alpha_p \boldsymbol{u}_p\).
- This is just a matrix way of writing the second result (since \(\Lambda\) is diagonal, it simply multiplies each entry \(\alpha_p\) of \(\boldsymbol{\alpha}\) by \(\lambda_p\)).
- We have \(\boldsymbol{u}_p^\intercal\boldsymbol{a}= \sum_{q=1}^P \alpha_q \boldsymbol{u}_p^\intercal\boldsymbol{u}_q = \alpha_p\) by orthonormality (\(\boldsymbol{u}_p^\intercal\boldsymbol{u}_q = 0\) for \(q \ne p\) and \(\boldsymbol{u}_p^\intercal\boldsymbol{u}_p = 1\))
- This is the matrix version of the preceding result stacked row–wise
- Combining, \(\boldsymbol{A}\boldsymbol{a}= \boldsymbol{U}\Lambda \boldsymbol{\alpha}= \boldsymbol{U}\Lambda \boldsymbol{U}^\intercal\boldsymbol{a}\).
- If \(\boldsymbol{a}\) is 1 in row \(k\) and zeros elsewhere, it says that the \(k\)–th column of \(\boldsymbol{A}\) must be the same as the \(k\)–th column of \(\boldsymbol{U}\Lambda \boldsymbol{U}^\intercal\). Since this is true for each \(k\), the entire matrices must be the same.
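The following R sketch checks the decomposition numerically for an arbitrary symmetric matrix:
set.seed(1)
B <- matrix(rnorm(9), 3, 3)
A <- B + t(B)                        # A random symmetric 3 x 3 matrix
e <- eigen(A)
U <- e$vectors
Lambda <- diag(e$values)
max(abs(A - U %*% Lambda %*% t(U)))  # Essentially zero: A = U Lambda U'
max(abs(t(U) %*% U - diag(3)))       # Essentially zero: the eigenvectors are orthonormal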
5 Statistical asymptotics
For this problem, suppose that \(x_n\) are independent and identically distributed random variables, with \(\mathbb{E}\left[x_n\right] = 3\), and \(\mathrm{Var}\left(x_n\right) = 4\). You should not assume that the \(x_n\) are normally distributed.
Recall that \(\mathrm{Var}\left(x_n\right) = \mathbb{E}\left[x_n^2\right] - \mathbb{E}\left[x_n\right]^2\). Also recall that, for any scalar \(\alpha\), \(\mathbb{E}\left[\alpha x_n\right] = \alpha \mathbb{E}\left[x_n\right]\), and \(\mathrm{Var}\left(\alpha x_n\right) = \mathbb{E}\left[(\alpha x_n)^2\right] - \mathbb{E}\left[\alpha x_n\right]^2 = \alpha^2 \mathrm{Var}\left(x_n\right)\).
For each limiting statement, state clearly whether your answer is a constant, a random variable, or that the limit does not exist. If the limit is a random variable, give its distribution. If it is a constant, give its value. If it does not exist, argue why. You may refer to known results from probability theory, particularly the law of large numbers and the central limit theorem.
- For any particular (finite) value of \(N\), is \(\frac{1}{N} \sum_{n=1}^Nx_n\) a constant or a random variable?
- For any particular (finite) value of \(N\), is \(\frac{1}{N} \sum_{n=1}^N\mathbb{E}\left[x_n\right]\) a constant or a random variable?
- Compute the expectation \(\mathbb{E}\left[x_1 + x_2\right]\).
- Compute the expectation \(\mathbb{E}\left[\frac{1}{2} (x_1 + x_2)\right]\).
- Compute the expectation \(\mathbb{E}\left[\frac{1}{3} (x_1 + x_2 + x_3)\right]\).
- Compute the expectation \(\mathbb{E}\left[\frac{1}{N} \sum_{n=1}^Nx_n\right]\).
- Compute the variance \(\mathrm{Var}\left(x_1 + x_2\right)\).
- Compute the variance \(\mathrm{Var}\left(\frac{1}{2} (x_1 + x_2)\right)\).
- Compute the variance \(\mathrm{Var}\left(\frac{1}{3} (x_1 + x_2 + x_3)\right)\).
- Compute the variance \(\mathrm{Var}\left(\frac{1}{N} \sum_{n=1}^Nx_n\right)\).
- As \(N \rightarrow \infty\), what (if anything) does \(\frac{1}{N} \sum_{n=1}^Nx_n\) converge to?
- As \(N \rightarrow \infty\), what (if anything) does \(\frac{1}{N} \sum_{n=1}^N(x_n - 3)\) converge to?
- As \(N \rightarrow \infty\), what (if anything) does \(\frac{1}{\sqrt{N}} \sum_{n=1}^N(x_n - 3)\) converge to?
- As \(N \rightarrow \infty\), what (if anything) does \(\frac{1}{\sqrt{N}} \sum_{n=1}^Nx_n\) converge to?
- As \(N \rightarrow \infty\), what (if anything) does \(\sum_{n=1}^Nx_n\) converge to?
- Random variable
- Constant (\(\mathbb{E}\left[x_n\right]\) is a constant, not a random variable)
- \(3 + 3 = 6\)
- \(\frac{3 + 3}{2} = 3\)
- \(\frac{3 + 3 + 3}{3} = 3\)
- \(\frac{N 3}{N} = 3\)
- \(4 + 4 = 8\)
- \(\frac{4 + 4}{2^2} = 8/4 = 2\)
- \(\frac{4 + 4 + 4}{3^2} = 12 / 9 = 4/3\)
- \(\frac{1}{N^2} \mathrm{Var}\left(\sum_{n=1}^Nx_n\right) = \frac{1}{N^2} N \mathrm{Var}\left(x_n\right) = \frac{4 \cdot N}{N^2} = 4 / N\)
- \(3\), a constant, by the law of large numbers
- \(0\), a constant, by the law of large numbers applied to \(x_n - 3\), which has mean zero
- A \(\mathcal{N}\left(0,4\right)\) random variable, by the central limit theorem (recall \(\mathrm{Var}\left(x_n\right) = 4\))
- \(\frac{1}{\sqrt{N}} \sum_{n=1}^Nx_n = \frac{1}{\sqrt{N}} \sum_{n=1}^N(x_n - 3) + \frac{N}{\sqrt{N}} 3 = \textrm{CLT term} + 3 \sqrt{N}\rightarrow \infty\), so the limit does not exist.
- \(\sum_{n=1}^Nx_n = N \left(\frac{1}{N} \sum_{n=1}^Nx_n - 3\right) + 3 N = \textrm{LLN term going to zero} \cdot N + 3 N \rightarrow \infty\), so the limit does not exist.
Note: In more advanced classes, instead of “CLT term” you would write \(O_p(1)\), which is shorthand for a quantity whose precise value is unimportant, but which is bounded by some constant with arbitrarily high probability as \(N \rightarrow \infty\). Similarly, for “LLN term going to zero” you would write \(o_p(1)\), which means something that goes to zero with high probability as \(N \rightarrow \infty\).
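A short simulation illustrates both limits (a sketch; the gamma distribution below is an arbitrary non-normal choice with mean \(3\) and variance \(4\)):
set.seed(151)
N <- 1000
draws <- replicate(2000, {
  x <- rgamma(N, shape=2.25, scale=4/3)  # mean 3, variance 4
  c(mean(x), sum(x - 3) / sqrt(N))
})
mean(draws[1, ])  # Close to 3: the law of large numbers
var(draws[2, ])   # Close to 4: the central limit theorem gives N(0, 4)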
6 Conditional probability
Suppose we have scalar-valued, independent random variables \(a\) and \(b\) satisfying
\[ \begin{aligned} \mathbb{E}\left[a\right] ={}& 1 & \mathbb{E}\left[b\right] ={}& 2 \\ \mathrm{Var}\left(a\right) ={}& 9 & \mathrm{Var}\left(b\right) ={}& 25. \end{aligned} \]
Additionally, let \(c = 2 a + b\).
(a) Evaluate the following expressions. Each answer should be a number.
- \(\mathbb{E}\left[a + b\right]\)
- \(\mathbb{E}\left[2 a - b\right]\)
- \(\mathbb{E}\left[a b\right]\)
- \(\mathrm{Var}\left(2 a\right)\)
- \(\mathrm{Var}\left(a + b\right)\)
- \(\mathbb{E}\left[a^2\right]\)
- \(\mathbb{E}\left[b a^2\right]\)
- \(\mathbb{E}\left[c\right]\)
- \(\mathrm{Var}\left(c\right)\)
- \(\mathbb{E}\left[a + b\right] = \mathbb{E}\left[a\right] + \mathbb{E}\left[b\right] = 1 + 2 = 3\)
- \(\mathbb{E}\left[2 a - b\right] = 2\mathbb{E}\left[a\right] - \mathbb{E}\left[b\right] = 2(1) - 2 = 0\)
- \(\mathbb{E}\left[a b\right] = \mathbb{E}\left[a\right]\mathbb{E}\left[b\right] = 1 \cdot 2 = 2\) (by independence)
- \(\mathrm{Var}\left(2 a\right) = 4\mathrm{Var}\left(a\right) = 4 \cdot 9 = 36\)
- \(\mathrm{Var}\left(a + b\right) = \mathrm{Var}\left(a\right) + \mathrm{Var}\left(b\right) = 9 + 25 = 34\) (by independence)
- \(\mathbb{E}\left[a^2\right] = \mathrm{Var}\left(a\right) + (\mathbb{E}\left[a\right])^2 = 9 + 1 = 10\)
- \(\mathbb{E}\left[b a^2\right] = \mathbb{E}\left[b\right]\mathbb{E}\left[a^2\right] = 2 \cdot 10 = 20\) (by independence)
- \(\mathbb{E}\left[c\right] = \mathbb{E}\left[2a + b\right] = 2\mathbb{E}\left[a\right] + \mathbb{E}\left[b\right] = 2(1) + 2 = 4\)
- \(\mathrm{Var}\left(c\right) = \mathrm{Var}\left(2a + b\right) = 4\mathrm{Var}\left(a\right) + \mathrm{Var}\left(b\right) = 36 + 25 = 61\) (by independence)
(b) Evaluate the following expressions. Now the answers may be numbers or random variables.
- \(\mathbb{E}\left[a | b\right]\)
- \(\mathbb{E}\left[a b | b\right]\)
- \(\mathbb{E}\left[5 b | a\right]\)
- \(\mathbb{E}\left[c | a\right]\)
- \(\mathbb{E}\left[c | a,b\right]\)
- \(\mathrm{Var}\left(c | a\right)\)
- \(\mathrm{Var}\left(c | a,b\right)\)
- \(\mathbb{E}\left[a | b\right] = \mathbb{E}\left[a\right] = 1\) (by independence)
- \(\mathbb{E}\left[a b | b\right] = b\mathbb{E}\left[a | b\right] = b\mathbb{E}\left[a\right] = b \cdot 1 = b\) (random variable)
- \(\mathbb{E}\left[5 b | a\right] = \mathbb{E}\left[5 b\right] = 5 \cdot 2 = 10\) (since \(b\) is independent of \(a\))
- \(\mathbb{E}\left[c | a\right] = \mathbb{E}\left[2a + b | a\right] = 2a + \mathbb{E}\left[b|a\right] = 2a + 2\) (random variable)
- \(\mathbb{E}\left[c | a, b\right] = 2a + b\) (random variable)
- \(\mathrm{Var}\left(c | a\right) = \mathrm{Var}\left(2a + b | a\right) = \mathrm{Var}\left(b | a\right) = \mathrm{Var}\left(b\right) = 25\) (constant, and \(\mathrm{Var}\left(a | a\right) = 0\))
- \(\mathrm{Var}\left(c | a, b\right) = 0\) (constant, since \(c\) is deterministic given \(a\) and \(b\))
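These answers can be checked by simulation (a sketch; the normal distributions below are arbitrary choices with the stated means and variances):
set.seed(2)
n <- 1e5
a <- rnorm(n, mean=1, sd=3)  # Var(a) = 9
b <- rnorm(n, mean=2, sd=5)  # Var(b) = 25
c <- 2 * a + b
mean(c)          # Close to 4
var(c)           # Close to 61
coef(lm(c ~ a))  # Intercept and slope both close to 2, matching E[c | a] = 2a + 2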
7 Problems from Freedman (2009)
Freedman (2009) Chapter 2 set B exercise 1
Suppose \(x_1, \ldots, x_N\) are real numbers. Let \(\bar{x}= (x_1 + \ldots + x_N) / N\). Let \(c\) be a real number.
(a) Show that \(\sum_{n=1}^N(x_n - \bar{x}) = 0\).
(b) Show that \(\sum_{n=1}^N(x_n - c)^2 = \sum_{n=1}^N(x_n - \bar{x})^2 + N (c - \bar{x})^2\). Hint: \(x_n - c = x_n - \bar{x}+ \bar{x}- c\).
(c) Show that \(\sum_{n=1}^N(x_n - c)^2\), as a function of \(c\), has a unique minimum at \(c = \bar{x}\).
(d) Show that \(\sum_{n=1}^Nx_n^2 = \sum_{n=1}^N(x_n - \bar{x})^2 + N \bar{x}^2\).
(a) \(\sum_{n=1}^N(x_n - \bar{x}) = \sum_{n=1}^Nx_n - N\bar{x}= N\bar{x}- N\bar{x}= 0\)
(b) Using the hint, \(x_n - c = (x_n - \bar{x}) + (\bar{x}- c)\), so \[ \begin{aligned} \sum_{n=1}^N(x_n - c)^2 &= \sum_{n=1}^N[(x_n - \bar{x}) + (\bar{x}- c)]^2 \\ &= \sum_{n=1}^N(x_n - \bar{x})^2 + 2(\bar{x}- c)\sum_{n=1}^N(x_n - \bar{x}) + N(\bar{x}- c)^2 \\ &= \sum_{n=1}^N(x_n - \bar{x})^2 + 0 + N(c - \bar{x})^2 \end{aligned} \] where we used part (a) to eliminate the middle term.
(c) From (b), \(\sum_{n=1}^N(x_n - c)^2 = \sum_{n=1}^N(x_n - \bar{x})^2 + N(c - \bar{x})^2\). Since \((c - \bar{x})^2 \ge 0\) with equality iff \(c = \bar{x}\), the expression is uniquely minimized at \(c = \bar{x}\).
(d) Set \(c = 0\) in part (b): \(\sum_{n=1}^Nx_n^2 = \sum_{n=1}^N(x_n - \bar{x})^2 + N\bar{x}^2\).
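A quick numerical check of the identity in (b) (a sketch with arbitrary data):
x <- c(1, 4, 2, 7)
c0 <- 3
sum((x - c0)^2)                                      # 22
sum((x - mean(x))^2) + length(x) * (c0 - mean(x))^2  # Also 22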
Freedman (2009) Chapter 2 set B exercise 14
Suppose that \(x_1, \ldots, x_N\) and \(y_1, \ldots, y_N\) have sample means \(\bar{x}\) and \(\bar{y}\) respectively; the sample standard deviations are \(s_x > 0\) and \(s_y > 0\), and the sample correlation is \(r\). Note that here \(\widehat{\mathrm{Var}}\left(x\right) = \frac{1}{N} \sum_{n=1}^N(x_n - \bar{x})^2\), with an analogous formula for \(\widehat{\mathrm{Var}}\left(y\right)\), and that \(s_x = \sqrt{\widehat{\mathrm{Var}}\left(x\right)}\) and \(s_y = \sqrt{\widehat{\mathrm{Var}}\left(y\right)}\).
Let \[ \widehat{\mathrm{Cov}}\left(x, y\right) = \frac{1}{N} \sum_{n=1}^N(x_n - \bar{x}) (y_n - \bar{y}). \] Show that:
(a) \(\widehat{\mathrm{Cov}}\left(x, y\right) = rs_x s_y\).
(b) The slope of the regression line for predicting \(y\) from \(x\) is \(\widehat{\mathrm{Cov}}\left(x, y\right) / \widehat{\mathrm{Var}}\left(x\right)\).
(c) \(\widehat{\mathrm{Var}}\left(x\right) = \widehat{\mathrm{Cov}}\left(x,x\right)\).
(d) \(\widehat{\mathrm{Cov}}\left(x, y\right) = \overline{xy} - \bar{x}\,\bar{y}\)
(e) \(\widehat{\mathrm{Var}}\left(x\right) = \overline{x^2} - \bar{x}\,\bar{x}\).
(a) By definition, the sample correlation is \(r = \frac{\widehat{\mathrm{Cov}}\left(x,y\right)}{s_x s_y}\). Rearranging gives \(\widehat{\mathrm{Cov}}\left(x, y\right) = r s_x s_y\).
(b) Recall that the slope of the regression line is given by \[ \hat{\beta}= \frac{\frac{1}{N} \sum_{n=1}^N(y_n - \bar{y}) (x_n - \bar{x})}{\frac{1}{N} \sum_{n=1}^N(x_n - \bar{x})^2}. \] The answer follows by recognizing the numerator as \(\widehat{\mathrm{Cov}}\left(x, y\right)\) and the denominator as \(\widehat{\mathrm{Var}}\left(x\right)\).
(c) By definition, \[ \widehat{\mathrm{Cov}}\left(x,x\right) = \frac{1}{N} \sum_{n=1}^N(x_n - \bar{x})(x_n - \bar{x}) = \frac{1}{N} \sum_{n=1}^N(x_n - \bar{x})^2 = \widehat{\mathrm{Var}}\left(x\right). \]
(d) Expanding the covariance: \[ \begin{aligned} \widehat{\mathrm{Cov}}\left(x,y\right) &= \frac{1}{N} \sum_{n=1}^N(x_n - \bar{x})(y_n - \bar{y}) \\ &= \frac{1}{N} \sum_{n=1}^Nx_n y_n - \bar{x}\frac{1}{N} \sum_{n=1}^Ny_n - \bar{y}\frac{1}{N} \sum_{n=1}^Nx_n + \bar{x}\bar{y}\\ &= \overline{xy} - \bar{x}\bar{y}- \bar{y}\bar{x}+ \bar{x}\bar{y}\\ &= \overline{xy} - \bar{x}\bar{y}. \end{aligned} \]
(e) This is a special case of (d) with \(y= x\): \[ \widehat{\mathrm{Var}}\left(x\right) = \widehat{\mathrm{Cov}}\left(x,x\right) = \overline{x^2} - \bar{x}\,\bar{x}. \]
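A numerical check of (b) and (d) (a sketch; note that R’s built-in cov(), var(), and sd() divide by \(N - 1\), so the \(1/N\) versions are computed by hand here):
x <- c(1, 4, 2, 7, 5)
y <- c(2, 3, 8, 6, 4)
cov_hat <- mean((x - mean(x)) * (y - mean(y)))  # 1/N convention
var_hat <- mean((x - mean(x))^2)
cov_hat / var_hat                # The slope from part (b)...
coef(lm(y ~ x))["x"]             # ...matches the slope fitted by lm()
mean(x * y) - mean(x) * mean(y)  # Equals cov_hat, as in part (d)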