This coding assignment will use your work from homework 3 as a starting point. For the assignment, we’ll assume that
- for all , including new datapoints
- The regressors are also random with covariance matrix .
Variability in the training set
Fix , , and set to some values you choose. Set to have correlation off the diagonal and on the diagonal. Set .
Take to be a single fixed draw from the distribution of regressors, and draw a large number (> 5000) of , giving a large number of draws from . The should be normally distributed with mean and variance .
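As a rough sketch of the setup above, the following R code builds a covariance matrix with a constant off-diagonal correlation and draws regressors from it. The names p, n, rho, sigma_x, and draw_x are hypothetical. So are the numeric values, the unit diagonal, and the mean-zero normal regressors: they are placeholders for illustration, not the values the assignment specifies.

```r
set.seed(1)

# Placeholder values, assumed for illustration only.
p   <- 3     # number of regressors
n   <- 50    # training set size
rho <- 0.5   # off-diagonal correlation

# Covariance matrix with 1 on the diagonal (assumed) and rho off the diagonal.
sigma_x <- matrix(rho, p, p)
diag(sigma_x) <- 1

# Draw rows of mean-zero normal regressors with covariance sigma.
# chol() returns an upper-triangular R with t(R) %*% R == sigma, so rows of
# z %*% R have covariance sigma when rows of z have identity covariance.
draw_x <- function(n_rows, sigma) {
  z <- matrix(rnorm(n_rows * ncol(sigma)), n_rows, ncol(sigma))
  z %*% chol(sigma)
}

x_train <- draw_x(n, sigma_x)              # one training design matrix
x_new   <- as.vector(draw_x(1, sigma_x))   # a single fixed new regressor
```

Using the Cholesky factor keeps the sketch in base R; MASS::mvrnorm would be a reasonable alternative.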
(a)
Draw a single training set , , and use it to construct an 80% interval for . Find the proportion of that lie in the interval.
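One possible way to carry out a step like (a) is sketched below. It assumes the interval in question is a prediction interval for a new response at the fixed new regressor, that the model has no intercept, and that lm with predict is an acceptable substitute for whatever you wrote in homework 3. The values of beta and sigma and the independent standard normal regressors are placeholders chosen to keep the sketch short.

```r
set.seed(2)

# Placeholder values, assumed for illustration only.
n <- 50
p <- 3
beta  <- c(1, -1, 0.5)
sigma <- 1

# One training set (independent regressors here, purely for brevity).
x_train <- matrix(rnorm(n * p), n, p)
y_train <- as.vector(x_train %*% beta + rnorm(n, sd = sigma))

# A single fixed new regressor and many draws of the new response.
x_new <- rnorm(p)
y_new <- sum(x_new * beta) + rnorm(10000, sd = sigma)

# Fit without an intercept; data.frame() names the columns X1, X2, X3.
train_df <- data.frame(y = y_train, x_train)
fit <- lm(y ~ . - 1, data = train_df)

# 80% prediction interval at x_new.
new_df <- as.data.frame(t(x_new))
names(new_df) <- names(train_df)[-1]
pred <- predict(fit, newdata = new_df, interval = "prediction", level = 0.8)

# Proportion of the new responses that fall inside the interval.
coverage <- mean(y_new >= pred[, "lwr"] & y_new <= pred[, "upr"])
```

The proportion in coverage depends on the particular training set drawn, which is the point the later parts explore.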
(b)
Repeat (a), but with 10 different training sets. You can keep the the same. For each different training set, plot the corresponding intervals. Are they different from one another? By a lot or a little?
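A step like (b) could be organized as in the sketch below, which refits on 10 freshly drawn training sets and plots one interval per set. As before, the helper one_interval, the parameter values, the no-intercept model, and the use of prediction intervals from predict are assumptions made for illustration.

```r
set.seed(3)

# Placeholder values, assumed for illustration only.
n <- 50
p <- 3
n_sets <- 10
beta  <- c(1, -1, 0.5)
sigma <- 1

# The new regressor stays fixed across training sets.
x_new  <- rnorm(p)
new_df <- as.data.frame(t(x_new))
names(new_df) <- paste0("X", 1:p)

# Draw one training set, fit, and return c(fit, lwr, upr) at x_new.
one_interval <- function() {
  x <- matrix(rnorm(n * p), n, p)
  y <- as.vector(x %*% beta + rnorm(n, sd = sigma))
  fit <- lm(y ~ . - 1, data = data.frame(y = y, x))
  predict(fit, newdata = new_df, interval = "prediction", level = 0.8)[1, ]
}

# One row per training set, columns fit, lwr, upr.
intervals <- t(replicate(n_sets, one_interval()))

# Draw each interval as a vertical segment with its center marked.
plot(NA, xlim = c(1, n_sets), ylim = range(intervals),
     xlab = "training set", ylab = "80% interval")
segments(1:n_sets, intervals[, "lwr"], 1:n_sets, intervals[, "upr"])
points(1:n_sets, intervals[, "fit"], pch = 19)
```

Changing n or the new regressor in this sketch and rerunning it gives a quick visual comparison for the later parts.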
(c)
Repeat (b), but with . You can keep the the same. How do the results compare to (b)? Why?
(d)
Repeat (b), but with very small: specifically, set . You will need to draw new . How do the results compare to (b)?
(e)
Repeat (b), but now take to be the smallest eigenvector of . (You can find the smallest eigenvector of using the R function eigen.) You will need to draw new . How do the results compare to (b)?
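The eigen call mentioned in (e) can be used as in the sketch below. For a symmetric matrix, eigen returns the eigenvalues in decreasing order, so the eigenvector belonging to the smallest eigenvalue is the last column of the vectors component. The matrix sigma_x and the values of p and rho are placeholders, not the ones the assignment specifies.

```r
# Placeholder matrix, assumed for illustration only.
p   <- 3
rho <- 0.5
sigma_x <- matrix(rho, p, p)
diag(sigma_x) <- 1

# Eigenvalues come back in decreasing order for symmetric input.
eig <- eigen(sigma_x, symmetric = TRUE)

lambda_min <- eig$values[p]      # smallest eigenvalue
v_min      <- eig$vectors[, p]   # matching unit-length eigenvector
```

When the smallest eigenvalue is repeated, as it is for this placeholder matrix, the returned eigenvector is one valid choice among many, and its sign is arbitrary in any case.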