$$ \newcommand{\mybold}[1]{\boldsymbol{#1}} \newcommand{\trans}{\intercal} \newcommand{\norm}[1]{\left\Vert#1\right\Vert} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\bbr}{\mathbb{R}} \newcommand{\bbz}{\mathbb{Z}} \newcommand{\bbc}{\mathbb{C}} \newcommand{\gauss}[1]{\mathcal{N}\left(#1\right)} \newcommand{\chisq}[1]{\mathcal{\chi}^2_{#1}} \newcommand{\studentt}[1]{\mathrm{StudentT}_{#1}} \newcommand{\fdist}[2]{\mathrm{FDist}_{#1,#2}} \newcommand{\iid}{\overset{\mathrm{IID}}{\sim}} \newcommand{\argmin}[1]{\underset{#1}{\mathrm{argmin}}\,} \newcommand{\projop}[1]{\underset{#1}{\mathrm{Proj}}\,} \newcommand{\proj}[1]{\underset{#1}{\mybold{P}}} \newcommand{\expect}[1]{\mathbb{E}\left[#1\right]} \newcommand{\prob}[1]{\mathbb{P}\left(#1\right)} \newcommand{\dens}[1]{\mathit{p}\left(#1\right)} \newcommand{\var}[1]{\mathrm{Var}\left(#1\right)} \newcommand{\cov}[1]{\mathrm{Cov}\left(#1\right)} \newcommand{\sumn}{\sum_{n=1}^N} \newcommand{\meann}{\frac{1}{N} \sumn} \newcommand{\cltn}{\frac{1}{\sqrt{N}} \sumn} \newcommand{\trace}[1]{\mathrm{trace}\left(#1\right)} \newcommand{\diag}[1]{\mathrm{Diag}\left(#1\right)} \newcommand{\grad}[2]{\nabla_{#1} \left. #2 \right.} \newcommand{\gradat}[3]{\nabla_{#1} \left. #2 \right|_{#3}} \newcommand{\fracat}[3]{\left. 
\frac{#1}{#2} \right|_{#3}} \newcommand{\W}{\mybold{W}} \newcommand{\w}{w} \newcommand{\wbar}{\bar{w}} \newcommand{\wv}{\mybold{w}} \newcommand{\X}{\mybold{X}} \newcommand{\x}{x} \newcommand{\xbar}{\bar{x}} \newcommand{\xv}{\mybold{x}} \newcommand{\Xcov}{\mybold{M}_{\X}} \newcommand{\Xcovhat}{\hat{\mybold{M}}_{\X}} \newcommand{\Covsand}{\Sigmam_{\mathrm{sand}}} \newcommand{\Covsandhat}{\hat{\Sigmam}_{\mathrm{sand}}} \newcommand{\Z}{\mybold{Z}} \newcommand{\z}{z} \newcommand{\zv}{\mybold{z}} \newcommand{\zbar}{\bar{z}} \newcommand{\Y}{\mybold{Y}} \newcommand{\Yhat}{\hat{\Y}} \newcommand{\y}{y} \newcommand{\yv}{\mybold{y}} \newcommand{\yhat}{\hat{\y}} \newcommand{\ybar}{\bar{y}} \newcommand{\res}{\varepsilon} \newcommand{\resv}{\mybold{\res}} \newcommand{\resvhat}{\hat{\mybold{\res}}} \newcommand{\reshat}{\hat{\res}} \newcommand{\betav}{\mybold{\beta}} \newcommand{\betavhat}{\hat{\betav}} \newcommand{\betahat}{\hat{\beta}} \newcommand{\betastar}{{\beta^{*}}} \newcommand{\betavstar}{{\betav^{*}}} \newcommand{\loss}{\mathscr{L}} \newcommand{\losshat}{\hat{\loss}} \newcommand{\f}{f} \newcommand{\fhat}{\hat{f}} \newcommand{\bv}{\mybold{\b}} \newcommand{\bvhat}{\hat{\bv}} \newcommand{\alphav}{\mybold{\alpha}} \newcommand{\alphavhat}{\hat{\av}} \newcommand{\alphahat}{\hat{\alpha}} \newcommand{\omegav}{\mybold{\omega}} \newcommand{\gv}{\mybold{\gamma}} \newcommand{\gvhat}{\hat{\gv}} \newcommand{\ghat}{\hat{\gamma}} \newcommand{\hv}{\mybold{\h}} \newcommand{\hvhat}{\hat{\hv}} \newcommand{\hhat}{\hat{\h}} \newcommand{\gammav}{\mybold{\gamma}} \newcommand{\gammavhat}{\hat{\gammav}} \newcommand{\gammahat}{\hat{\gamma}} \newcommand{\new}{\mathrm{new}} \newcommand{\zerov}{\mybold{0}} \newcommand{\onev}{\mybold{1}} \newcommand{\id}{\mybold{I}} \newcommand{\sigmahat}{\hat{\sigma}} \newcommand{\etav}{\mybold{\eta}} \newcommand{\muv}{\mybold{\mu}} \newcommand{\Sigmam}{\mybold{\Sigma}} \newcommand{\rdom}[1]{\mathbb{R}^{#1}} \newcommand{\RV}[1]{{#1}} \def\A{\mybold{A}} 
\def\A{\mybold{A}} \def\av{\mybold{a}} \def\a{a} \def\B{\mybold{B}} \def\b{b} \def\S{\mybold{S}} \def\sv{\mybold{s}} \def\s{s} \def\R{\mybold{R}} \def\rv{\mybold{r}} \def\r{r} \def\V{\mybold{V}} \def\vv{\mybold{v}} \def\v{v} \def\vhat{\hat{v}} \def\U{\mybold{U}} \def\uv{\mybold{u}} \def\u{u} \def\W{\mybold{W}} \def\wv{\mybold{w}} \def\w{w} \def\tv{\mybold{t}} \def\t{t} \def\Sc{\mathcal{S}} \def\ev{\mybold{e}} \def\Lammat{\mybold{\Lambda}} \def\Q{\mybold{Q}} \def\eps{\varepsilon} $$

STAT151A Quiz 5

Please write your full name and email address here:

\[ \\[2in] \]

Also, please put your initials on each page in case the pages get separated.

\[ \\[1in] \]

You have 30 minutes for this quiz.

There are two questions, each weighted equally.

There are extra pages at the end if you need more space for solutions.

Question 1

Suppose we have a randomized controlled trial in which 100 patients are randomly chosen to either receive medication or not. After a period of time their self-reported well-being is recorded, as well as some additional covariates.

For this question, assume that we have an R dataframe df with the following columns:

df$wellbeing: Self-reported feeling of well-being after taking the medication
df$medicine: A binary (one–hot) encoding of whether medication was administered
df$age: The patient’s age in years
df$health: A continuous variable summarizing the patient’s pre–treatment health status

Suppose we run the regression

reg <- lm(wellbeing ~ medicine + age + health, df)
ci_medicine <- confint(reg, "medicineTRUE", level=0.95)
betahat_medicine <- coefficients(reg)["medicineTRUE"]
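
To run the regression above, one needs a data frame df. The following is a minimal sketch that simulates a stand-in. The sample size of 100 and the column names come from the problem statement; the seed, the covariate distributions, and every coefficient in the simulated data-generating process are assumptions made purely for illustration, so the printed numbers will not match the values quoted in the questions. Storing medicine as a logical column is also an assumption, chosen because it makes lm label the coefficient medicineTRUE.

```r
# Hypothetical stand-in for df; all numeric choices below are assumptions.
set.seed(151)
n <- 100

df <- data.frame(
  # Randomly assign half of the patients to receive medication.
  medicine = sample(rep(c(TRUE, FALSE), each = n / 2)),
  age      = round(runif(n, min = 20, max = 80)),
  health   = rnorm(n)
)

# Assumed data-generating process, for illustration only.
df$wellbeing <- 1 + 0.5 * df$medicine - 0.01 * df$age +
  0.8 * df$health + rnorm(n)

# The regression and summaries exactly as written in the question.
reg <- lm(wellbeing ~ medicine + age + health, df)
ci_medicine <- confint(reg, "medicineTRUE", level = 0.95)
betahat_medicine <- coefficients(reg)["medicineTRUE"]

print(ci_medicine)       # 1 x 2 matrix: lower and upper interval endpoints
print(betahat_medicine)  # named scalar: the point estimate
```

Here ci_medicine is a one-row matrix holding the lower and upper endpoints, and betahat_medicine is the point estimate, which lies at the center of that interval.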

Each question performs a calculation in R and draws an invalid conclusion. For each question, state in a single sentence why the conclusion is invalid using concepts from class. You do not need to mathematically prove your statement; you may refer to proofs from lecture.

Note that each question is separate, so a stated observation in one question does not necessarily apply in the other question.

Here is an example of the expected level of detail in the response:

(Example): Suppose we find that betahat_medicine is equal to -0.1873705.

Invalid conclusion: We conclude that the medicine improves well–being by -0.1873705. Why is the conclusion invalid?

Answer: Even if the regression is correctly specified, the regression estimate has some uncertainty, and is not necessarily equal to the true expected effect of medicine.

(a) Suppose we find that ci_medicine takes the value c(-0.618182, 0.243441).

Invalid conclusion: Because ci_medicine contains zero, we conclude that the medicine has no effect. Why is the conclusion invalid?

\[ \\[1in] \]

(b) Suppose we plot the fitted residuals and they are highly right–skewed.

Invalid conclusion: Because the residuals are non–normal, ci_medicine cannot be used for valid hypothesis testing. Why is the conclusion invalid?

\[ \\[1in] \]

(c) Suppose we plot the fitted residuals and find that they are approximately normally distributed. However, their variance is much higher for low values of health than for high values.

Invalid conclusion: Because the residuals are approximately normal, ci_medicine can be used for valid hypothesis testing. Why is the conclusion invalid?

\[ \\[1in] \]

Question 2

Let $\s \sim \chisq{K}$ and $\z \sim \gauss{0, 1}$, independently of one another, so that $t = \frac{\z}{\sqrt{\s / K}} \sim \studentt{K}$.
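
The construction above can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the quiz solutions: the choice of K = 5, the number of draws, the seed, and the quantile levels are all assumptions made for illustration. It builds t from independent chi-squared and standard normal draws and compares the simulated quantiles with the Student-t quantiles reported by qt.

```r
# Monte Carlo check of the Student-t construction; K, n_draws, and the
# quantile levels are assumed values chosen only for illustration.
set.seed(151)
K <- 5
n_draws <- 1e5

s <- rchisq(n_draws, df = K)  # s ~ chi-squared with K degrees of freedom
z <- rnorm(n_draws)           # z ~ N(0, 1), drawn independently of s
t_draws <- z / sqrt(s / K)    # the ratio defined in the question

# Compare simulated quantiles with the Student-t quantiles from qt().
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
comparison <- rbind(
  simulated = quantile(t_draws, probs),
  student_t = qt(probs, df = K)
)
print(round(comparison, 3))
```

The two rows should agree up to Monte Carlo error, which shrinks as n_draws grows.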

(a) Use the LLN to argue that, for large $K$, it is approximately true that $t \sim \gauss{0, 1}$.

\[ \\[1.5in] \]

(b) Suppose that $\x \sim \gauss{0, 4}$ independently of $s$. Show that

\[ \frac{\x / 2}{\sqrt{\s / K}} \sim \studentt{K}. \]

\[ \\[1.5in] \]

(c) Suppose that $r_k \iid \gauss{0, 9}$, independently of $\z$, for $k = 1,\ldots,K$. Show that

\[ \frac{\z}{\sqrt{\sum_{k=1}^K r_k^2 / (9 K)}} \sim \studentt{K}. \]

\[ \\[1.5in] \]

Extra space for answers (indicate clearly which problem you are working on)