STAT151A Code homework 1: Due January 26th
For all questions below, provide answers in complete sentences, and include correct and readable code to support your answers.
1 Spotify dataset
(a)
Find another variable (other than danceability) that is associated with popularity according to simple linear regression.
(b)
How does this association change if you remove low-popularity tracks? You may define low-popularity tracks however you like, but briefly defend your choice.
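One possible workflow for parts (a) and (b) is sketched below on simulated data. The data frame `tracks`, the column `energy`, and the cutoff of 20 are all assumptions made for illustration; they are not taken from the assignment, and the real dataset's column names may differ.

```r
# Simulated stand-in for the Spotify data; column names are assumptions.
set.seed(1)
n <- 200
tracks <- data.frame(energy = runif(n))
tracks$popularity <- pmin(100, pmax(0, round(30 + 20 * tracks$energy + rnorm(n, sd = 15))))

# Simple linear regression on all tracks.
fit_all <- lm(popularity ~ energy, data = tracks)

# Hypothetical definition of "low popularity": popularity below 20.
cutoff <- 20
tracks_trimmed <- subset(tracks, popularity >= cutoff)
fit_trimmed <- lm(popularity ~ energy, data = tracks_trimmed)

# Compare the estimated slopes side by side.
slopes <- c(all = unname(coef(fit_all)["energy"]),
            trimmed = unname(coef(fit_trimmed)["energy"]))
print(slopes)
```

Whatever cutoff is chosen, refitting on the filtered data and comparing slopes makes the change in the association explicit.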
(c)
Identify a song that defies the relationship you found. (For example, having found a positive relationship between danceability and popularity, I might find a song that is highly popular but not "danceable.")
Listen to the song on Spotify and comment on whether the result makes sense.
2 Bodyfat dataset
(a)
Choose two variables (other than bodyfat). Use lm to regress bodyfat on these two variables and an intercept.
(b)
For the regression in the previous part, construct your own \(\boldsymbol{X}\) and \(\boldsymbol{Y}\) matrices by hand (don't use the output of lm). Using these, compute your own estimate \(\hat{\beta}\) and confirm that it matches the output of lm.
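The comparison can be sketched on simulated data as follows, using the standard least squares solution \(\hat{\beta} = (\boldsymbol{X}^\intercal \boldsymbol{X})^{-1} \boldsymbol{X}^\intercal \boldsymbol{Y}\). The data frame `sim` and its columns `x1`, `x2`, and `y` are hypothetical stand-ins, not the bodyfat variables.

```r
# Simulated data; the names sim, x1, x2, y are illustrative assumptions.
set.seed(151)
n <- 50
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - 0.5 * sim$x2 + rnorm(n)

# Design matrix: a column of ones for the intercept, then the two regressors.
X <- cbind(intercept = 1, x1 = sim$x1, x2 = sim$x2)
Y <- matrix(sim$y, ncol = 1)

# Solve the normal equations (X'X) beta = X'Y without forming an explicit inverse.
beta_hat <- solve(t(X) %*% X, t(X) %*% Y)

# Compare with lm on the same data.
fit <- lm(y ~ x1 + x2, data = sim)
print(cbind(by_hand = drop(beta_hat), lm = coef(fit)))
```

Using `all.equal` rather than `==` for the final check is a sensible choice here, since the two computations can differ by floating-point rounding.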
(c)
Write a function in R that computes \(\hat{\beta}\) from \(\boldsymbol{X}\) and \(\boldsymbol{Y}\). Document the function's inputs and outputs. As an example, you might follow the Function Documentation section of the Amazon R style guide.
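As an illustration of what documenting inputs and outputs might look like, here is a sketch for a different, hypothetical helper, `compute_rss`. The comment layout (a one-line summary, then `Args:` and `Returns:`) is just one common convention and is not necessarily the one the style guide prescribes.

```r
# Computes the residual sum of squares for a linear model.
#
# Args:
#   X: Numeric n-by-p design matrix (include a column of ones for an intercept).
#   Y: Numeric vector of length n, or an n-by-1 matrix, of responses.
#   beta: Numeric vector of length p containing the coefficients.
#
# Returns:
#   A single non-negative number: the sum of squared residuals Y - X beta.
compute_rss <- function(X, Y, beta) {
  stopifnot(is.matrix(X), nrow(X) == length(Y), ncol(X) == length(beta))
  residuals <- as.numeric(Y) - as.numeric(X %*% beta)
  sum(residuals^2)
}

# Usage on a tiny made-up example where Y = 1 + 2 * x exactly.
X <- cbind(1, c(0, 1, 2))
Y <- c(1, 3, 5)
rss_exact <- compute_rss(X, Y, beta = c(1, 2))
print(rss_exact)
```

Stating the expected dimensions of each argument in the documentation, and checking them at the top of the function, makes shape mismatches easier to diagnose.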
3 Aluminum dataset
(a)
Run the regression from Lecture 1 using all three specimens, both with and without an intercept term. Plot the results.
For convenience, here is a filter function to limit to the right set of data:
filter(Strain < 0.0035, Strain > 0.0001,
       loading_type == "T", temp == 20, lot == "A")
Comment on whether an intercept should be included and why. When the intercept is estimated, how can it be interpreted?
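For reference, fitting with and without an intercept in lm might look like the sketch below. The data are simulated, and the response name `Stress`, the slope of 70000, and the noise level are assumptions for illustration only; substitute the filtered aluminum data and the variables used in Lecture 1.

```r
# Simulated stress-strain style data; names and magnitudes are assumptions.
set.seed(20)
dat <- data.frame(Strain = seq(0.0002, 0.0034, length.out = 40))
dat$Stress <- 70000 * dat$Strain + rnorm(40, sd = 2)

# With an intercept (the default) and without one (line forced through the origin).
fit_with <- lm(Stress ~ Strain, data = dat)
fit_without <- lm(Stress ~ 0 + Strain, data = dat)

# Plot the data and overlay both fitted lines.
plot(dat$Strain, dat$Stress, xlab = "Strain", ylab = "Stress")
abline(fit_with, col = "blue")
abline(a = 0, b = coef(fit_without)["Strain"], col = "red", lty = 2)
legend("topleft", legend = c("with intercept", "without intercept"),
       col = c("blue", "red"), lty = c(1, 2))
```

In an lm formula, `0 + Strain` (or equivalently `Strain - 1`) removes the intercept column from the design matrix.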