$$ \newcommand{\mybold}[1]{\boldsymbol{#1}} \newcommand{\trans}{\intercal} \newcommand{\norm}[1]{\left\Vert#1\right\Vert} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\bbr}{\mathbb{R}} \newcommand{\bbz}{\mathbb{Z}} \newcommand{\bbc}{\mathbb{C}} \newcommand{\gauss}[1]{\mathcal{N}\left(#1\right)} \newcommand{\chisq}[1]{\mathcal{\chi}^2_{#1}} \newcommand{\studentt}[1]{\mathrm{StudentT}_{#1}} \newcommand{\fdist}[2]{\mathrm{FDist}_{#1,#2}} \newcommand{\iid}{\overset{\mathrm{IID}}{\sim}} \newcommand{\argmin}[1]{\underset{#1}{\mathrm{argmin}}\,} \newcommand{\projop}[1]{\underset{#1}{\mathrm{Proj}}\,} \newcommand{\proj}[1]{\underset{#1}{\mybold{P}}} \newcommand{\expect}[1]{\mathbb{E}\left[#1\right]} \newcommand{\prob}[1]{\mathbb{P}\left(#1\right)} \newcommand{\dens}[1]{\mathit{p}\left(#1\right)} \newcommand{\var}[1]{\mathrm{Var}\left(#1\right)} \newcommand{\cov}[1]{\mathrm{Cov}\left(#1\right)} \newcommand{\sumn}{\sum_{n=1}^N} \newcommand{\meann}{\frac{1}{N} \sumn} \newcommand{\cltn}{\frac{1}{\sqrt{N}} \sumn} \newcommand{\trace}[1]{\mathrm{trace}\left(#1\right)} \newcommand{\diag}[1]{\mathrm{Diag}\left(#1\right)} \newcommand{\grad}[2]{\nabla_{#1} \left. #2 \right.} \newcommand{\gradat}[3]{\nabla_{#1} \left. #2 \right|_{#3}} \newcommand{\fracat}[3]{\left. \frac{#1}{#2} \right|_{#3}} \newcommand{\W}{\mybold{W}} \newcommand{\w}{w} \newcommand{\wbar}{\bar{w}} \newcommand{\wv}{\mybold{w}} \newcommand{\X}{\mybold{X}} \newcommand{\x}{x} \newcommand{\xbar}{\bar{x}} \newcommand{\xv}{\mybold{x}} \newcommand{\Xcov}{\mybold{M}_{\X}} \newcommand{\Xcovhat}{\hat{\mybold{M}}_{\X}} \newcommand{\Covsand}{\Sigmam_{\mathrm{sand}}} \newcommand{\Covsandhat}{\hat{\Sigmam}_{\mathrm{sand}}} \newcommand{\Z}{\mybold{Z}} \newcommand{\z}{z} \newcommand{\zv}{\mybold{z}} \newcommand{\zbar}{\bar{z}} \newcommand{\Y}{\mybold{Y}} \newcommand{\Yhat}{\hat{\Y}} \newcommand{\y}{y} \newcommand{\yv}{\mybold{y}} \newcommand{\yhat}{\hat{\y}} \newcommand{\ybar}{\bar{y}} \newcommand{\res}{\varepsilon} \newcommand{\resv}{\mybold{\res}} \newcommand{\resvhat}{\hat{\mybold{\res}}} \newcommand{\reshat}{\hat{\res}} \newcommand{\betav}{\mybold{\beta}} \newcommand{\betavhat}{\hat{\betav}} \newcommand{\betahat}{\hat{\beta}} \newcommand{\betastar}{{\beta^{*}}} \newcommand{\betavstar}{{\betav^{*}}} \newcommand{\loss}{\mathscr{L}} \newcommand{\losshat}{\hat{\loss}} \newcommand{\f}{f} \newcommand{\fhat}{\hat{f}} \newcommand{\bv}{\mybold{\b}} \newcommand{\bvhat}{\hat{\bv}} \newcommand{\alphav}{\mybold{\alpha}} \newcommand{\alphavhat}{\hat{\av}} \newcommand{\alphahat}{\hat{\alpha}} \newcommand{\omegav}{\mybold{\omega}} \newcommand{\gv}{\mybold{\gamma}} \newcommand{\gvhat}{\hat{\gv}} \newcommand{\ghat}{\hat{\gamma}} \newcommand{\hv}{\mybold{\h}} \newcommand{\hvhat}{\hat{\hv}} \newcommand{\hhat}{\hat{\h}} \newcommand{\gammav}{\mybold{\gamma}} \newcommand{\gammavhat}{\hat{\gammav}} \newcommand{\gammahat}{\hat{\gamma}} \newcommand{\new}{\mathrm{new}} \newcommand{\zerov}{\mybold{0}} \newcommand{\onev}{\mybold{1}} \newcommand{\id}{\mybold{I}} \newcommand{\sigmahat}{\hat{\sigma}} \newcommand{\etav}{\mybold{\eta}} \newcommand{\muv}{\mybold{\mu}} \newcommand{\Sigmam}{\mybold{\Sigma}} \newcommand{\rdom}[1]{\mathbb{R}^{#1}} \newcommand{\RV}[1]{{#1}} \def\A{\mybold{A}} \def\A{\mybold{A}} \def\av{\mybold{a}} \def\a{a} \def\B{\mybold{B}} \def\b{b} \def\S{\mybold{S}} \def\sv{\mybold{s}} \def\s{s} \def\R{\mybold{R}} \def\rv{\mybold{r}} \def\r{r} \def\V{\mybold{V}} \def\vv{\mybold{v}} \def\v{v} \def\vhat{\hat{v}} \def\U{\mybold{U}} \def\uv{\mybold{u}} \def\u{u} 
\def\W{\mybold{W}} \def\wv{\mybold{w}} \def\w{w} \def\tv{\mybold{t}} \def\t{t} \def\Sc{\mathcal{S}} \def\ev{\mybold{e}} \def\Lammat{\mybold{\Lambda}} \def\Q{\mybold{Q}} \def\eps{\varepsilon} $$

Transforming regressors.


Goals

  • Discuss transformations of regressors
    • Define \(R^2\), and talk about its relationship to the regressors
    • Polynomial regressors
    • Interaction regressors

Define \(R^2\)

We have defined \(\Yhat = \X \betavhat\), and shown that \(\Yhat = \proj{\X} \Y = \X (\X^\trans\X)^{-1}\X^\trans \Y\). We define

\[ \begin{aligned} ESS :={}& \Yhat^\trans \Yhat && \textrm{("explained sum of squares")} \\ TSS :={}& \Y^\trans \Y && \textrm{("total sum of squares")} \\ RSS :={}& \resvhat^\trans \resvhat && \textrm{("residual sum of squares")}. \end{aligned} \]

In your homework, you will show that \(TSS = ESS + RSS\). It follows immediately that \(0 \le ESS \le TSS\), and we can define \[ R^2 := \frac{ESS}{TSS} \in [0, 1]. \]
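To make this concrete, here is a minimal sketch verifying the decomposition numerically on simulated data. (Note that summary()'s r.squared uses the centered convention, which generally differs from the uncentered ratio defined here.)

set.seed(42)
N <- 100
x <- rnorm(N)
y <- 1 + 2 * x + rnorm(N)
reg <- lm(y ~ x)

ESS <- sum(fitted(reg)^2)     # explained sum of squares
TSS <- sum(y^2)               # total sum of squares
RSS <- sum(residuals(reg)^2)  # residual sum of squares

print(TSS - (ESS + RSS))  # zero, up to floating point error
print(ESS / TSS)          # R^2 as defined above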

High \(R^2\) means a “good fit” in the sense that the least squares fit has low error.

High \(R^2\) does not necessarily mean a good fit

But high \(R^2\) does not necessarily mean that the fit is accurate or useful! In particular, adding regressors can only make \(R^2\) increase, and there are clearly silly regressions with perfect \(R^2\).

Here are two examples:

  • \(\Y \sim \Y\): regressing \(\Y\) on itself gives \(R^2 = 1\) exactly
  • \(\Y \sim \id\): with one regressor per observation, the fit is perfect and \(R^2 = 1\) (see the sketch below)
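Here is a minimal sketch of the second pathology: regressing pure noise on more and more pure-noise regressors. As the number of regressors approaches the number of observations, the fit becomes perfect.

set.seed(1)
N <- 50
y <- rnorm(N)  # a pure-noise response
r2 <- sapply(c(1, 10, 25, 49), function(k) {
  Z <- matrix(rnorm(N * k), N, k)  # k pure-noise regressors
  summary(lm(y ~ Z))$r.squared
})
print(r2)  # climbs toward 1 as k grows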

Today we will look at the effect of regressors on \(R^2\), and at how to make new regressors from the ones you have. We will also talk about how to interpret “interaction” regressors.

Recall the Ames data

Let’s try to predict SalePrice using the regressors in the Ames dataset. We’ll explore some different ways to include more regressors.

ggplot(ames_df) +
  geom_point(aes(x=Gr.Liv.Area, y=SalePrice))

Let’s regress on a bunch of regressors

regs <- c("Neighborhood", "Bldg.Type", "Overall.Qual", "Overall.Cond", 
          "Exter.Qual", "Exter.Cond", 
          "X1st.Flr.SF", "X2nd.Flr.SF", "Gr.Liv.Area", "Low.Qual.Fin.SF",
          "Kitchen.Qual", 
          "Garage.Area", "Garage.Qual", "Garage.Cond",
          "Paved.Drive", "Open.Porch.SF", "Enclosed.Porch",
          "Pool.Area", "Yr.Sold")

# Sanity check: print any regressor names that are missing from ames_df
if (any(!(regs %in% names(ames_df)))) {
  print(setdiff(regs, names(ames_df)))
}

I’ll run a bunch of regressions, including more and more regressors, and see what happens.
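A minimal sketch of that loop, using reformulate to build the formulas and summary()’s conventional (centered) \(R^2\):

r2 <- sapply(seq_along(regs), function(k) {
  fmla <- reformulate(regs[1:k], response = "SalePrice")
  summary(lm(fmla, ames_df))$r.squared
})
plot(seq_along(regs), r2, type = "b",
     xlab = "Number of regressors", ylab = "R squared")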

How does the “fit” change?

  • You can’t use \(R^2\) alone to determine which regressors to include, since \(R^2\) will always tell you to add more
  • It helps to be smart about which regressors you include
  • How can we make more regressors from the regressors we already have?

Polynomial regressors

reg <- lm(SalePrice ~ Gr.Liv.Area, ames_df)

It looks like there may be nonlinear dependence. How can we fit a nonlinear curve with “linear regression”?

Regress on the square too

reg_sq <- lm(SalePrice ~ Gr.Liv.Area + I(Gr.Liv.Area^2), ames_df)

The model is still linear in the coefficients \(\betav\): we have simply added the square of Gr.Liv.Area as a new column of \(\X\). “Linear regression” means linear in \(\betav\), not in the regressors.
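Concretely, writing \(\x_n\) for the living area of house \(n\), the design matrix becomes

\[ \X = \begin{pmatrix} 1 & \x_1 & \x_1^2 \\ \vdots & \vdots & \vdots \\ 1 & \x_N & \x_N^2 \end{pmatrix}, \]

and the usual least squares machinery \(\betavhat = (\X^\trans\X)^{-1}\X^\trans\Y\) applies unchanged.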

Regress on the higher-order polynomials

reg_order1 <- lm(SalePrice ~ Gr.Liv.Area, ames_df)
reg_order2 <- lm(SalePrice ~ Gr.Liv.Area + I(Gr.Liv.Area^2), ames_df)
reg_order3 <- lm(SalePrice ~ Gr.Liv.Area + I(Gr.Liv.Area^2) + I(Gr.Liv.Area^3), ames_df)
reg_order4 <- lm(SalePrice ~ Gr.Liv.Area + I(Gr.Liv.Area^2) + 
                 I(Gr.Liv.Area^3) + I(Gr.Liv.Area^4), ames_df)

What will happen if you regress on higher and higher-order polynomials? How do you decide when to stop?

The poly function

The poly function does the same thing more compactly. By default it uses “orthogonal polynomials,” which can improve numerical stability when fitting. (Pass raw=TRUE to get the raw powers instead.)

reg_order4 <- lm(SalePrice ~ Gr.Liv.Area + I(Gr.Liv.Area^2) + 
                 I(Gr.Liv.Area^3) + I(Gr.Liv.Area^4), ames_df)

reg_order4_v2 <- lm(SalePrice ~ poly(Gr.Liv.Area, 4), ames_df)
print(max(abs(fitted(reg_order4) - fitted(reg_order4_v2))))
[1] 6.984919e-10
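Note that \(R^2\) alone cannot tell you when to stop, since it can only increase with the polynomial order. A quick sketch:

r2 <- sapply(1:6, function(d) {
  summary(lm(SalePrice ~ poly(Gr.Liv.Area, d), ames_df))$r.squared
})
print(round(r2, 4))  # weakly increasing in the polynomial order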

Interactions

Recall that we found some grouping by kitchen quality. Let’s include the corresponding indicators in the regression:

reg_additive <- lm(SalePrice ~ Gr.Liv.Area + Kitchen.Qual, ames_df)

The fitted line is now different for each kitchen quality (each group gets its own intercept), and the overall slope changed too!

However, the slope is still the same within each kitchen quality group. Why?
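One way to see why: the additive model is

\[ \yhat = \betahat_0 + \betahat_1 \cdot \mathrm{Gr.Liv.Area} + \gammahat_q, \]

where \(\gammahat_q\) is the offset for kitchen quality group \(q\). The coefficient \(\betahat_1\) on living area does not depend on \(q\), so every group shares the same slope and the groups differ only by vertical shifts.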

\(\X^\trans\X\) plot

We can see in the \(\X^\trans\X\) matrix that there is some association between living area and kitchen quality.

How would we expect the slope to have changed if this matrix were diagonal?

x <- model.matrix(reg_additive) %>% scale(center=FALSE)
xtx <- t(x) %*% x / nrow(ames_df)
PlotMatrix(xtx)
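For intuition, here is a minimal sketch with two nearly orthogonal regressors: when \(\X^\trans\X\) is close to diagonal, adding the second regressor barely moves the coefficient on the first.

set.seed(0)
x1 <- rnorm(100)
x2 <- rnorm(100)  # generated independently of x1, so nearly orthogonal
y <- x1 + x2 + rnorm(100)
print(coef(lm(y ~ x1))["x1"])       # slope without x2 in the model
print(coef(lm(y ~ x1 + x2))["x1"])  # nearly identical slope with x2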

Level dropping

Note that one of the levels is automatically dropped. Why?

table(ames_df$Kitchen.Qual)

  Ex   Fa   Gd   TA 
 108   42  844 1197 
print(coefficients(reg_additive))
   (Intercept)    Gr.Liv.Area Kitchen.QualFa Kitchen.QualGd Kitchen.QualTA 
  162906.94088       82.67511  -144558.98868   -90449.59019  -128625.75269 

Slope interactions

How would you fit a different slope for each kitchen quality? Use an interaction term. You can use the R notation : and it will construct the interaction automatically for you.

reg_inter <- lm(SalePrice ~ Gr.Liv.Area:Kitchen.Qual, ames_df)
bind_cols(select(ames_df, Gr.Liv.Area, Kitchen.Qual), 
          as.data.frame(model.matrix(reg_inter)))[1,] %>%
          t()
                           1     
Gr.Liv.Area                "1656"
Kitchen.Qual               "TA"  
(Intercept)                "1"   
Gr.Liv.Area:Kitchen.QualEx "0"   
Gr.Liv.Area:Kitchen.QualFa "0"   
Gr.Liv.Area:Kitchen.QualGd "0"   
Gr.Liv.Area:Kitchen.QualTA "1656"

Here’s a good cheat sheet for R formulas:

https://www.econometrics.blog/post/the-r-formula-cheatsheet/

Interaction XTX matrix

x <- model.matrix(reg_inter) %>% scale(center=FALSE)
xtx <- t(x) %*% x / nrow(ames_df)
PlotMatrix(xtx)

Interaction plot

Is there a point where all the slope interaction lines cross?
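(Hint: reg_inter has a single common intercept and a separate slope for each group, so every fitted line passes through the same point at Gr.Liv.Area = 0.)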

Both slopes and offsets

Using the R notation *, we can include both per-group intercepts and per-group slopes:

reg_add_inter <- lm(SalePrice ~ Gr.Liv.Area*Kitchen.Qual, ames_df)
colnames(model.matrix(reg_add_inter))
[1] "(Intercept)"                "Gr.Liv.Area"               
[3] "Kitchen.QualFa"             "Kitchen.QualGd"            
[5] "Kitchen.QualTA"             "Gr.Liv.Area:Kitchen.QualFa"
[7] "Gr.Liv.Area:Kitchen.QualGd" "Gr.Liv.Area:Kitchen.QualTA"

Higher-order interactions

We can create still higher-order interactions with the symbol ^:

table(ames_df$Garage.Qual)

  Ex   Fa   Gd   Po   TA 
   3  101   19    3 2065 
table(ames_df$Exter.Qual)

  Ex   Fa   Gd   TA 
  52   13  679 1447 
table(ames_df$Kitchen.Qual)

  Ex   Fa   Gd   TA 
 108   42  844 1197 
reg_higher <- lm(
  SalePrice ~ (Exter.Qual + Kitchen.Qual + Garage.Qual)^3,
  ames_df)

Three-way interactions have a lot of NA estimates.

coeff_names <- names(coefficients(reg_higher))
cbind(coeff_names[1:5], coeff_names[21:25], coeff_names[71:75])
     [,1]             [,2]                        
[1,] "(Intercept)"    "Exter.QualFa:Garage.QualFa"
[2,] "Exter.QualFa"   "Exter.QualGd:Garage.QualFa"
[3,] "Exter.QualGd"   "Exter.QualTA:Garage.QualFa"
[4,] "Exter.QualTA"   "Exter.QualFa:Garage.QualGd"
[5,] "Kitchen.QualFa" "Exter.QualGd:Garage.QualGd"
     [,3]                                       
[1,] "Exter.QualTA:Kitchen.QualTA:Garage.QualPo"
[2,] "Exter.QualFa:Kitchen.QualFa:Garage.QualTA"
[3,] "Exter.QualGd:Kitchen.QualFa:Garage.QualTA"
[4,] "Exter.QualTA:Kitchen.QualFa:Garage.QualTA"
[5,] "Exter.QualFa:Kitchen.QualGd:Garage.QualTA"
print(sum(is.na(coefficients(reg_higher))))
[1] 48

Why are there so many NA coefficients?
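Many of the level combinations simply never occur in the data, so the corresponding interaction coefficients are not estimable. A quick sketch counting the empty cells among the \(4 \times 4 \times 5 = 80\) combinations:

combos <- table(ames_df$Exter.Qual, ames_df$Kitchen.Qual, ames_df$Garage.Qual)
print(sum(combos == 0))  # number of combinations with no observations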

Higher-order interaction plot

Why do the higher-order interactions look like step functions?

Indeed, predicting from this fit triggers a warning about the rank deficiency:

Warning in predict.lm(reg_higher, ames_df): prediction from rank-deficient fit;
attr(*, "non-estim") has doubtful cases