Multiple Regression
Chapter 6.2 - 6.4

Today’s goals


  1. Multiple regression with one numerical and one categorical variable
  2. Understand intercept, slope, offset in intercept, and offset in slope
  3. Parallel slopes vs interaction model

Multiple Regression Goal and Example

Goal: build a model with one numerical and one categorical explanatory variable.

Using the penguins dataset, we want to determine how flipper_length_mm impacts body_mass_g. But we also think body_mass_g varies by sex.


Build a model that predicts body_mass_g using flipper_length_mm and sex as explanatory variables. In this dataset, sex has 2 levels: {male, female}.

Model 1

First, let’s assume that sex has no impact on the rate at which body_mass_g changes with flipper_length_mm.

model_parallel <- lm(body_mass_g ~ flipper_length_mm + sex, data=penguins)

summary(model_parallel)

Call:
lm(formula = body_mass_g ~ flipper_length_mm + sex, data = penguins)

Residuals:
    Min      1Q  Median      3Q     Max 
-910.28 -243.89   -2.94  238.85 1067.73 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -5410.300    285.798 -18.931  < 2e-16 ***
flipper_length_mm    46.982      1.441  32.598  < 2e-16 ***
sexmale             347.850     40.342   8.623 2.78e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 355.9 on 330 degrees of freedom
  (11 observations deleted due to missingness)
Multiple R-squared:  0.8058,    Adjusted R-squared:  0.8047 
F-statistic: 684.8 on 2 and 330 DF,  p-value: < 2.2e-16

glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

Same Slopes Model Output

Equation: \(\widehat{bodymass} = -5410.3 + 46.98*flipper + 347.85*\mathbb{1}_{male}(x)\)

  • What is the reference level (baseline) for sex?
  • Interpretation of \(b_0\) : (intercept for female penguins) The expected body mass for female penguins (the baseline) when flipper length is 0 mm is -5410.3 g. This is an extrapolation, since no penguin has a flipper length of 0 mm.
  • Interpretation of \(b_1\) : (slope of flipper length for both male and female penguins) For every 1 mm increase in flipper length, we expect body mass to increase by 46.98 g on average, regardless of sex.
  • Interpretation of \(b_2\) : (offset in intercept for male penguins) On average, we expect male penguins to weigh 347.85 g more than female penguins with the same flipper length.
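The indicator \(\mathbb{1}_{male}(x)\) in the equation reflects how R encodes a two-level factor. A minimal sketch with a toy factor (hypothetical values, not the penguins data) suggests how the first level becomes the reference and the other level gets a 0/1 column:

```r
# Toy factor for illustration only (not the penguins data)
sex <- factor(c("male", "female", "female", "male"))

# Levels are sorted alphabetically by default; the first one is the baseline
levels(sex)

# The design matrix lm() would build: an intercept column plus a 0/1 column
X <- model.matrix(~ sex)
colnames(X)
X[, "sexmale"]   # 1 for male rows, 0 for female rows
```

If a different baseline is preferred, relevel(sex, ref = "male") reorders the levels so that male becomes the reference.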

Same Slopes Model Output

  • You can think of each ‘level’ having a different line (equation).
  • The equation of the line for female penguins:

\[\widehat{bodymass} = -5410.3 + 46.98*flipper + 347.85*0\] \[\widehat{bodymass} = -5410.3 + 46.98*flipper\]

  • The equation of the line for male penguins:

\[\widehat{bodymass} = -5410.3 + 46.98*flipper + 347.85*1\] \[\widehat{bodymass} = -5062.45 + 46.98*flipper\]
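The two lines can be reproduced directly from the fitted coefficients. The sketch below copies the estimates from the summary output; the helper predict_parallel and the 200 mm flipper length are hypothetical choices made for illustration:

```r
# Coefficients copied from summary(model_parallel)
b0 <- -5410.300  # intercept (female baseline)
b1 <- 46.982     # common slope for flipper length
b2 <- 347.850    # offset in intercept for male penguins

# Hypothetical helper: predicted body mass (g) under the parallel slopes model
predict_parallel <- function(flipper, sex) {
  b0 + b1 * flipper + b2 * (sex == "male")
}

# Intercept of each line
c(female = b0, male = b0 + b2)

# At any flipper length, the two lines are b2 apart
predict_parallel(200, "male") - predict_parallel(200, "female")
```

Because the slope is shared, the gap between the male and female predictions is the same at every flipper length, which is why the lines are parallel.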

Model 2

Now, let’s assume that the rate at which body_mass_g changes with flipper_length_mm varies by sex.

model_interaction <- lm(body_mass_g ~ flipper_length_mm + sex +flipper_length_mm*sex, data=penguins)

summary(model_interaction)

Call:
lm(formula = body_mass_g ~ flipper_length_mm + sex + flipper_length_mm * 
    sex, data = penguins)

Residuals:
    Min      1Q  Median      3Q     Max 
-909.36 -246.58   -3.13  237.18 1065.19 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)               -5443.9607   440.2829 -12.365   <2e-16 ***
flipper_length_mm            47.1527     2.2264  21.179   <2e-16 ***
sexmale                     406.8015   587.3029   0.693    0.489    
flipper_length_mm:sexmale    -0.2942     2.9242  -0.101    0.920    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 356.4 on 329 degrees of freedom
  (11 observations deleted due to missingness)
Multiple R-squared:  0.8058,    Adjusted R-squared:  0.8041 
F-statistic: 455.2 on 3 and 329 DF,  p-value: < 2.2e-16
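
In an R formula, flipper_length_mm*sex already expands to both main effects plus their interaction, so the shorter call below fits the same model:

model_interaction <- lm(body_mass_g ~ flipper_length_mm*sex, data=penguins)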

summary(model_interaction)

Call:
lm(formula = body_mass_g ~ flipper_length_mm * sex, data = penguins)

Residuals:
    Min      1Q  Median      3Q     Max 
-909.36 -246.58   -3.13  237.18 1065.19 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)               -5443.9607   440.2829 -12.365   <2e-16 ***
flipper_length_mm            47.1527     2.2264  21.179   <2e-16 ***
sexmale                     406.8015   587.3029   0.693    0.489    
flipper_length_mm:sexmale    -0.2942     2.9242  -0.101    0.920    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 356.4 on 329 degrees of freedom
  (11 observations deleted due to missingness)
Multiple R-squared:  0.8058,    Adjusted R-squared:  0.8041 
F-statistic: 455.2 on 3 and 329 DF,  p-value: < 2.2e-16
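
The two summaries above report identical estimates. One way to check that both formulas specify the same terms, without fitting anything, is to expand them symbolically. In this sketch the names long_form and short_form are hypothetical labels:

```r
# Expand both formulas symbolically; no data is needed for this check
long_form  <- body_mass_g ~ flipper_length_mm + sex + flipper_length_mm * sex
short_form <- body_mass_g ~ flipper_length_mm * sex

# Terms that each formula expands to
attr(terms(long_form), "term.labels")
attr(terms(short_form), "term.labels")

# TRUE when both formulas describe the same set of terms
identical(attr(terms(long_form), "term.labels"),
          attr(terms(short_form), "term.labels"))
```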

Interaction Model Output

Equation: \[\widehat{bodymass} = -5443.96 + 47.15*flipper + 406.8*\mathbb{1}_{male}(x) \] \[- 0.29*flipper*\mathbb{1}_{male}(x)\]

  • \(b_0\) :
  • \(b_1\) :
  • \(b_2\) :
  • \(b_3\) :

Interaction Model: Context of the Problem

Equation: \[\widehat{bodymass} = -5443.96 + 47.15*flipper + 406.8*\mathbb{1}_{male}(x) \] \[- 0.29*flipper*\mathbb{1}_{male}(x)\]

  • \(b_0\) : When flipper length is 0 mm, we expect female penguins to weigh -5443.96 g (an extrapolation with no physical meaning).
  • \(b_1\) : For every 1 mm increase in flipper length, we predict body mass increases by 47.15 g for female penguins.
  • \(b_2\) : When flipper length is 0 mm, we expect male penguins to weigh 406.8 g more than female penguins.
  • \(b_3\) : For every 1 mm increase in flipper length, we predict that body mass for male penguins increases by 0.29 g less than it does for female penguins (47.15 - 0.29 = 46.86 g per mm).
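
Taken together, \(b_2\) and \(b_3\) mean that the predicted gap between male and female penguins depends on flipper length. A small sketch, using the estimates from the summary output and a hypothetical helper male_minus_female, evaluates that gap at 0 mm and at an arbitrarily chosen 200 mm:

```r
# Coefficients copied from summary(model_interaction)
b2 <- 406.8015   # offset in intercept for male penguins
b3 <- -0.2942    # offset in slope for male penguins

# Hypothetical helper: predicted male minus female body mass (g)
# at a given flipper length (mm)
male_minus_female <- function(flipper) b2 + b3 * flipper

male_minus_female(0)    # equals b2, the gap at a flipper length of 0 mm
male_minus_female(200)  # gap at 200 mm
```

At 200 mm the gap works out to about 348 g, close to the 347.85 g offset estimated by the parallel slopes model.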

Interaction Model Output

You can think of each ‘level’ having a different line (equation).


Equation of the line for female penguins: \[\widehat{bodymass} = -5443.96 + 47.15*flipper + 406.8*0 - 0.29*flipper*0\] \[\widehat{bodymass} = -5443.96 + 47.15*flipper\]

Equation of the line for male penguins: \[\widehat{bodymass} = -5443.96 + 47.15*flipper + 406.8*1 - 0.29*flipper*1\] \[\widehat{bodymass} = -5037.16 + 46.86*flipper\]
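
The same arithmetic can be laid out in R. This sketch copies the four estimates from the summary output and builds one intercept and one slope per sex; the data frame name lines_by_sex is a hypothetical choice:

```r
# Coefficients copied from summary(model_interaction)
b0 <- -5443.9607  # intercept (female baseline)
b1 <- 47.1527     # slope for female penguins
b2 <- 406.8015    # offset in intercept for male penguins
b3 <- -0.2942     # offset in slope for male penguins

# One row per sex: male values add the offsets to the female baseline
lines_by_sex <- data.frame(
  sex       = c("female", "male"),
  intercept = c(b0, b0 + b2),
  slope     = c(b1, b1 + b3)
)

# Round to two decimals to compare with the equations above
round(lines_by_sex[, c("intercept", "slope")], 2)
```

Unlike the parallel slopes model, each sex here gets its own slope as well as its own intercept.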