2026-02-09
SLR helped us establish the foundation for a lot of regression concepts.
What did we learn in SLR?
Model Fitting
lm() function in R
Model Use
Model Evaluation/Diagnostics


Model Selection
Building a model
Selecting variables
Prediction vs interpretation
Comparing potential models
Model Fitting
Find best fit line
Using OLS in this class
Parameter estimation
Categorical covariates
Interactions
Model Evaluation
Model Use (Inference)
In SLR, we only had one predictor and one outcome in the model:
Outcome: Life expectancy = the average number of years a newborn child would live if current mortality patterns were to stay the same.
Predictor: Cell phones per 100 people = the number of cell phones per 100 people in a country
Let’s say many other variables were measured for each country (see codebook)
Simple Linear Regression
We use one predictor to try to explain the variance of the outcome
\[ Y = \beta_0 + \beta_1 X + \epsilon \]
Multiple Linear Regression
We use multiple predictors to try to explain the variance of the outcome
\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2+ \ldots + \beta_k X_k + \epsilon\]
or on the individual (observation) level:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}+ \ldots + \beta_k x_{ik} + \epsilon_i,\ \ \text{for}\ i = 1, 2, \ldots, n\]
\(Y\) is our dependent variable
\(X_1, X_2, \ldots, X_k\) are our \(k\) independent variables
Simple linear regression population model
\[\begin{aligned} \text{LE} & = \beta_0 + \beta_1 \text{CP} + \epsilon \end{aligned}\]
Multiple linear regression population model (with added income level)
\[\begin{aligned} \text{LE} = & \beta_0 + \beta_1 \text{CP} + \beta_2 I(IL = \text{``Lower middle"}) + \\ & \beta_3 I(IL = \text{``Upper middle"}) + \beta_2 I(IL = \text{``High"}) + \epsilon \end{aligned}\]
New population model for example:
\[\text{LE} = \beta_0 + \beta_1 \text{CP} + \beta_2 I(IL = \text{``Lower middle"}) + \beta_3 I(IL = \text{``Upper middle"}) + \beta_4 I(IL = \text{``High"}) + \epsilon \]
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 62.254 | 1.792 | 34.735 | 0.000 | 58.698 | 65.810 |
| cell_phones_100 | 0.023 | 0.019 | 1.239 | 0.218 | −0.014 | 0.060 |
| income_level_4Lower middle income | 3.518 | 1.720 | 2.046 | 0.043 | 0.106 | 6.930 |
| income_level_4Upper middle income | 7.293 | 1.946 | 3.748 | 0.000 | 3.432 | 11.153 |
| income_level_4High income | 13.340 | 2.176 | 6.131 | 0.000 | 9.024 | 17.656 |
Fitted multiple regression model:
\[\begin{aligned} \widehat{\text{LE}} = & \widehat{\beta}_0 + \widehat{\beta}_1 \text{CP} + \widehat{\beta}_2 I(IL = \text{``Lower middle"}) + \widehat{\beta}_3 I(IL = \text{``Upper middle"}) + \widehat{\beta}_4 I(IL = \text{``High"}) \\ \widehat{\text{LE}} = & 62.25 + 0.023 \ \text{CP} + 3.52\ I(IL = \text{``Lower middle"}) + 7.29\ I(IL = \text{``Upper middle"}) + \\ & 13.34\ I(IL = \text{``High"}) \end{aligned}\]
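The fitted equation turns into a prediction by plugging in a country's values. A minimal sketch (in Python rather than R, purely to show the arithmetic) using the rounded estimates from the table; the example country below is hypothetical:

```python
# Rounded estimates from the regression table above; "Low income" is the
# reference group, so it contributes no indicator term.
COEF = {"intercept": 62.25, "cp": 0.023,
        "Lower middle": 3.52, "Upper middle": 7.29, "High": 13.34}

def predict_le(cell_phones_100, income_level):
    """Predicted life expectancy (years) from the fitted model."""
    pred = COEF["intercept"] + COEF["cp"] * cell_phones_100
    pred += COEF.get(income_level, 0.0)  # add the matching indicator term, if any
    return pred

# A hypothetical upper-middle-income country with 50 cell phones per 100 people:
print(round(predict_le(50, "Upper middle"), 2))  # 62.25 + 0.023*50 + 7.29 = 70.69
```

Exactly one indicator term is "switched on" per country, which is why only one dummy coefficient enters each prediction.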
General interpretation for \(\widehat{\beta}_0\)
The expected \(Y\)-variable is (\(\widehat\beta_0\) units) when the \(X_1\)-variable is 0 \(X_1\)-units and the \(X_2\)-variable is the reference group (cat 1) (95% CI: LB, UB).
General interpretation for \(\widehat{\beta}_1\)
For every increase of 1 \(X_1\)-unit in the \(X_1\)-variable, adjusting/controlling for \(X_2\)-variable, there is an expected increase/decrease of \(|\widehat\beta_1|\) units in the \(Y\)-variable (95% CI: LB, UB).
General interpretation for \(\widehat{\beta}_2\)
Adjusting/controlling for \(X_1\)-variable, the difference in mean \(Y\)-variable comparing \(X_2\)-variable in category 2 to the reference group (cat 1) is \(|\widehat\beta_2|\) units (95% CI: LB, UB).
\[\widehat{Y} = \widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \widehat{\beta}_2 I(X_2=\text{``cat 2"}) + \widehat{\beta}_3 I(X_2=\text{``cat 3"})\]
Interpretation: The expected \(Y\)-variable is (\(\widehat\beta_0\) units) when the \(X_1\)-variable is 0 \(X_1\)-units and the \(X_2\)-variable is the reference group (cat 1) (95% CI: LB, UB).
We will use: \(x_{1a}\) and \(x_{1b} = x_{1a} + 1\), with the implication that \(\Delta{x_1} = x_{1b} - x_{1a} = 1\)
Our goal is to get to a statement with \(\widehat{\beta}_1\) alone:
\[\begin{aligned} \widehat{Y}|x_{1a} = &\widehat{\beta}_0 + \widehat{\beta}_1 x_{1a} + \widehat{\beta}_2 I(X_2=\text{``cat 2"}) + \widehat{\beta}_3 I(X_2=\text{``cat 3"})\\ \widehat{Y}|x_{1b} = &\widehat{\beta}_0 + \widehat{\beta}_1 x_{1b} + \widehat{\beta}_2 I(X_2=\text{``cat 2"}) + \widehat{\beta}_3 I(X_2=\text{``cat 3"})\\ \widehat{Y}|x_{1b} - \widehat{Y}|x_{1a} = & \bigg[\widehat{\beta}_0 + \widehat{\beta}_1 x_{1b} + \widehat{\beta}_2 I(X_2=\text{``cat 2"}) + \widehat{\beta}_3 I(X_2=\text{``cat 3"})\bigg] \\ &- \bigg[\widehat{\beta}_0 + \widehat{\beta}_1 x_{1a} + \widehat{\beta}_2 I(X_2=\text{``cat 2"}) + \widehat{\beta}_3 I(X_2=\text{``cat 3"})\bigg] \\ \widehat{Y}|x_{1b} - \widehat{Y}|x_{1a} = & \widehat{\beta}_1 x_{1b} - \widehat{\beta}_1 x_{1a}\\ \widehat{Y}|x_{1b} - \widehat{Y}|x_{1a} = & \widehat{\beta}_1 (x_{1b} - x_{1a})\\ \widehat{Y}|x_{1b} - \widehat{Y}|x_{1a} = & \widehat{\beta}_1\\ \end{aligned}\]
As long as \(X_2\) is the same category (aka adjusting or controlling for \(X_2\)), then any terms with \(X_2\) will cancel out
Interpretation: For every increase of 1 \(X_1\)-unit in the \(X_1\)-variable, adjusting/controlling for \(X_2\)-variable, there is an expected increase/decrease of \(|\widehat\beta_1|\) units in the \(Y\)-variable (95% CI: LB, UB).
We can do the same for \(X_2\): \(x_{2a}\) is the reference group and \(x_{2b}\) is category 2
Our goal is to get to a statement with \(\widehat{\beta}_2\) alone:
\[\begin{aligned} \widehat{Y}|x_{2a} = &\widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \widehat{\beta}_2 \times 0 + \widehat{\beta}_3 \times 0\\ \widehat{Y}|x_{2b} = &\widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \widehat{\beta}_2 \times 1 + \widehat{\beta}_3 \times 0\\ \widehat{Y}|x_{2b} - \widehat{Y}|x_{2a} = & \bigg[\widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \widehat{\beta}_2 \bigg] - \bigg[\widehat{\beta}_0 + \widehat{\beta}_1 X_1\bigg] \\ \widehat{Y}|x_{2b} - \widehat{Y}|x_{2a} = & \widehat{\beta}_2\\ \end{aligned}\]
As long as \(X_1\) is the same value (aka adjusting or controlling for \(X_1\)), then the two \(\widehat{\beta}_1X_1\) terms will cancel out
Interpretation: Adjusting/controlling for \(X_1\)-variable, the difference in mean \(Y\)-variable comparing \(X_2\)-variable in category 2 to the reference group (cat 1) is \(|\widehat\beta_2|\) units (95% CI: LB, UB).
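Both cancellation arguments above can be checked numerically. A small sketch (Python for illustration; the coefficient values are arbitrary, made up only for the check, not estimates from the model):

```python
# Arbitrary (made-up) coefficients for a model with one continuous predictor X1
# and a three-category X2, where "cat 1" is the reference group.
b0, b1, b2, b3 = 5.0, 1.7, -2.3, 0.9

def yhat(x1, cat):
    # Indicator terms: (cat == "...") evaluates to 1 when true, 0 otherwise
    return b0 + b1 * x1 + b2 * (cat == "cat 2") + b3 * (cat == "cat 3")

# Holding the category fixed, a 1-unit increase in x1 isolates beta_1:
diff_x1 = yhat(11, "cat 2") - yhat(10, "cat 2")   # equals b1
# Holding x1 fixed, comparing "cat 2" to the reference isolates beta_2:
diff_cat = yhat(10, "cat 2") - yhat(10, "cat 1")  # equals b2
```

Any other fixed category or fixed \(x_1\) value gives the same differences, which is the point of "adjusting for" the other covariate.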
Fitted multiple regression model:
\[\begin{aligned} \widehat{\text{LE}} = & \widehat{\beta}_0 + \widehat{\beta}_1 \text{CP} + \widehat{\beta}_2 I(IL = \text{``Lower middle"}) + \widehat{\beta}_3 I(IL = \text{``Upper middle"}) + \widehat{\beta}_4 I(IL = \text{``High"}) \\ \widehat{\text{LE}} = & 62.25 + 0.023 \ \text{CP} + 3.52\ I(IL = \text{``Lower middle"}) + 7.29\ I(IL = \text{``Upper middle"}) + \\ & 13.34\ I(IL = \text{``High"}) \end{aligned}\]
Interpretation for \(\widehat{\beta}_0\)
The average life expectancy is 62.25 years for a country with 0 cell phones per 100 people and low income status (95% CI: 58.7, 65.81).
Interpretation for \(\widehat{\beta}_1\)
For every increase of 1 cell phone per 100 people, there is an expected increase of 0.023 years in life expectancy (95% CI: -0.014, 0.06), adjusting for income level.
Interpretation for \(\widehat{\beta}_2\)
The difference in average life expectancy comparing lower middle income countries to low income countries is 3.52 years (95% CI: 0.11, 6.93), adjusting for cell phones per 100 people.
Simple linear regression population model
\[\begin{aligned} \text{LE} = \beta_0 + \beta_1 \text{CP} + \epsilon \end{aligned}\]
Multiple linear regression population model (with added vaccination rate)
\[\begin{aligned} \text{LE} = \beta_0 + \beta_1 \text{CP} + \beta_2 \text{VR} + \epsilon \end{aligned}\]
New population model for example:
\[\text{LE} = \beta_0 + \beta_1 \text{CP} + \beta_2 \text{VR} + \epsilon\]
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 46.833 | 6.042 | 7.751 | 0.000 | 34.848 | 58.818 |
| cell_phones_100 | 0.075 | 0.018 | 4.074 | 0.000 | 0.039 | 0.112 |
| vax_rate | 0.168 | 0.073 | 2.318 | 0.022 | 0.024 | 0.312 |
Fitted multiple regression model:
\[\begin{aligned} \widehat{\text{LE}} &= \widehat{\beta}_0 + \widehat{\beta}_1 \text{CP} + \widehat{\beta}_2 \text{VR} \\ \widehat{\text{LE}} &= 46.833 + 0.075 \ \text{CP} + 0.168\ \text{VR} \end{aligned}\]
For example, holding vaccination rate fixed at \(\text{VR} = 80\):
\[\begin{aligned} \widehat{\text{LE}} &= 46.833 + 0.075 \ \text{CP} + 0.168\ \text{VR}\\ \widehat{\text{LE}} &= 46.833 + 0.075 \ \text{CP} + 0.168\cdot 80\\ \widehat{\text{LE}} &= 46.833 + 0.075 \ \text{CP} + 13.463 \\ \widehat{\text{LE}} &= 60.297 + 0.075 \ \text{CP} \end{aligned}\]
General interpretation for \(\widehat{\beta}_0\)
The expected \(Y\)-variable is (\(\widehat\beta_0\) units) when the \(X_1\)-variable is 0 \(X_1\)-units and the \(X_2\)-variable is 0 \(X_2\)-units (95% CI: LB, UB).
General interpretation for \(\widehat{\beta}_1\)
For every increase of 1 \(X_1\)-unit in the \(X_1\)-variable, adjusting/controlling for \(X_2\)-variable, there is an expected increase/decrease of \(|\widehat\beta_1|\) units in the \(Y\)-variable (95% CI: LB, UB).
General interpretation for \(\widehat{\beta}_2\)
For every increase of 1 \(X_2\)-unit in the \(X_2\)-variable, adjusting/controlling for \(X_1\)-variable, there is an expected increase/decrease of \(|\widehat\beta_2|\) units in the \(Y\)-variable (95% CI: LB, UB).
\[\begin{aligned} \widehat{Y} = &\widehat{\beta}_0 + \widehat{\beta}_1 \cdot 0 + \widehat{\beta}_2 \cdot 0\\ \widehat{Y} = &\widehat{\beta}_0 \\ \end{aligned}\]
Interpretation: The expected \(Y\)-variable is (\(\widehat\beta_0\) units) when the \(X_1\)-variable is 0 \(X_1\)-units and the \(X_2\)-variable is 0 \(X_2\)-units (95% CI: LB, UB).
We will use: \(x_{1a}\) and \(x_{1b} = x_{1a} + 1\), with the implication that \(\Delta{x_1} = x_{1b} - x_{1a} = 1\)
Our goal is to get to a statement with \(\widehat{\beta}_1\) alone:
\[\begin{aligned} \widehat{Y}|x_{1a} = &\widehat{\beta}_0 + \widehat{\beta}_1 x_{1a} + \widehat{\beta}_2 X_2\\ \widehat{Y}|x_{1b} = &\widehat{\beta}_0 + \widehat{\beta}_1 x_{1b} + \widehat{\beta}_2 X_2\\ \widehat{Y}|x_{1b} - \widehat{Y}|x_{1a} = & \bigg[\widehat{\beta}_0 + \widehat{\beta}_1 x_{1b} + \widehat{\beta}_2 X_2\bigg] - \bigg[\widehat{\beta}_0 + \widehat{\beta}_1 x_{1a} + \widehat{\beta}_2 X_2\bigg] \\ \widehat{Y}|x_{1b} - \widehat{Y}|x_{1a} = & \widehat{\beta}_1 x_{1b} - \widehat{\beta}_1 x_{1a}\\ \widehat{Y}|x_{1b} - \widehat{Y}|x_{1a} = & \widehat{\beta}_1 (x_{1b} - x_{1a})\\ \widehat{Y}|x_{1b} - \widehat{Y}|x_{1a} = & \widehat{\beta}_1\\ \end{aligned}\]
As long as \(X_2\) is the same value (aka adjusting or controlling for \(X_2\)), then the two \(\widehat{\beta}_2X_2\) terms will cancel out
Interpretation: For every increase of 1 \(X_1\)-unit in the \(X_1\)-variable, adjusting/controlling for \(X_2\)-variable, there is an expected increase/decrease of \(|\widehat\beta_1|\) units in the \(Y\)-variable (95% CI: LB, UB).
We can do the same for \(X_2\): \(x_{2a}\) and \(x_{2b} = x_{2a} + 1\), with the implication that \(\Delta{x_2} = x_{2b} - x_{2a} = 1\)
Our goal is to get to a statement with \(\widehat{\beta}_2\) alone:
\[\begin{aligned} \widehat{Y}|x_{2a} = &\widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \widehat{\beta}_2 x_{2a}\\ \widehat{Y}|x_{2b} = &\widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \widehat{\beta}_2 x_{2b}\\ \widehat{Y}|x_{2b} - \widehat{Y}|x_{2a} = & \bigg[\widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \widehat{\beta}_2 x_{2b} \bigg] - \bigg[\widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \widehat{\beta}_2 x_{2a}\bigg] \\ \widehat{Y}|x_{2b} - \widehat{Y}|x_{2a} = & \widehat{\beta}_2 x_{2b} - \widehat{\beta}_2 x_{2a}\\ \widehat{Y}|x_{2b} - \widehat{Y}|x_{2a} = & \widehat{\beta}_2 (x_{2b} - x_{2a})\\ \widehat{Y}|x_{2b} - \widehat{Y}|x_{2a} = & \widehat{\beta}_2\\ \end{aligned}\]
As long as \(X_1\) is the same value (aka adjusting or controlling for \(X_1\)), then the two \(\widehat{\beta}_1X_1\) terms will cancel out
Interpretation: For every increase of 1 \(X_2\)-unit in the \(X_2\)-variable, adjusting/controlling for \(X_1\)-variable, there is an expected increase/decrease of \(|\widehat\beta_2|\) units in the \(Y\)-variable (95% CI: LB, UB).
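The same numeric check works with two continuous predictors. A sketch (Python for illustration) using the rounded estimates from the fitted life-expectancy model; the CP and VR values are arbitrary, since the differences do not depend on them:

```python
# Rounded estimates from the fitted model: LE-hat = 46.833 + 0.075*CP + 0.168*VR
b0, b1, b2 = 46.833, 0.075, 0.168

def le_hat(cp, vr):
    return b0 + b1 * cp + b2 * vr

# Holding VR fixed, +1 cell phone per 100 people changes the prediction by beta_1:
diff_cp = le_hat(51, 70) - le_hat(50, 70)   # equals b1 = 0.075
# Holding CP fixed, +1 percentage point of vaccination changes it by beta_2:
diff_vr = le_hat(50, 71) - le_hat(50, 70)   # equals b2 = 0.168
```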
We fit the regression model in R and printed the regression table:
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 46.833 | 6.042 | 7.751 | 0.000 | 34.848 | 58.818 |
| cell_phones_100 | 0.075 | 0.018 | 4.074 | 0.000 | 0.039 | 0.112 |
| vax_rate | 0.168 | 0.073 | 2.318 | 0.022 | 0.024 | 0.312 |
Fitted multiple regression model: \(\widehat{\text{LE}} = 46.833 + 0.075 \ \text{CP} + 0.168\ \text{VR}\)
Interpretation for \(\widehat{\beta}_0\)
The average life expectancy is 46.83 years for a country with 0 cell phones per 100 people and 0% vaccination rate (95% CI: 34.85, 58.82).
Interpretation for \(\widehat{\beta}_1\)
For every increase of 1 cell phone per 100 people, there is an expected increase of 0.08 years in a country’s life expectancy (95% CI: 0.04, 0.11), adjusting for vaccination rate.
Interpretation for \(\widehat{\beta}_2\)
For every 1% increase in vaccination rate, there is an expected increase of 0.17 years in a country’s life expectancy (95% CI: 0.02, 0.31), adjusting for cell phones per 100 people.
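Putting the three interpretations together, a prediction for a hypothetical country (values chosen only to illustrate plugging in; Python rather than R, purely for the arithmetic):

```python
# Fitted model from the table above: LE-hat = 46.833 + 0.075*CP + 0.168*VR
def predict_le(cp, vr):
    return 46.833 + 0.075 * cp + 0.168 * vr

# Hypothetical country: 100 cell phones per 100 people, 90% vaccination rate
pred = predict_le(100, 90)  # 46.833 + 7.5 + 15.12 = 69.453 years
```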
Units of Y
Units of X
If discussing the intercept: use "mean", "average", or "expected" before the Y-variable
If discussing the coefficient for a continuous covariate: use "mean", "average", or "expected" before "difference", "increase", or "decrease"
Confidence interval
If other covariates in the model
Discussing intercept: Must state that variables are equal to 0
Discussing coefficient for covariate: Must state “adjusting for all other variables”, “Controlling for all other variables”, or “Holding all other variables constant”
The equations for calculating the \(\boldsymbol{\widehat{\beta}}\) values are best expressed using matrix notation (not required for our class)
We will be using R to get the coefficients instead of the equation (already did this a few slides back!)
How we have represented the population regression model: \[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2+ \ldots + \beta_k X_k + \epsilon\]
\[\begin{aligned} \boldsymbol{Y} &= \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \\ \boldsymbol{Y}_{n \times 1}& = \boldsymbol{X}_{n \times (k+1)}\boldsymbol{\beta}_{(k+1)\times 1} + \boldsymbol{\epsilon}_{n \times 1} \end{aligned}\]
\[ \boldsymbol{Y} = \left[\begin{array}{c} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{array} \right]_{n \times 1} \] \[ \boldsymbol{\epsilon} = \left[\begin{array}{c} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{array} \right]_{n \times 1} \]
\[ \boldsymbol{X} = \left[ \begin{array}{ccccc} 1 & X_{11} & X_{12} & \ldots & X_{1,k} \\ 1 &X_{21} & X_{22} & \ldots & X_{2,k} \\ \vdots&\vdots & \vdots & \ldots & \vdots \\ 1 & X_{n1} & X_{n2} & \ldots & X_{n,k} \end{array} \right]_{n \times (k+1)} \]
\[ \boldsymbol{\beta} = \left[\begin{array}{c} \beta_0 \\ \beta_1\\ \vdots \\ \beta_{k} \end{array} \right]_{(k+1)\times 1} \]
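For the curious, the matrix solution is \(\boldsymbol{\widehat\beta} = (\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{Y}\), which is what lm() computes under the hood. A sketch in Python/numpy on simulated data (not the course dataset; the true coefficients are chosen to echo the fitted LE model):

```python
import numpy as np

# Simulate n observations with k = 2 continuous predictors
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 150, n)                 # e.g. cell phones per 100 people
x2 = rng.uniform(20, 100, n)                # e.g. vaccination rate (%)
X = np.column_stack([np.ones(n), x1, x2])   # n x (k+1) design matrix, with 1s column
beta_true = np.array([46.8, 0.075, 0.168])
y = X @ beta_true + rng.normal(0, 1.0, n)   # add noise epsilon

# OLS via the normal equations: solve (X'X) beta = X'y
# (solving the system is numerically safer than inverting X'X)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true for this simulated sample
```

With noisy data the estimates are near, not equal to, the true coefficients; that gap is exactly what the standard errors in the regression table quantify.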
Lesson 9: MLR Intro