10.2 Hierarchical regression
Hierarchical regression is a form of multiple regression in which predictors are entered in blocks (steps), and the model is re-estimated after each block. It is generally used to test theoretical predictions about the effects of specific variables, particularly whether those effects hold before and after we control for other variables. It also lets us explore how the model changes, for example how much its overall fit improves, as each new set of predictors is added.
The basic procedure is as follows:
- Define block 1, the baseline set of predictors, and run that regression.
- Identify which variables will be entered in block 2, the first round of additional predictors.
- Run a second multiple regression containing the block 1 predictors plus the new block 2 predictors.
- Compare the block 1 and block 2 models in terms of overall model fit.
- Repeat steps 2 to 4 for any further blocks.
The choice of which variables to enter in which block must be guided by theory; you cannot simply add variables at random.
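The step-by-step procedure above can be sketched as a small R helper. `fit_blocks()` is a hypothetical function written only for illustration (it is not part of base R or any package): it accumulates predictors block by block, builds each formula with `stats::reformulate()`, fits it with `lm()`, and reports R-squared and the change in R-squared at each step. The data are simulated with made-up variables `x1` to `x3`, purely so the sketch runs on its own.

```r
# Hypothetical helper (illustration only): fits one lm() per block, where each
# model contains the predictors from the current block and all earlier blocks.
fit_blocks <- function(data, outcome, blocks) {
  models <- vector("list", length(blocks))
  predictors <- character(0)
  for (i in seq_along(blocks)) {
    # Accumulate predictors so each model is nested inside the next one
    predictors <- c(predictors, blocks[[i]])
    model_formula <- reformulate(predictors, response = outcome)
    models[[i]] <- lm(model_formula, data = data)
  }
  # R-squared at each step, and how much each block adds over the previous step
  r2 <- vapply(models, function(m) summary(m)$r.squared, numeric(1))
  names(r2) <- names(blocks)
  list(models = models, r_squared = r2, r_squared_change = c(r2[1], diff(r2)))
}

# Simulated stand-in data with hypothetical predictors x1, x2, x3
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 0.5 * sim$x1 + 0.3 * sim$x2 + rnorm(n)

res <- fit_blocks(sim, "y", list(block1 = "x1", block2 = "x2", block3 = "x3"))
res$r_squared         # R-squared of the model at each step
res$r_squared_change  # extra R-squared contributed by each block
```

Building the formulas cumulatively guarantees that every model contains all predictors from earlier blocks, which is what makes successive models nested and therefore comparable in terms of fit.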
10.2.1 Example
Let’s return to the proneness to flow example introduced in the multiple regression section. As a reminder, here are our variables:
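- Trait anxiety (trait_anxiety): broadly, people’s tendency to feel anxious
- Openness to experience (openness): a personality trait that describes how likely people are to seek out new experiences
- Gold MSI (GoldMSI): scores on the Goldsmiths Musical Sophistication Index, a measure of musical sophistication
- DFS_Total: a measure of proneness to flow
- age: participants’ age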
## Rows: 811 Columns: 6
## ── Column specification ────────────────────────
## Delimiter: ","
## dbl (6): id, age, GoldMSI, DFS_Total, trait_anxiety, openness
In the first regressions module, we simply ran everything in one go as a multiple regression. Now let’s imagine we want to run this as a hierarchical regression, with the following blocks:
- Block 1: Gold MSI predicting proneness to flow (DFS_Total)
- Block 2: Gold MSI and openness predicting proneness to flow
- Block 3: Gold MSI, openness and trait anxiety predicting proneness to flow
The assumption tests for hierarchical regression are identical to those for multiple regression.
10.2.2 Building blocks and output
Let’s start by building block 1. We can do this with lm() as usual; I will call this model flow_block1.
To build block 2, we simply create a new regression model containing both predictors, exactly as if we were running a standard multiple regression in one go.
Finally, we do the same thing for block 3, adding trait anxiety as a third predictor.
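A sketch of the three models is below. The formulas are taken from the `Call:` lines in the output that follows; only `flow_block1` is named in the text, so `flow_block2` and `flow_block3` are assumed names. If `flow_data` is not already loaded, the code builds a simulated stand-in data set (arbitrary, made-up values) so that it runs; those numbers will not match the output shown in this section.

```r
# If the real flow_data is not loaded, simulate a stand-in with the same column
# names. The coefficients below are arbitrary and NOT estimates from the study.
if (!exists("flow_data")) {
  set.seed(42)
  n <- 811
  flow_data <- data.frame(
    GoldMSI = rnorm(n, mean = 5, sd = 1),
    openness = rnorm(n, mean = 5, sd = 1),
    trait_anxiety = rnorm(n, mean = 40, sd = 10)
  )
  flow_data$DFS_Total <- 15 + 3 * flow_data$GoldMSI + 0.5 * flow_data$openness -
    0.1 * flow_data$trait_anxiety + rnorm(n, sd = 3)
}

# Block 1: Gold MSI only
flow_block1 <- lm(DFS_Total ~ GoldMSI, data = flow_data)

# Block 2 (assumed name): block 1 predictor plus openness
flow_block2 <- lm(DFS_Total ~ GoldMSI + openness, data = flow_data)

# Block 3 (assumed name): block 2 predictors plus trait anxiety
flow_block3 <- lm(DFS_Total ~ GoldMSI + openness + trait_anxiety, data = flow_data)

# Print the summary of each block
summary(flow_block1)
summary(flow_block2)
summary(flow_block3)
```

An equivalent way to write the later blocks is `update(flow_block1, . ~ . + openness)`, which adds a term to an existing model. One practical point: `lm()` drops incomplete rows separately for each model, so if any predictor has missing values it is worth restricting the data to complete cases first, so that every block is fitted to the same participants (here the residual degrees of freedom, 809, 808 and 807, show that all 811 rows were used in each block).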
Now let’s print the summary of each model. We can see in block 1 that Gold MSI scores significantly predict proneness to flow:
##
## Call:
## lm(formula = DFS_Total ~ GoldMSI, data = flow_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.1367 -2.4567 0.0448 2.2783 12.2886
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.3429 1.0848 15.06 <2e-16 ***
## GoldMSI 2.9231 0.1983 14.74 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.538 on 809 degrees of freedom
## Multiple R-squared: 0.2118, Adjusted R-squared: 0.2108
## F-statistic: 217.4 on 1 and 809 DF, p-value: < 2.2e-16
In block 2, both Gold MSI and openness are significant predictors of flow proneness.
##
## Call:
## lm(formula = DFS_Total ~ GoldMSI + openness, data = flow_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.3099 -2.3925 0.0643 2.2613 11.6213
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.6287 1.1912 12.281 < 2e-16 ***
## GoldMSI 2.7179 0.2061 13.185 < 2e-16 ***
## openness 0.4818 0.1425 3.382 0.000755 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.515 on 808 degrees of freedom
## Multiple R-squared: 0.2228, Adjusted R-squared: 0.2209
## F-statistic: 115.8 on 2 and 808 DF, p-value: < 2.2e-16
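Finally, in block 3 we can see that all three predictors remain significant. However, the effect of openness to experience has changed: its coefficient drops from 0.48 in block 2 to 0.30 once trait anxiety is added. Its p-value has also increased (from .0008 to .029), although a change in p-value on its own is an unreliable guide to a change in effect: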
##
## Call:
## lm(formula = DFS_Total ~ GoldMSI + openness + trait_anxiety,
## data = flow_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.0424 -2.2409 -0.0931 2.1484 12.3474
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.08246 1.33150 15.834 <2e-16 ***
## GoldMSI 2.66545 0.19623 13.583 <2e-16 ***
## openness 0.29958 0.13700 2.187 0.0291 *
## trait_anxiety -0.10662 0.01154 -9.237 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.345 on 807 degrees of freedom
## Multiple R-squared: 0.2971, Adjusted R-squared: 0.2945
## F-statistic: 113.7 on 3 and 807 DF, p-value: < 2.2e-16
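Comparing blocks on overall fit can be done informally by hand: compute the change in R-squared between successive blocks, and the standard F-change test for nested models, F = (ΔR² / df1) / ((1 − R² of the larger model) / df2). The sketch below only does this arithmetic on the rounded R-squared values printed above, so its results are approximate; it is an illustration rather than the formal comparison described on the next page.

```r
# R-squared values copied from the three summaries above (rounded to 4 d.p.)
r2 <- c(block1 = 0.2118, block2 = 0.2228, block3 = 0.2971)
n <- 811           # number of participants (Rows: 811)
k <- c(1, 2, 3)    # number of predictors in blocks 1, 2 and 3

# Change in R-squared contributed by each new block
delta_r2 <- diff(r2)

# F-change test for each step, comparing a block with the previous one
df1 <- diff(k)          # predictors added at each step
df2 <- n - k[-1] - 1    # residual df of the larger model
f_change <- (delta_r2 / df1) / ((1 - r2[-1]) / df2)
p_change <- pf(f_change, df1, df2, lower.tail = FALSE)

round(delta_r2, 4)
round(f_change, 2)
signif(p_change, 2)
```

With these inputs, block 2 adds about 0.011 to R-squared (F ≈ 11.44 on 1 and 808 df) and block 3 adds about 0.074 (F ≈ 85.30 on 1 and 807 df). Because each step adds a single predictor, these F values are close to the squared t value of the added predictor (3.382² ≈ 11.44 and 9.237² ≈ 85.32); the small differences come from rounding R-squared to four decimals. If the models are stored in your session as flow_block1 and flow_block2 (the second name is an assumption), `anova(flow_block1, flow_block2)` carries out the first of these tests directly from the fitted models.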
On the next page we’ll discuss model comparison more formally. However, if we wanted to write these results up, we would need to report the results from each block. For example:
A hierarchical regression was conducted to examine the effect