10.2 Hierarchical regression

Hierarchical regression is a form of multiple regression in which predictors are entered in blocks. One aim is to test theoretical predictions about the effects of specific variables, particularly before and after we control for other variables. The other is to explore how the model changes as we add predictors.

The basic principle of a hierarchical regression is something like this:

Start by defining block 1, which is our basic regression model, and run it.
Identify which variables will be entered into block 2, which is the first round of additional predictors.
Run a second multiple regression that includes the block 1 predictors plus the new block 2 predictors.
Compare block 1 with block 2 in terms of overall model fit.
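
The steps above can be sketched in a few lines of R. This is only a minimal illustration on simulated data: the variables x1, x2 and y, and the object sim_data, are made up for the sketch and have nothing to do with the flow example that follows.

```r
# Simulated data for illustration only; these are not the flow data.
set.seed(123)
n  <- 200
x1 <- rnorm(n)                            # block 1 predictor (hypothetical)
x2 <- 0.5 * x1 + rnorm(n)                 # block 2 predictor, correlated with x1
y  <- 1 + 0.6 * x1 + 0.4 * x2 + rnorm(n)  # outcome
sim_data <- data.frame(y, x1, x2)

# Step 1: define and run block 1, the basic model
block1 <- lm(y ~ x1, data = sim_data)

# Steps 2 and 3: add the block 2 predictor and rerun with all predictors
block2 <- lm(y ~ x1 + x2, data = sim_data)

# Step 4: compare overall model fit across the two blocks
r2_block1 <- summary(block1)$r.squared
r2_block2 <- summary(block2)$r.squared
r2_change <- r2_block2 - r2_block1

round(c(block1 = r2_block1, block2 = r2_block2, change = r2_change), 3)
```

Because block 2 contains everything in block 1, its R-squared can never be lower than block 1's. The interesting question is how much it increases.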

The choice of what variables to enter in which blocks must be guided by theory - in other words, you cannot simply add variables at random.

10.2.1 Example

Let’s return to the proneness to flow example introduced in the multiple regression section. As a reminder, here are our variables:

Trait anxiety: broadly, refers to people’s tendency to feel anxious
Openness to experience: a personality trait that describes how likely people are to seek new experiences
DFS_Total: a measure of proneness to flow.
age: participant’s age.
GoldMSI: score on the Goldsmiths Musical Sophistication Index (Gold-MSI), a self-report measure of musical sophistication.

flow_data <- read_csv(here("data", "week_10", "w10_flow.csv"))

## Rows: 811 Columns: 6
## ── Column specification ────────────────────────
## Delimiter: ","
## dbl (6): id, age, GoldMSI, DFS_Total, trait_anxiety, openness
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

In the first regressions module, we simply ran everything in one go as a multiple regression. Now let’s imagine we want to run this as a hierarchical regression, with the following blocks:

Block 1: Gold MSI predicting proneness to flow (DFS_Total)
Block 2: Gold MSI and openness predicting proneness to flow
Block 3: Gold MSI, openness and trait anxiety predicting proneness to flow

The assumption tests for hierarchical regression are identical to those for multiple regression.

10.2.2 Building blocks and output

Let’s start by building block 1. We can do this with lm() as per normal. I will call this flow_block1:

flow_block1 <- lm(DFS_Total ~ GoldMSI, data = flow_data)

To build block 2, we simply need to create a new regression model with both predictors, as if we were running this in one go:

flow_block2 <- lm(DFS_Total ~ GoldMSI + openness, data = flow_data)

Finally, we do the same thing for block 3:

flow_block3 <- lm(DFS_Total ~ GoldMSI + openness + trait_anxiety, data = flow_data)
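
Writing each formula out by hand is fine for three blocks, but the cumulative pattern can also be expressed in code. The sketch below uses a hypothetical helper, fit_blocks(), which is not part of any package. It builds each formula with base R's reformulate() and is demonstrated on R's built-in mtcars data rather than the flow data.

```r
# Hypothetical helper: fit one model per block, where each model contains
# the predictors from its own block plus those from all earlier blocks.
fit_blocks <- function(outcome, blocks, data) {
  models <- vector("list", length(blocks))
  predictors <- character(0)
  for (i in seq_along(blocks)) {
    predictors <- c(predictors, blocks[[i]])          # accumulate predictors
    f <- reformulate(predictors, response = outcome)  # e.g. mpg ~ wt + hp
    models[[i]] <- lm(f, data = data)
  }
  names(models) <- paste0("block", seq_along(blocks))
  models
}

# Demonstration on R's built-in mtcars data (not the flow data)
demo_models <- fit_blocks(
  outcome = "mpg",
  blocks  = list("wt", "hp", "qsec"),
  data    = mtcars
)

# Each block's formula contains everything from the blocks before it
lapply(demo_models, formula)
```

Assuming flow_data is loaded as above, fit_blocks("DFS_Total", list("GoldMSI", "openness", "trait_anxiety"), flow_data) should give the same three models as the explicit lm() calls. One trade-off is that the Call line in summary() will show the generic lm(formula = f, data = data) rather than the written-out formula.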

Now let’s print the summary of each model. We can see in block 1 that Gold MSI scores significantly predict proneness to flow:

summary(flow_block1)

## 
## Call:
## lm(formula = DFS_Total ~ GoldMSI, data = flow_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.1367  -2.4567   0.0448   2.2783  12.2886 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  16.3429     1.0848   15.06   <2e-16 ***
## GoldMSI       2.9231     0.1983   14.74   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.538 on 809 degrees of freedom
## Multiple R-squared:  0.2118, Adjusted R-squared:  0.2108 
## F-statistic: 217.4 on 1 and 809 DF,  p-value: < 2.2e-16

In block 2, both Gold MSI and openness are significant predictors of flow proneness:

summary(flow_block2)

## 
## Call:
## lm(formula = DFS_Total ~ GoldMSI + openness, data = flow_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.3099  -2.3925   0.0643   2.2613  11.6213 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  14.6287     1.1912  12.281  < 2e-16 ***
## GoldMSI       2.7179     0.2061  13.185  < 2e-16 ***
## openness      0.4818     0.1425   3.382 0.000755 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.515 on 808 degrees of freedom
## Multiple R-squared:  0.2228, Adjusted R-squared:  0.2209 
## F-statistic: 115.8 on 2 and 808 DF,  p-value: < 2.2e-16

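Finally, in block 3 we can see that all three predictors are significant. However, the effect of openness to experience has weakened now that trait anxiety is in the model: its coefficient drops from 0.48 in block 2 to 0.30 here. (The larger p-value points the same way, but p-values are an unreliable heuristic for judging this.)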

summary(flow_block3)

## 
## Call:
## lm(formula = DFS_Total ~ GoldMSI + openness + trait_anxiety, 
##     data = flow_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.0424  -2.2409  -0.0931   2.1484  12.3474 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   21.08246    1.33150  15.834   <2e-16 ***
## GoldMSI        2.66545    0.19623  13.583   <2e-16 ***
## openness       0.29958    0.13700   2.187   0.0291 *  
## trait_anxiety -0.10662    0.01154  -9.237   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.345 on 807 degrees of freedom
## Multiple R-squared:  0.2971, Adjusted R-squared:  0.2945 
## F-statistic: 113.7 on 3 and 807 DF,  p-value: < 2.2e-16
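
Scrolling between three summaries makes it hard to see what changes from block to block, so it can help to pull the key numbers into one place. The sketch below defines two hypothetical helpers, block_fit_table() and track_term(), neither of which comes from a package. They are demonstrated on R's built-in mtcars data, and they only collect descriptive values; they are not a formal model comparison.

```r
# Hypothetical helper: collect R-squared and the change in R-squared
# from a list of nested models, ordered from block 1 upwards.
block_fit_table <- function(models) {
  r2 <- vapply(models, function(m) summary(m)$r.squared, numeric(1))
  data.frame(
    block     = seq_along(models),
    r_squared = r2,
    r2_change = c(NA, diff(r2))   # no change is defined for block 1
  )
}

# Hypothetical helper: follow one predictor's coefficient row across blocks.
# Blocks that do not contain the predictor get a row of NA values.
track_term <- function(models, term) {
  rows <- lapply(models, function(m) {
    coefs <- summary(m)$coefficients
    if (term %in% rownames(coefs)) coefs[term, ] else rep(NA_real_, ncol(coefs))
  })
  out <- do.call(rbind, rows)
  colnames(out) <- c("estimate", "std_error", "t_value", "p_value")
  out
}

# Demonstration on R's built-in mtcars data (not the flow data)
demo_models <- list(
  block1 = lm(mpg ~ wt, data = mtcars),
  block2 = lm(mpg ~ wt + hp, data = mtcars),
  block3 = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

block_fit_table(demo_models)
track_term(demo_models, "hp")
```

Applied to the flow models, the same idea would be block_fit_table(list(flow_block1, flow_block2, flow_block3)). From the summaries above, R-squared is 0.2118, 0.2228 and 0.2971 across the three blocks, so the changes are 0.0110 for adding openness and 0.0743 for adding trait anxiety.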

On the next page we’ll talk about model comparison in a more formal manner. However, if we wanted to write these results up we would need to talk about the results from each block. For example:

A hierarchical regression was conducted to examine the effect