8.7 Multiple regression: Theory

If all of that stuff on the previous page made sense then great! You’re now ready to tackle multiple regression, which is an extension of simple linear regression. You’ll see that much of the same logic applies here, but a few things change…

8.7.1 Multiple regression

Multiple regression is used when we want to test multiple predictors against a single outcome variable. It’d be a safe bet to say that multiple regression and its various forms are among the most widely used statistical tests in the music psychology literature - you’ll see them everywhere! No introduction to linear regression would really be complete without at least scratching the surface of multiple regression.

The name “multiple regression” is actually a fairly generic term: it describes any instance of regression with two or more predictors. There are several forms of multiple regression, such as:

  • Standard multiple regression, which we will cover in this subject
  • Hierarchical multiple regression, where you split your analysis into blocks (see the sketch just after this list)
  • Stepwise multiple regression, where algorithms attempt to select the best predictors
  • And more!
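To make the difference between the first two forms concrete, here is a minimal sketch in Python (using statsmodels; the variable names and data are invented purely for illustration) of a standard analysis next to a hierarchical one, where a second block of predictors is tested against the first:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: predicting music 'enjoyment' from two listener variables
rng = np.random.default_rng(42)
df = pd.DataFrame({"age": rng.normal(30, 8, 100),
                   "openness": rng.normal(0, 1, 100)})
df["enjoyment"] = 2 + 0.1 * df["age"] + 0.8 * df["openness"] + rng.normal(0, 1, 100)

# Standard multiple regression: all predictors entered at once
standard = smf.ols("enjoyment ~ age + openness", data=df).fit()

# Hierarchical multiple regression: enter predictors in blocks, then test
# whether adding the second block significantly improves the model
block1 = smf.ols("enjoyment ~ age", data=df).fit()
block2 = smf.ols("enjoyment ~ age + openness", data=df).fit()
f_value, p_value, df_diff = block2.compare_f_test(block1)
print(f"Adding block 2: F = {f_value:.2f}, p = {p_value:.4f}")
```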

8.7.2 The regression equation, once again

Recall that in a simple linear regression, we had this formula to describe the line of best fit:

\[ y = \beta_0 + \beta_1x + \epsilon \]

In a multiple regression, we work with an extension of this formula. The key term here is \(\beta_1x\) - the part of the equation that links an individual predictor to the outcome via its slope (i.e. how that predictor predicts the outcome). In a multiple regression, we simply add one such term for each predictor. For example, say we now had two predictors:

\[ y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \epsilon \]

We now have a term for our first predictor, \(\beta_1x_1\), and one for our second, \(\beta_2x_2\).

  • \(x_1\) and \(x_2\) simply mean predictor 1 and predictor 2.
  • The betas here are still regression slopes; the subscript numbers just indicate which predictor they correspond to.

From here on, much of the same reasoning that we saw in the early pages of this module applies. The primary hypotheses are now about whether each slope is significantly different from zero - in other words, whether each predictor significantly predicts the outcome while the other predictors are held constant.
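To see what this looks like in practice, here is a minimal sketch in Python (using statsmodels; the data are simulated purely for illustration) that fits a two-predictor model and reads off the estimated slopes and their p-values:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data with known true values: beta_0 = 1.5, beta_1 = 0.6, beta_2 = -0.3
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 + 0.6 * x1 - 0.3 * x2 + rng.normal(size=n)

# Design matrix: a column of 1s (for beta_0) plus one column per predictor
X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

print(fit.params)   # estimates of beta_0, beta_1 and beta_2
print(fit.pvalues)  # tests of whether each coefficient differs from zero
```

Each p-value here corresponds to a t-test of one coefficient against zero, exactly as in the simple regression case.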

8.7.3 Assumption testing in multiple regressions

All of the assumptions from simple linear regression still apply, along with one new one that only arises once we have multiple predictors:

  • Linearity
  • Independence
  • Homoscedasticity
  • Normality
  • Multicollinearity (the new one: the predictors should not be too strongly correlated with one another)

We’ll test these on the next page!
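If you’d like a sneak preview of the assumption that’s new here, below is a minimal sketch in Python (using statsmodels; the predictors are simulated and deliberately made to overlap) of how multicollinearity is commonly checked with variance inflation factors (VIFs):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Two simulated predictors, deliberately constructed to be highly correlated
rng = np.random.default_rng(7)
X = pd.DataFrame({"x1": rng.normal(size=200)})
X["x2"] = 0.9 * X["x1"] + rng.normal(scale=0.5, size=200)

# One VIF per predictor (the constant is included in the model but not reported)
X_const = sm.add_constant(X)
for i, name in enumerate(X_const.columns):
    if name != "const":
        print(f"VIF for {name}: {variance_inflation_factor(X_const.values, i):.2f}")
```

A common rule of thumb is that VIF values above roughly 5-10 flag problematic multicollinearity, although the exact cut-off varies between textbooks.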