10.3 Comparing models
On the previous page, we ended up with three models relating to the flow data. That’s all well and good, and seeing how the predictors changed from model to model was valuable in its own right. But how do we actually… decide which model to run with?
10.3.1 Comparing \(R^2\)
The most commonly cited method of comparing regression models is to examine their \(R^2\) values, which you may recall are a measure of how much variance in the outcome is explained by the predictors.
This is easy enough to do visually - the \(R^2\) values appear in the standard summary output. We can also extract them directly: whenever we use summary() on an lm() model, the summary object will contain an element called r.squared that we can easily pull.
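Assuming the three models from the previous page were saved as objects named block1, block2 and block3 (hypothetical names - substitute whatever you called yours), the extraction looks something like this:

```r
# Pull the R^2 element out of each model's summary object
# (block1, block2 and block3 are placeholder names for the three models)
summary(block1)$r.squared
summary(block2)$r.squared
summary(block3)$r.squared
```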
## [1] 0.211779
## [1] 0.22278
## [1] 0.2971018
We can see that Block 3 has the highest \(R^2\) at .297, meaning that the Block 3 model explains about 29.7% of the variance in the outcome. Block 2 explains 22.3% while Block 1 explains 21.2%. Based on this alone we might say that Block 2 explains only a little extra variance in flow proneness compared to Block 1, while Block 3 explains substantially more - therefore, we should go with Block 3. However… \(R^2\) will always increase with more predictors! The very fact that each additional predictor will explain more variance - even if only a tiny amount at a time - means that selecting based on \(R^2\) alone will naturally favour models with more predictors. This isn’t necessarily a useful thing!
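To see this in action, here is a small toy demonstration (using simulated data, not the flow data): a predictor that is pure random noise still nudges \(R^2\) upward, just by a tiny amount.

```r
# Toy demonstration with simulated data: adding a pure-noise
# predictor never decreases R^2
set.seed(123)
toy <- data.frame(y = rnorm(100), x = rnorm(100), noise = rnorm(100))

summary(lm(y ~ x, data = toy))$r.squared          # R^2 with one predictor
summary(lm(y ~ x + noise, data = toy))$r.squared  # R^2 creeps up slightly
```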
10.3.2 Nested model tests
This is a slightly more ‘formal’ test of whether a more complex model leads to a significant change in fit. This works by comparing nested models. Imagine model A and model B, two linear regressions fit on the same dataset. Model A has three predictors, and is the ‘full’ model of the thing we’re trying to estimate. Model B drops one of the predictors from Model A, but keeps the other two. Model B is considered a nested model of Model A.
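As a quick sketch (with made-up variable names and simulated data, not the flow data), model_b below is nested within model_a because its predictors are a subset of model_a’s:

```r
# Hypothetical data and variable names, purely for illustration
set.seed(42)
dat <- data.frame(outcome = rnorm(50), pred1 = rnorm(50),
                  pred2 = rnorm(50), pred3 = rnorm(50))

model_a <- lm(outcome ~ pred1 + pred2 + pred3, data = dat)  # 'full' model
model_b <- lm(outcome ~ pred1 + pred2, data = dat)          # nested: subset of model_a's predictors
```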
The principle of this test is to compare the fit of a nested (reduced) model against the full model, using the residual sums of squares - a smaller residual sum of squares indicates better fit. The full model will always have a residual sum of squares at least as small as the nested model’s, so the real question is whether the improvement is bigger than we would expect by chance. The anova() test works on this principle. Because we’re testing whether a model with additional predictors is a better fit, the ‘nested’ model (in this case, our original model) is expected to be the worse-fitting of the two. A significant result therefore indicates that the more complex model is the better fit.
Nested model tests can be done with the anova() function from base R by simply giving it two model names in order. Let’s start by comparing Blocks 1 and 2:
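Using the same placeholder object names as before, that call might look like this:

```r
# Compare the Block 1 model (nested) against the Block 2 model
anova(block1, block2)
```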
And now between Blocks 2 and 3:
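Again with the placeholder names:

```r
# Compare the Block 2 model (nested) against the Block 3 model
anova(block2, block3)
```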
From this, we can conclude that Block 2 is a better fit to the data than Block 1 (F(1, 808) = 11.437, p < .001), and that Block 3 is in turn a better fit than Block 2 (F(1, 807) = 85.329, p < .001). Therefore, using this method we would choose the Block 3 model for interpretation, as it provides the best fit to the data. This broadly lines up with what we saw with the \(R^2\) change (but this won’t always be the case).
10.3.3 Fit indices
An alternative approach is to use fit indices: various measures that indicate how well a model fits the data. Importantly, unlike \(R^2\), these measures include a penalty for model complexity - i.e. models with more predictors pay a larger penalty.
Two of the most widely used fit indices are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). They work similarly, but are just calculated in slightly different ways.
The AIC and BIC are calculated as follows, where \(k\) is the number of parameters estimated by the model and \(n\) is the sample size:
\[ AIC = 2k - 2\ln(\hat{L}) \]
\[ BIC = k\ln(n) - 2\ln(\hat{L}) \]
\(\hat L\) is called the likelihood, which is a whole thing that we won’t dive too much into here. However, \(-2\ln(\hat L)\) - also written as -2LL, or minus two log-likelihood - goes by the name of deviance (as in deviation). Deviance behaves much like the residual sum of squares - smaller values indicate better fit - and thus serves as a measure of model fit.
R provides some really neat functions called - you guessed it - AIC() and BIC(). These will calculate the AIC and BIC values for every model you give them. So, we can enter all of our models at once:
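Again using the placeholder object names, that might look like the following. For both the AIC and BIC, lower values indicate better fit once the complexity penalty is taken into account.

```r
# AIC() and BIC() accept several models at once and return a small
# table with one row per model; lower values indicate better fit
AIC(block1, block2, block3)
BIC(block1, block2, block3)
```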