11.4 Pseudo \(R^2\)

11.4.1 Regular \(R^2\)

Recall that in a linear regression, we often talk about \(R^2\), or the coefficient of determination. We interpret this value as the proportion of variance in our outcome that is explained by our predictor (or by all of our predictors, in the case of multiple regression).

As a reminder, here is the output for the regression on flow proneness in the regression module:

w10_flow_lm <- lm(DFS_Total ~ trait_anxiety + openness, data = w10_flow)
summary(w10_flow_lm)
## 
## Call:
## lm(formula = DFS_Total ~ trait_anxiety + openness, data = w10_flow)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.596  -2.331  -0.151   2.308  12.794 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   32.65078    1.13377  28.798  < 2e-16 ***
## trait_anxiety -0.11116    0.01278  -8.697  < 2e-16 ***
## openness       0.83372    0.14538   5.735 1.38e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.705 on 808 degrees of freedom
## Multiple R-squared:  0.1364, Adjusted R-squared:  0.1343 
## F-statistic: 63.81 on 2 and 808 DF,  p-value: < 2.2e-16

As we can see, our value for \(R^2\) is .1364, meaning that 13.64% of the variance in our outcome, flow proneness, can be explained by our predictors trait anxiety and openness. The mathematical properties of least squares regression allow us to derive this value and interpret it fairly cleanly and easily. We can even compare \(R^2\) across similar models, or (in the hierarchical case) use it to directly compare model fit.
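Since \(R^2 = 1 - SS_{residual}/SS_{total}\), we can even recover the value above from the fitted model ourselves. A quick sketch (pulling the outcome from the model frame, in case any rows were dropped for missing data):

# R-squared by hand: 1 minus the ratio of residual to total sums of squares
y <- model.frame(w10_flow_lm)$DFS_Total
ss_res <- sum(resid(w10_flow_lm)^2)
ss_tot <- sum((y - mean(y))^2)
1 - ss_res / ss_tot  # matches the Multiple R-squared above (0.1364)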

In logistic regression, however, we cannot calculate the same value, as we are no longer using ordinary least squares. Instead, we use maximum likelihood estimation, which provides a different way of estimating the parameters (i.e. coefficients) in our model. Because maximum likelihood does not work by minimising residuals, it doesn't give us the same 'clean', easily interpretable \(R^2\) that we get in a normal regression. The closest analogues we have are instead calculated from each model's likelihood, \(\hat L\).

To partially overcome this, several measures of pseudo \(R^2\) have been developed. The word 'pseudo' is important here: these are not quite the same thing as our usual \(R^2\). We can't use them in the same way to directly compare across models, for instance - each pseudo-\(R^2\) has its own suggested interpretation, and the different measures do not always cohere with one another. The wonderful Statistical Methods and Data Analysis group at UCLA have a great explanation, with formulae for several pseudo-\(R^2\) measures. We will focus on four in this module: the first three mainly for parity with Jamovi (they also happen to be the most popular), plus one newer measure.

11.4.2 Calculating pseudo-\(R^2\) measures

As a reminder, let’s print the output from our logistic regression on sleep:

summary(sleep_glm)
## 
## Call:
## glm(formula = sleep_disorder ~ age, family = binomial, data = sleep)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.28024    0.65343  -8.081 6.44e-16 ***
## age          0.11549    0.01493   7.738 1.01e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 507.47  on 373  degrees of freedom
## Residual deviance: 433.36  on 372  degrees of freedom
## AIC: 437.36
## 
## Number of Fisher Scoring iterations: 4
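To make \(\hat L\) a little more concrete, R can report the log-likelihoods that the measures below are built from. A small sketch, refitting an intercept-only model for comparison:

# Log-likelihoods of the fitted model and an intercept-only (null) model
logLik(sleep_glm)
logLik(update(sleep_glm, . ~ 1))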

The DescTools package provides a very convenient function, PseudoR2(), to calculate these pseudo-\(R^2\) measures for us. This function requires a) the fitted glm model (i.e. our logistic regression object), and b) a character string specifying which type(s) of pseudo-\(R^2\) should be calculated. You'll see this in the examples below.

McFadden’s \(R^2\) is roughly analogous to a regular \(R^2\), in that it is intended to give an estimate of how much total variability is explained by the logistic model. It is calculated by comparing the fit of a logistic model against a null (i.e. no predictor) model.

library(DescTools)
PseudoR2(sleep_glm, which = "McFadden")
##  McFadden 
## 0.1460373
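Since deviance is just \(-2 \ln \hat L\), McFadden's \(R^2\) - defined as \(1 - \ln \hat L_{model} / \ln \hat L_{null}\) - can be verified by hand from the deviances stored on the model object:

# McFadden by hand: the ratio of residual to null deviance equals
# the ratio of the model and null log-likelihoods
1 - sleep_glm$deviance / sleep_glm$null.deviance  # matches the value above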

Cox and Snell’s \(R^2\) is also calculated by comparing a full model to a null (i.e. no predictor) model. The underlying calculation, however, is different, and a particular oddity of the Cox and Snell \(R^2\) is that its maximum possible value is less than 1.

PseudoR2(sleep_glm, which = "CoxSnell")
##  CoxSnell 
## 0.1797558
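The usual Cox and Snell formula is \(1 - (\hat L_{null} / \hat L_{model})^{2/n}\). Assuming that is the definition DescTools implements, we can rewrite it in terms of deviances and check it by hand:

# Cox and Snell by hand, rewritten in terms of deviances
n <- nobs(sleep_glm)
cs <- 1 - exp((sleep_glm$deviance - sleep_glm$null.deviance) / n)
cs  # matches the CoxSnell value above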

Nagelkerke’s \(R^2\) is an adjustment of the Cox and Snell \(R^2\) - specifically, it rescales the value so that it ranges from 0 to 1.

PseudoR2(sleep_glm, which = "Nagelkerke")
## Nagelkerke 
##  0.2420843
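Concretely, the Nagelkerke adjustment divides the Cox and Snell value by its maximum attainable value, \(1 - \hat L_{null}^{2/n}\). Continuing with the objects from the sketch above:

# Nagelkerke: rescale Cox and Snell by its maximum possible value
# (for a binary outcome, the null log-likelihood is -null.deviance / 2)
cs / (1 - exp(-sleep_glm$null.deviance / n))  # matches the value above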

Finally, Tjur’s \(R^2\) is a relatively new pseudo-\(R^2\). It is calculated by taking the average predicted probability within each of the two outcome groups, and then taking the difference between those two averages. It is bounded between 0 and 1 and is also roughly analogous to a normal \(R^2\).

PseudoR2(sleep_glm, which = "Tjur")
##      Tjur 
## 0.1888425
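Tjur’s \(R^2\) is simple enough to compute by hand as well. A minimal sketch, assuming the outcome is coded 0/1:

# Tjur by hand: difference in mean fitted probability between the two
# observed outcome groups
p_hat <- fitted(sleep_glm)
y <- sleep_glm$y  # the 0/1 response, retained on the glm object by default
mean(p_hat[y == 1]) - mean(p_hat[y == 0])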

11.4.3 When do you report it?

Truthfully, as the UCLA help page states, pseudo-\(R^2\) measures are not as useful or as cleanly interpretable as the normal OLS-based \(R^2\). They are only meaningful when comparing models using the same pseudo-\(R^2\) measure, fitted to the same data and predicting the same outcome. In other words, these measures are useful for selecting between competing models on the same data; they are not valid for comparisons across datasets.

Another thing to consider is that the various measures perform differently. Simulation studies (e.g. Veall and Zimmermann 1992) have shown that Nagelkerke’s and McFadden’s \(R^2\) both severely underestimate the ‘true’ value of \(R^2\). Other measures exist, and can also be calculated with PseudoR2() (as shown below), but they are not as widely implemented in other software.
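If you’re curious, PseudoR2() will return every measure it implements if you ask for "all":

PseudoR2(sleep_glm, which = "all")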

Some will argue that pseudo-\(R^2\) values are pointless outside of the model selection context. Others will say that it never hurts to report them anyway, even if you are only reporting the one model. Ultimately, the decision is best left to you as the researcher, to judge what is most appropriate for what you are doing.