12.3 Initial considerations for EFA

We’ll start with some basic considerations for EFA/PCA. These are things to think about before running an EFA, or at least before interpreting its results.

12.3.1 Sample size

For adequate power, EFA typically needs a fairly big sample size. There is no clear agreement about what constitutes a ‘good’ sample size, and it’s difficult to give concrete recommendations.

Many guides and sources mention an n:p rule of thumb, where n is the sample size and p is the number of variables being analysed (each variable adds parameters that need to be estimated). The idea is that an ideal EFA sample has a set number of participants for every variable you are analysing. Suggested ratios range from as low as 3:1 to 20:1, with a typical ‘ok’ range being 10:1 to 20:1. However, there is no clear empirical support for these rules, and no minimum ratio is truly sufficient (Hogarty et al. 2005).

In general, the bigger the better, and the more variables you have, the more participants you need. As a very blunt rule of thumb, aim for at least 300 participants no matter the circumstance.

Below are our descriptives. While we could use a bit of tidyverse here, the describe() function from the psych package is a convenient way to get basic descriptives for every column in a dataset. We can see that we have n = 2571, which should be more than adequate.

library(psych)

describe(saq)
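For reference, here is a rough tidyverse equivalent, along with a quick check of the n:p ratio discussed above (a sketch; this assumes the saq data from above and uses dplyr/tidyr):

library(dplyr)
library(tidyr)

# Tidyverse alternative: reshape to long format, then summarise per item
saq %>%
  pivot_longer(everything(), names_to = "item", values_to = "score") %>%
  group_by(item) %>%
  summarise(n = n(), mean = mean(score), sd = sd(score))

# Quick n:p sanity check against the rules of thumb above
nrow(saq) / ncol(saq)  # 2571 participants over 9 items, roughly 286:1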

12.3.2 Assumptions

EFA can be conducted with one of several algorithms that determine the final factor structure to be extracted. Different methods rely on different assumptions, so some basic assumption checks are useful:

  • Data should be interval/ratio. Ordinal data are problematic unless you have at least around 5 scale points, in which case they can be broadly approximated as continuous.
  • Normality is important, depending on the method. The usual QQ-plots or Shapiro-Wilk tests on individual items can be useful here.
  • Multicollinearity: observed variables should not be (near-)collinear with each other, i.e. no pair should be so highly correlated that one is effectively redundant.

Note though that these checks only test the normality of one variable at a time, i.e. univariate normality. EFA is ideal with multivariate normality, i.e. where the joint distribution of the entire dataset is normal. Univariate normality is necessary but not sufficient for multivariate normality. Mardia’s tests of multivariate skew and kurtosis (via the mardia() function in psych) give a direct check, but for now let’s run with univariate normality.
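Below is a sketch of these checks in R (assuming the saq data from above; mardia() comes from psych):

# Univariate normality: Shapiro-Wilk p-value for each item
sapply(saq, function(x) shapiro.test(x)$p.value)

# QQ plot for a single item (repeat for the others as needed)
qqnorm(saq$q01); qqline(saq$q01)

# Multivariate normality: Mardia's tests of multivariate skew and kurtosis
mardia(saq, plot = FALSE)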

12.3.3 Factorability

Factorability broadly describes whether the data are likely to be amenable to factor analysis. If data are factorable, it suggests that there is likely to be at least one latent factor underlying the observations.

We can test factorability in three ways:

  1. Correlations

A simple matrix of correlations gives us a first-pass indication of factorability. If most items correlate with each other, this can indicate that there are underlying latent factors. There is no hard-and-fast rule for what counts as ‘acceptable’, but if most variables are not significantly correlated, the data may not be factorable. In our SAQ-9 data, we can see that all correlations between variables are significant, which is generally a good sign.

cor(saq)
##             q01         q02         q04        q05         q06        q14
## q01  1.00000000 -0.09872403  0.43586018  0.4024399  0.21673399  0.3378797
## q02 -0.09872403  1.00000000 -0.11185965 -0.1193466 -0.07420968 -0.1646999
## q04  0.43586018 -0.11185965  1.00000000  0.4006722  0.27820154  0.3508096
## q05  0.40243992 -0.11934658  0.40067225  1.0000000  0.25746014  0.3153381
## q06  0.21673399 -0.07420968  0.27820154  0.2574601  1.00000000  0.4022441
## q14  0.33787966 -0.16469991  0.35080964  0.3153381  0.40224407  1.0000000
## q15  0.24575263 -0.16499581  0.33423089  0.2613719  0.35989309  0.3801148
## q19 -0.18901103  0.20329748 -0.18597751 -0.1653221 -0.16675017 -0.2540581
## q22 -0.10440866  0.23087487 -0.09838349 -0.1325359 -0.16513541 -0.1698375
##            q15        q19         q22
## q01  0.2457526 -0.1890110 -0.10440866
## q02 -0.1649958  0.2032975  0.23087487
## q04  0.3342309 -0.1859775 -0.09838349
## q05  0.2613719 -0.1653221 -0.13253593
## q06  0.3598931 -0.1667502 -0.16513541
## q14  0.3801148 -0.2540581 -0.16983754
## q15  1.0000000 -0.2098023 -0.16790617
## q19 -0.2098023  1.0000000  0.23392259
## q22 -0.1679062  0.2339226  1.00000000
# Alternatively, the lowerCor() function from psych prints this more nicely

lowerCor(saq)
##     q01   q02   q04   q05   q06   q14   q15   q19   q22  
## q01  1.00                                                
## q02 -0.10  1.00                                          
## q04  0.44 -0.11  1.00                                    
## q05  0.40 -0.12  0.40  1.00                              
## q06  0.22 -0.07  0.28  0.26  1.00                        
## q14  0.34 -0.16  0.35  0.32  0.40  1.00                  
## q15  0.25 -0.16  0.33  0.26  0.36  0.38  1.00            
## q19 -0.19  0.20 -0.19 -0.17 -0.17 -0.25 -0.21  1.00      
## q22 -0.10  0.23 -0.10 -0.13 -0.17 -0.17 -0.17  0.23  1.00
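Note that cor() on its own doesn’t report significance. If you want p-values for each pairwise correlation (to back up the claim above), corr.test() from psych provides them; a quick sketch (by default it Holm-adjusts the p-values above the diagonal):

corr.test(saq)$p  # matrix of p-values for each pairwise correlation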
  2. Bartlett’s test of sphericity

Bartlett’s test of sphericity tests the null hypothesis that, at the population level, all correlations between variables are zero. In other words, a non-significant Bartlett’s test suggests that the indicator variables are uncorrelated, which would make factor analysis pointless. In practice, though, the test is very sensitive to sample size and is pretty much always significant, so a significant result is necessary but not very informative. Unsurprisingly, our Bartlett’s test result is significant.

To run this, we use the cortest.bartlett() function from psych. Note that base R does include a function called bartlett.test(), but this is not the same test! (Same Bartlett, though.)

cortest.bartlett(saq)
## R was not square, finding R from data
## $chisq
## [1] 3674.737
## 
## $p.value
## [1] 0
## 
## $df
## [1] 36
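The “R was not square, finding R from data” message simply means we passed raw data rather than a correlation matrix, so the function computed the correlations itself. If you prefer, you can pass these explicitly (a sketch):

cortest.bartlett(cor(saq), n = nrow(saq))  # same test, no message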
  3. Kaiser-Meyer-Olkin (KMO) Test/Kaiser’s Measure of Sampling Adequacy

This test is often referred to as the KMO Test or Kaiser’s MSA; both names refer to the same thing. It is a measure of how much of the variance among the observed variables might be common variance. Higher KMO/MSA values indicate that more variance is likely due to common factors, and thus greater suitability for factor analysis.

Kaiser (1974) provided the following (hilarious) interpretations of MSA values:

  MSA value      Interpretation
  .90 and above  Marvelous
  .80s           Meritorious
  .70s           Middling
  .60s           Mediocre
  .50s           Miserable
  below .50      Unacceptable

MSA values are typically calculated for each variable and for the dataset overall, and it helps to report both. The KMO() function in psych will calculate both sets of measures of sampling adequacy. Here are our results below. The items are generally in the meritorious range, except for q02 and q22, which fall in the middling .70s:

KMO(saq)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = saq)
## Overall MSA =  0.82
## MSA for each item = 
##  q01  q02  q04  q05  q06  q14  q15  q19  q22 
## 0.81 0.76 0.82 0.84 0.82 0.84 0.85 0.84 0.77
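If you want to pull these values out programmatically, the object returned by KMO() stores the overall value in $MSA and the per-item values in $MSAi; for example, a quick sketch to flag items below the meritorious range:

msa <- KMO(saq)
msa$MSAi[msa$MSAi < 0.80]  # items with middling (or worse) sampling adequacy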