12.1 Introduction to EFA

We generally conduct a survey because we’re interested in how different people hold different attitudes, ideas or beliefs about things. All of the standard statistical tools that you learned in Modules 5 and 6 are useful in this regard, and can absolutely be used with survey data. But because survey data is so highly dimensional, it also opens up a new way of analysing data: exploratory factor analysis (EFA).

12.1.1 Dimension reduction

It’s easy to forget that designing and administering a questionnaire gives rise to quite complex data. After all, a single question item may simply ask participants to rate themselves on a statement using a Likert scale, as we have seen earlier in this module. However, consider how many questions we might ask across a scale, and it quickly becomes evident that this kind of data is highly dimensional: each question is effectively its own variable, so a scale with many items leaves us with many variables to sift through.

This kind of scenario is where dimension reduction techniques become extremely useful. Dimension reduction techniques allow us to essentially collapse data into ‘supervariables’ that can simplify the analyses that we do by capturing the commonalities across questionnaire items.

Principal components analysis (PCA) is the most common form of dimension reduction. Principal components analysis lets us take highly dimensional data, such as a questionnaire/scale with multiple items, and collapse that down into a smaller number of components.
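To make the idea of collapsing items into components concrete, here is a minimal sketch using base R’s prcomp(). The data are simulated purely for illustration (they are not the dataset used in this module), and the names trait_a, trait_b, to_likert and items are hypothetical.

```r
# Simulated stand-in data (an assumption for illustration only):
# 200 respondents answering six 5-point Likert-style items.
set.seed(123)
n <- 200
trait_a <- rnorm(n)  # hypothetical trait driving items 1-3
trait_b <- rnorm(n)  # hypothetical trait driving items 4-6

# Hypothetical helper: cut a continuous score into five ordered categories (1-5)
to_likert <- function(x) {
  as.integer(cut(x,
                 breaks = quantile(x, probs = seq(0, 1, 0.2)),
                 include.lowest = TRUE))
}

items <- data.frame(
  item1 = to_likert(trait_a + rnorm(n, sd = 0.6)),
  item2 = to_likert(trait_a + rnorm(n, sd = 0.6)),
  item3 = to_likert(trait_a + rnorm(n, sd = 0.6)),
  item4 = to_likert(trait_b + rnorm(n, sd = 0.6)),
  item5 = to_likert(trait_b + rnorm(n, sd = 0.6)),
  item6 = to_likert(trait_b + rnorm(n, sd = 0.6))
)

# PCA on the standardised items
pca <- prcomp(items, center = TRUE, scale. = TRUE)

# Proportion of the total variance captured by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 2)

# Scores on the first two components: two 'supervariables' in place of six items
component_scores <- pca$x[, 1:2]
head(component_scores)
```

Because the six simulated items were built from only two underlying scores, the first two components should account for most of the variance, which is exactly the situation in which dimension reduction pays off.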

12.1.2 Factor analysis

Factor analysis is a technique closely related to dimension reduction. However, the key conceptual difference is that while PCA lets us collapse multiple variables into a smaller number of components, factor analysis lets us identify latent factors in our data. Latent factors are the unobserved factors underlying the behaviours and responses that we observe in our questionnaire items. We might find, for example, that performance on a series of tests is actually underpinned by multiple distinct, theoretically meaningful factors.

Therefore, factor analysis lets us build and test theories about latent psychological constructs. Factor analysis allows us to indirectly measure these latent factors - essentially, are our questions tapping into the same ‘thing’?

Factor analysis can be split into exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). We will focus specifically on EFA in this module.

A terminology note

You’ll note that we’ve described PCA as generating components, while FA generates factors. It’s worth remembering that this is intentional, and the two terms should not be used interchangeably. We will talk more about this on the pages that follow, but components are simply linear combinations of multiple variables. Factors are estimates of latent variables that drive behaviour. The latter specifically is what we use to test theories about psychological constructs.
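The distinction can be illustrated with a small sketch in base R. Everything here is an assumption for demonstration: the data are simulated, the variable names are hypothetical, and factanal() (maximum likelihood factor analysis from the stats package) stands in for the tools used later in the module.

```r
# Simulated data (illustration only): four items driven by one latent variable
set.seed(42)
n <- 300
latent <- rnorm(n)  # unobserved in real life; we only ever see the items
items <- data.frame(
  x1 = 0.8 * latent + rnorm(n, sd = 0.6),
  x2 = 0.7 * latent + rnorm(n, sd = 0.7),
  x3 = 0.6 * latent + rnorm(n, sd = 0.8),
  x4 = 0.5 * latent + rnorm(n, sd = 0.9)
)

# A component is an exact weighted sum of the (standardised) observed variables
pca <- prcomp(items, center = TRUE, scale. = TRUE)
z <- scale(items)
pc1_by_hand <- as.vector(z %*% pca$rotation[, 1])
all.equal(pc1_by_hand, unname(pca$x[, 1]))  # the weighted sum reproduces PC1

# A factor is part of a model: each item = loading * factor + unique part
fa_fit <- factanal(items, factors = 1)
fa_fit$loadings      # estimated strength of each item's link to the factor
fa_fit$uniquenesses  # estimated share of each item's variance not due to the factor
```

The first component can be rebuilt exactly from the observed items, whereas the factor model splits each item’s variance into a shared part (via the loadings) and an item-specific part (the uniquenesses).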

12.1.3 The steps of exploratory factor analysis

EFA is quite an involved analysis, with several steps and decisions to work through:

  • Prepare data and assess for suitability
  • Decide on the extraction method
  • Decide on how many factors to retain
  • Decide on the rotation method
  • Interpret the results
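As a rough preview, the steps above can be sketched end to end in base R. This is only an outline under stated assumptions: the data are simulated, the names f1, f2 and sim are hypothetical, the eigenvalue-greater-than-one rule is just one simple heuristic for choosing the number of factors, and factanal() is a base R stand-in for the psych and EFA.dimensions functions this module relies on.

```r
# Simulated data (illustration only): six items, two correlated latent traits
set.seed(2024)
n <- 500
f1 <- rnorm(n)
f2 <- 0.3 * f1 + rnorm(n)
sim <- data.frame(
  a1 = f1 + rnorm(n), a2 = f1 + rnorm(n), a3 = f1 + rnorm(n),
  b1 = f2 + rnorm(n), b2 = f2 + rnorm(n), b3 = f2 + rnorm(n)
)

# 1. Prepare the data and assess suitability:
#    drop incomplete cases and inspect the correlations between items
sim <- na.omit(sim)
R <- cor(sim)
round(R, 2)

# 2. Decide on the extraction method:
#    factanal() always uses maximum likelihood, so the choice is made for us here

# 3. Decide how many factors to retain:
#    one simple heuristic keeps factors whose eigenvalue exceeds 1
eigenvalues <- eigen(R)$values
n_factors <- sum(eigenvalues > 1)

# 4. Decide on the rotation method:
#    "promax" is an oblique rotation, which lets the factors correlate
efa <- factanal(sim, factors = n_factors, rotation = "promax")

# 5. Interpret the results: hide small loadings to make the pattern easier to read
print(efa$loadings, cutoff = 0.3)
```

Each of these choices has alternatives (other extraction methods, other retention criteria, orthogonal rotations), so the sketch shows the overall shape of the workflow rather than a recommended set of defaults.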

12.1.4 Example dataset

The example we’ll be using to work through this analysis comes from a brilliant statistician and educator, Professor Andy Field, who is very highly regarded for his Discovering Statistics series, including Discovering Statistics with SPSS and Discovering Statistics with R. I highly recommend checking the books out if you plan on using either program!

As part of his book, Prof. Field came up with a questionnaire called the SPSS Anxiety Questionnaire (SAQ). For the purposes of the next few pages we’ll be using a reduced version with just 9 questions, which we’ll call the SAQ-9. The questions in this survey are:

  • Q1: Statistics makes me cry
  • Q2: My friends will think I’m stupid for not being able to cope with SPSS
  • Q4: I dream that Pearson is attacking me with correlation coefficients
  • Q5: I don’t understand statistics
  • Q6: I have little experience of computers
  • Q14: Computers have minds of their own and deliberately go wrong whenever I use them
  • Q15: Computers are out to get me
  • Q19: Everybody looks at me when I use SPSS
  • Q22: My friends are better at SPSS than I am

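library(readr)  # provides read_csv()
library(here)   # provides here() for building project-relative file paths

# Read the SAQ-9 responses: one row per participant, one column per question
saq <- read_csv(here("data", "efa", "SAQ-9.csv"))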
## Rows: 2571 Columns: 9
## ── Column specification ────────────────────────
## Delimiter: ","
## dbl (9): q01, q02, q04, q05, q06, q14, q15, q19, q22
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

As we walk through the content, we will use this dataset to illustrate how to conduct an exploratory factor analysis. (Note the questions were specifically chosen for demonstration purposes.)

To actually conduct the EFA, we will primarily rely on two packages: psych and EFA.dimensions. The psych package is a fairly big package designed to run many common analyses in psychological science, specifically analyses that relate to psychometrics. It’s an incredibly useful package to be aware of in general. EFA.dimensions is another great package that provides functions to help with certain parts of the EFA process.
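As a small taste of psych, the sketch below runs two of its suitability checks, KMO() and cortest.bartlett(), on simulated data. The data and variable names are assumptions for illustration only, and the requireNamespace() guard is there simply so that the code does not error if psych has not been installed yet.

```r
# Simulated data (illustration only): four items sharing one latent trait
set.seed(7)
n <- 300
latent <- rnorm(n)
sim <- data.frame(
  x1 = latent + rnorm(n),
  x2 = latent + rnorm(n),
  x3 = latent + rnorm(n),
  x4 = latent + rnorm(n)
)
R <- cor(sim)

if (requireNamespace("psych", quietly = TRUE)) {
  # Kaiser-Meyer-Olkin measure of sampling adequacy (overall and per item)
  kmo <- psych::KMO(R)
  print(kmo$MSA)
  print(kmo$MSAi)

  # Bartlett's test of sphericity: are the items correlated at all?
  bart <- psych::cortest.bartlett(R, n = nrow(sim))
  print(bart$chisq)
  print(bart$p.value)
} else {
  message("Install psych with install.packages('psych') to run this sketch.")
}
```

Both functions accept a correlation matrix, which is why the sketch computes R first; cortest.bartlett() additionally needs the sample size.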