6.5 Paired-samples t-test

Here’s our last test for the module, and it is another bread-and-butter statistical test in the literature: the paired-samples t-test.

6.5.1 Paired-samples

Paired-samples t-tests, as the name implies, are used when we have one sample and we take measurements from it twice. Often, paired-samples t-tests are used to test the effect of time on an outcome; for example, a before-after design lends itself quite nicely to paired-samples and other repeated-measures tests.

The core hypotheses are very much the same here: the null hypothesis is that the mean difference between the two conditions is 0, and the alternative is that it is not. The only caveat is that the means being compared come from conditions, not groups.

Mathematically, the paired-samples t-test is actually just a variant of the one-sample t-test. If we did a one-sample t-test on the differences between the two timepoints/conditions, we would get the same results. You can see a demonstration of this in the dropdown at the end of this section.

6.5.2 Example data

For this example, we’ll take a look at a simple intervention study. Participants were asked to answer a short list of questions relating to how they were feeling, once before an intervention and once afterwards. Higher scores represent better emotional states. The intervention was a series of self-regulation classes and exercises that the participants took twice a week. We’re interested in seeing whether the intervention was effective.

w8_symptoms <- read_csv(here("data", "week_8", "W8_symptoms.csv"))
## Rows: 32 Columns: 2
## ── Column specification ────────────────────────
## Delimiter: ","
## dbl (2): before, after
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(w8_symptoms)

R-note: Data for paired-samples t-tests can either be in wide form or long form, and this depends on the specific function used. Our dataset is already in wide form as we have one column for before and one for after, so the following code will create a long-form version:

# Pivot to long format
w8_symptoms_long <- w8_symptoms %>%
  pivot_longer(
    cols = everything(),
    names_to = "time",
    values_to = "symptom_score"
  )

# Display start of new data
head(w8_symptoms_long)
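One caveat worth flagging: the long data created above has no participant identifier, so the before/after pairing is implied only by row order. A safer sketch (using a small hypothetical dataset in place of `w8_symptoms`) adds an explicit `id` before pivoting, which also makes the transformation reversible with `pivot_wider()`:

```r
library(dplyr)
library(tidyr)

# hypothetical small dataset standing in for w8_symptoms
w8_symptoms_demo <- tibble(before = c(10, 12, 9), after = c(13, 14, 12))

# add a participant id so the pairing is explicit
w8_symptoms_long_id <- w8_symptoms_demo %>%
  mutate(id = row_number()) %>%
  pivot_longer(
    cols = c(before, after),
    names_to = "time",
    values_to = "symptom_score"
  )

# the id column lets us pivot straight back to wide format
w8_symptoms_wide <- w8_symptoms_long_id %>%
  pivot_wider(names_from = time, values_from = symptom_score)

w8_symptoms_wide
```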

6.5.3 Assumption checks

Similar to other tests, we need to check normality. Here, the assumption is that the differences between time 1 and time 2 are normally distributed (not necessarily time 1 and time 2 themselves). Hence, when we run a Shapiro-Wilk test we’re running it on the values we get from Time 1 - Time 2.

In our data, this assumption seems to be intact (W = .985, p = .918).

shapiro.test(w8_symptoms$before - w8_symptoms$after)
## 
##  Shapiro-Wilk normality test
## 
## data:  w8_symptoms$before - w8_symptoms$after
## W = 0.98465, p-value = 0.9177
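Alongside the Shapiro-Wilk test, a quick visual check of the difference scores can also be reassuring. Here is a minimal sketch using base R’s qqnorm(); the simulated before/after scores are hypothetical stand-ins for the real data:

```r
set.seed(123)
# hypothetical before/after scores standing in for the real dataset
before <- rnorm(32, mean = 20, sd = 5)
after  <- before + rnorm(32, mean = 3, sd = 4)

diffs <- before - after

# Q-Q plot of the difference scores: points close to the line
# suggest the normality assumption is reasonable
qqnorm(diffs)
qqline(diffs)
```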

6.5.4 Output

In R, you can run paired t-tests using the base t.test() function or the rstatix equivalent t_test(). In both methods, you must set paired = TRUE in order to run a paired t-test. However, there is one key difference: the base t.test() requires data in wide format, whereas rstatix::t_test() requires data in long format. Here are both methods.

For the base t.test() function, your data needs to be in wide format. From there it is as simple as giving the two columns as the arguments to the function, and setting paired = TRUE:

t.test(x = w8_symptoms$before, y = w8_symptoms$after, paired = TRUE)
## 
##  Paired t-test
## 
## data:  w8_symptoms$before and w8_symptoms$after
## t = -2.9501, df = 31, p-value = 0.005999
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -4.651185 -0.848815
## sample estimates:
## mean difference 
##           -2.75

Alternatively, you can use the below notation. Pair(before, after) indicates that we want R to treat the before and after variables as paired data. The ~ 1, as we saw before, indicates that this is a one-sample t-test. This is because a paired-samples t-test is functionally equivalent to a one-sample t-test on the differences between conditions (see the expandable dropdown below for more details).

t.test(Pair(before, after) ~ 1, data = w8_symptoms)
## 
##  Paired t-test
## 
## data:  Pair(before, after)
## t = -2.9501, df = 31, p-value = 0.005999
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -4.651185 -0.848815
## sample estimates:
## mean difference 
##           -2.75

Note that before R 4.4.0 (2024), the notation for paired-samples t-tests was different.

If you prefer rstatix, on the other hand, your data needs to be in long format. After that, you can pass the test as a formula much like how you would do so in the independent samples case:

w8_symptoms_long %>%
  t_test(symptom_score ~ time, paired = TRUE, detailed = TRUE)

Once again, we’ll also want to calculate Cohen’s d for our paired-samples t-test. For a paired-samples t-test the developers of effectsize recommend using the rm_d() function, which stands for repeated-measures Cohen’s d. This function takes the same basic syntax as t.test() for paired-samples t-tests, including the use of wide data, but a couple of extra arguments are required.

  • method = "z" defines how the standardising SD term is calculated. There are six possible options; "z" standardises by the SD of the difference scores, which gives the standard calculation.
  • adjust controls whether a small-sample bias correction is applied; setting it to TRUE would give Hedges’ g, an alternative effect size that corrects for small-sample bias. We won’t worry about this here, so we will set this to FALSE.

Note that like the other functions, you can either provide the arguments as Pair(before, after) ~ 1 or by subsetting the columns (x = w8_symptoms$before, y = w8_symptoms$after).

# Paired sample - works with wide data
# t.test(Pair(before, after) ~ 1, data = w8_symptoms)
effectsize::rm_d(Pair(before, after) ~ 1, data = w8_symptoms, method = "z", adjust = FALSE)
# t.test(x = w8_symptoms$before, y = w8_symptoms$after, paired = TRUE)
effectsize::rm_d(x = w8_symptoms$before, y = w8_symptoms$after, method = "z", adjust = FALSE)
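If you want to see what method = "z" is doing under the hood: this version of d (often written d_z) is simply the mean of the difference scores divided by their standard deviation, which also works out to t divided by the square root of n. A quick sketch with hypothetical scores (not the real dataset):

```r
# hypothetical paired scores (not the real dataset)
before <- c(10, 12, 9, 14, 11, 13)
after  <- c(13, 14, 12, 15, 12, 16)

# d_z by hand: mean of the differences divided by their SD
diffs <- before - after
d_z <- mean(diffs) / sd(diffs)

# the same value can be recovered from the paired t statistic: d_z = t / sqrt(n)
t_stat <- unname(t.test(before, after, paired = TRUE)$statistic)
all.equal(d_z, t_stat / sqrt(length(diffs)))
```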

If using cohens_d() from rstatix, paired = TRUE must be selected:

w8_symptoms_long %>%
  rstatix::cohens_d(symptom_score ~ time, paired = TRUE, ci = TRUE, ci.type = "norm")

The Canvas version asks you to interpret each of these effect sizes, but… rstatix will automatically label this for you!

The mean symptom scores of the two timepoints are significantly different (t(31) = 2.95, p = .006). Based on the means and mean difference (2.75; 95% CI = [0.85, 4.65]), participants reported having significantly better emotional states after the intervention compared to beforehand. We can write that up as below, and in this specific example we will condense the text down:

A paired-samples t-test found that emotional state scores significantly increased after the intervention (t(31) = 2.95, p = .006), with an average increase of 2.75 points (95% CI [0.85, 4.65]). This increase was medium in size (d = 0.52).
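If you’d like a quick sanity check on that “medium” label, the effectsize package also provides interpret_cohens_d(), which maps a d value onto Cohen’s (1988) conventional benchmarks (roughly 0.2 = small, 0.5 = medium, 0.8 = large). A small sketch:

```r
library(effectsize)

# interpret the paired-samples d from above against Cohen's (1988) benchmarks
interpret_cohens_d(0.52)
```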

Paired samples using the one-sample t-test

Mathematically, all a paired-samples t-test is doing is running a one-sample t-test on the differences between the two timepoints/conditions. Below is a demonstration of how paired-samples tests can be run using the one-sample t-test.

Let’s start by returning to the wide version of our dataset. We first need to calculate the difference between the before and after columns, which we can easily do with mutate().

w8_symptoms <- w8_symptoms %>%
  mutate(
    diff = before - after
  )

w8_symptoms

We can run a one-sample t-test on the differences now, with the null hypothesis value being 0 - i.e. we are testing the null hypothesis that the mean difference is equal to 0.

t.test(w8_symptoms$diff ~ 1, mu = 0)
## 
##  One Sample t-test
## 
## data:  w8_symptoms$diff
## t = -2.9501, df = 31, p-value = 0.005999
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -4.651185 -0.848815
## sample estimates:
## mean of x 
##     -2.75

As we can see, the results are equivalent to the output of the paired-samples t-test above. Note too that our Shapiro-Wilk test will also give equivalent results if it is run on the differences directly.