14.4 Wilcoxon signed rank test

14.4.1 Introduction

The non-parametric equivalent to the paired-samples t-test is the Wilcoxon signed-rank test. The "signed" and "rank" parts of the test’s name come from how the test statistic is calculated. We won’t deal too much with the mechanics of doing this, but it involves three main steps:

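  1. Calculate the difference between condition 1 and condition 2 for each pair
  2. Rank the differences by their absolute value (i.e. disregard whether each one is positive or negative), setting aside any differences of exactly zero
  3. Add up the ranks separately by sign (i.e. sum the ranks of the positive differences, and sum the ranks of the negative differences). The classical test statistic is the smaller of the two sums; R’s wilcox.test() instead reports V, the sum of the ranks of the positive differences.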
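
The three steps above can be sketched by hand in R. This is only an illustration: the before and after values are made-up toy numbers (not from the wages dataset), chosen so that no differences are tied or zero.

```r
# Toy paired data (hypothetical values, not from the wages dataset)
before <- c(10, 12, 9, 15, 11, 14, 8, 13)
after  <- c(12, 11, 13, 21, 14, 9, 15, 21)

# Step 1: difference for each pair
d <- before - after

# Step 2: rank the absolute differences (zero differences would be dropped first)
d <- d[d != 0]
r <- rank(abs(d))

# Step 3: sum the ranks separately for positive and negative differences
w_pos <- sum(r[d > 0])
w_neg <- sum(r[d < 0])

# The V reported by wilcox.test() is the sum of the positive ranks
v <- unname(wilcox.test(before, after, paired = TRUE)$statistic)

c(w_pos = w_pos, w_neg = w_neg, V = v)
```

Here the positive ranks sum to 6 and the negative ranks to 30, so V matches the positive sum, while the classical "minimum of the two" statistic happens to take the same value.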

In essence, the logic mirrors a regular paired-samples t-test (i.e. it is a one-sample test on the differences between conditions), but using ranks this time rather than means. The Wilcoxon signed-rank test can be used to test whether the median of the paired differences is zero, so it’s appropriate to hypothesise about a change in the median here. Like the other non-parametric tests, it does not assume normality. It is not entirely assumption-free, though: it assumes that the differences are roughly symmetric around their median.

14.4.2 Example

The wages dataset records wages from 1980 to 1987. Did the median wage change between these two years? To compare them, we first reshape the data into wide format, so that each year has its own wage column:

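# Load dplyr and tidyr (via tidyverse) if not already loaded
library(tidyverse)

# One row per person (nr), one column per year: wage_1980, ..., wage_1987
wages_wide <- wages %>%
  select(nr, year, wage) %>%
  pivot_wider(
    id_cols = nr,
    names_from = year,
    values_from = wage,
    names_prefix = "wage_"
  )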

Recall that in a paired-samples t-test, the normality assumption refers to whether the differences between the two conditions are normally distributed. We can test this in the usual two ways: 1) with a normality significance test, and 2) by assessing a Q-Q plot. Here is the former, to show what a non-normal dataset might look like:

shapiro.test(wages_wide$wage_1980 - wages_wide$wage_1987)
## 
##  Shapiro-Wilk normality test
## 
## data:  wages_wide$wage_1980 - wages_wide$wage_1987
## W = 0.88454, p-value < 2.2e-16

As we can see, the test is significant (Shapiro-Wilk W = .885, p < .001), a tell-tale sign that these differences aren’t normally distributed. This makes it a good case for using the Wilcoxon signed-rank test rather than a regular paired t-test.
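
The second check mentioned above, a Q-Q plot, can be sketched with base R’s qqnorm() and qqline(). To keep the example runnable on its own, it uses simulated right-skewed differences (the diffs vector is hypothetical) rather than the real wage differences.

```r
set.seed(1)

# Hypothetical right-skewed differences, standing in for
# wages_wide$wage_1980 - wages_wide$wage_1987
diffs <- rexp(200) - 1

# Points that bend away from the reference line suggest non-normality
qqnorm(diffs, main = "Q-Q plot of paired differences")
qqline(diffs)
```

With the chapter’s data, diffs would simply be replaced by the difference between the two wage columns.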

14.4.3 Output

The setup for a signed-rank test in R again uses the same syntax as the regular t.test() function for a paired test, meaning that we can either give wilcox.test() the two separate columns with paired = TRUE, or use the Pair(a, b) ~ 1 formula notation. For simplicity we’ll just do the former:

wilcox.test(wages_wide$wage_1980, wages_wide$wage_1987, paired = TRUE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  wages_wide$wage_1980 and wages_wide$wage_1987
## V = 15096, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
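
For comparison, here is a minimal sketch of the formula notation, which uses Pair() and needs R 4.0.0 or later. The toy data frame and its column names are made up for illustration.

```r
# Hypothetical toy data frame; column names are illustrative only
toy <- data.frame(
  before = c(10, 12, 9, 15, 11, 14, 8, 13),
  after  = c(12, 11, 13, 21, 14, 9, 15, 21)
)

# Two-column interface, as used in this section
res_cols <- wilcox.test(toy$before, toy$after, paired = TRUE)

# Formula interface: a Pair on the left, 1 on the right
res_formula <- wilcox.test(Pair(before, after) ~ 1, data = toy)

# Both calls should give the same statistic and p-value
all.equal(unname(res_cols$statistic), unname(res_formula$statistic))
all.equal(res_cols$p.value, res_formula$p.value)
```

The formula version can be handy when the paired columns already live in one data frame, since the data argument saves repeating the data frame name.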

As mentioned on the previous page, we can also use our rank_biserial() function the same way to calculate an effect size for this paired test:

rank_biserial(wages_wide$wage_1980, wages_wide$wage_1987, paired = TRUE)

Our test is clearly significant, so we can reject the null hypothesis that wages did not change between 1980 and 1987 (V = 15096, p < .001). Our effect size is also large this time, and negative: because the differences are calculated as 1980 minus 1987, a negative value indicates that wages were higher in 1987 than in 1980.
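
To see why the sign of the effect size carries the direction, here is a rough hand calculation of a matched-pairs rank-biserial correlation on toy data. It assumes the simple definition (positive rank sum minus negative rank sum, divided by the total rank sum); rank_biserial() may treat ties or zero differences differently, so treat this as a sketch rather than a reimplementation.

```r
# Toy paired data (hypothetical values, not from the wages dataset)
before <- c(10, 12, 9, 15, 11, 14, 8, 13)
after  <- c(12, 11, 13, 21, 14, 9, 15, 21)

# Signed differences, computed as first argument minus second
d <- before - after
d <- d[d != 0]
r <- rank(abs(d))

w_pos <- sum(r[d > 0])  # ranks where 'before' was higher
w_neg <- sum(r[d < 0])  # ranks where 'after' was higher

# Assumed definition: ranges from -1 (all differences negative) to 1 (all positive)
r_rb <- (w_pos - w_neg) / (w_pos + w_neg)
r_rb
```

Most of the toy differences are negative, so the result is negative (about -0.67), which mirrors the reading of the wages result above.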