Wilcoxon Signed-Rank Test

wilcoxon-signed-rank

paired

non-parametric

ranks

Non-parametric comparison of paired measurements on an ordinal or non-normal continuous outcome

Published

April 17, 2026

Research question

The Wilcoxon signed-rank test is the non-parametric counterpart of the paired t-test. Use it when each unit contributes two dependent measurements and the paired differences are non-normal or only ordinal. Biomedical examples: (1) Do patient-rated sleep-quality scores (5-point Likert) improve from baseline to week 8 of cognitive-behavioural therapy for insomnia? (2) Do log-transformed viral loads differ between two time points when the paired differences remain skewed?

Assumptions

Assumption	How to verify in R
Paired observations per unit	design
Differences symmetric around their median (strict interpretation)	histogram of differences
Differences at least ordinal	scale check

If the differences are badly asymmetric, the sign test is a less powerful alternative that does not require symmetry.
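As a minimal sketch of the sign test mentioned above, base R's binom.test() can be applied to the count of positive differences. The vector d is made-up illustrative data, not the sleep data from this page, and dropping exact zeros is one common convention rather than the only one.

```r
# Illustrative paired differences (post - pre); values are invented for this sketch
d <- c(2, 5, -1, 0, 3, 7, 4, -1, 6, 1)

# Drop exact zeros (assumed convention), then count positive differences
d_nz  <- d[d != 0]
n_pos <- sum(d_nz > 0)
n_eff <- length(d_nz)

# Under H0 the number of positive signs follows Binomial(n_eff, 0.5)
sign_res <- binom.test(n_pos, n_eff, p = 0.5, alternative = "two.sided")
sign_res$p.value
```

Because only the signs are used, the magnitudes of the differences play no role, which is why the test tolerates asymmetry but loses power.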

Hypotheses

Let \(D_i = X_i - Y_i\) be the difference between the two measurements \(X_i\) and \(Y_i\) on unit \(i\). Under the symmetry assumption,

\[H_0: \text{median}(D) = 0 \qquad H_1: \text{median}(D) \ne 0\]
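To make the test statistic behind these hypotheses concrete, the sketch below computes the sum of positive ranks by hand and compares it with the V reported by base R's wilcox.test(). The differences are invented for illustration and were chosen to contain no zeros and no tied absolute values, so the exact test applies.

```r
# Illustrative differences: no zeros, no tied absolute values (assumed toy data)
d <- c(2, 5, -1, 3, 7, 4, -6, 8)

# Rank the absolute differences, then sum the ranks of the positive ones
abs_ranks <- rank(abs(d))
V_manual  <- sum(abs_ranks[d > 0])

# One-sample form of the signed-rank test on the differences
res <- wilcox.test(d, mu = 0)

c(manual = V_manual, wilcox = unname(res$statistic))
```

Under the null hypothesis each rank is equally likely to carry a positive or a negative sign, so V far from its midpoint of n(n + 1)/4 is evidence against a zero median difference.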

R code

library(tidyverse); library(rstatix); library(effectsize); library(ggstatsplot)
set.seed(42)

# 32 patients; sleep quality (0-20) pre and post CBT-I
sleep <- tibble(
  id   = 1:32,
  pre  = sample(3:14, 32, replace = TRUE),
  post = NA_integer_
) |>
  mutate(post = pmin(20L, pmax(0L, pre + sample(-1:7, 32, replace = TRUE))))

long <- sleep |> pivot_longer(c(pre, post), names_to = "time", values_to = "score") |>
  mutate(time = factor(time, levels = c("pre", "post")))

diffs <- sleep$post - sleep$pre
hist(diffs, main = "Differences: post - pre", xlab = "Change in score")

# Wilcoxon signed-rank test
long |> wilcox_test(score ~ time, paired = TRUE, detailed = TRUE)

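# Effect size
# rstatix::wilcox_effsize() reports r = Z / sqrt(N) (needs the coin package)
long |> wilcox_effsize(score ~ time, paired = TRUE)
# Matched-pairs rank-biserial correlation from effectsize
rank_biserial(sleep$post, sleep$pre, paired = TRUE)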

# Visualisation
ggwithinstats(data = long, x = time, y = score, type = "nonparametric",
              xlab = "Time", ylab = "Sleep-quality score")

Interpreting the output

With \(V\) = 456 and \(p < .001\), the test rejects the null hypothesis of a zero median difference. The rank-biserial correlation for paired data \(r \approx 0.76\) indicates a large change. The box-and-violin plot produced by ggwithinstats() shows the post distribution shifted upward and slightly less skewed than pre.

Effect size

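For paired ranks, the matched-pairs rank-biserial correlation is the difference between the positive and negative rank sums divided by their total, \(r = (W^+ - W^-)/(W^+ + W^-)\), so it ranges from -1 to 1. Cohen’s thresholds (adapted): small 0.10, medium 0.30, large 0.50.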
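The rank-sum calculation can be sketched in a few lines of base R. The differences below are invented toy values without zeros or ties, and the names w_pos, w_neg and r_rb are hypothetical helpers for this illustration. For data like these, effectsize::rank_biserial() should give a comparable value, though its handling of zeros and ties may differ from this hand calculation.

```r
# Illustrative differences (toy data, no zeros or ties)
d <- c(2, 5, -1, 3, 7, 4, -6, 8)

# Rank absolute differences and split the rank sum by sign
abs_ranks <- rank(abs(d))
w_pos <- sum(abs_ranks[d > 0])   # sum of ranks with positive sign
w_neg <- sum(abs_ranks[d < 0])   # sum of ranks with negative sign

# Matched-pairs rank-biserial correlation: (W+ - W-) / (W+ + W-)
r_rb <- (w_pos - w_neg) / (w_pos + w_neg)
r_rb
```

A value of 1 means every difference is positive, -1 means every difference is negative, and 0 means the positive and negative rank sums balance.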

Reporting (APA 7)

Sleep-quality scores improved significantly from baseline to week 8 (Wilcoxon signed-rank V = 456, p < .001, rank-biserial r = .76). Median scores rose from 7.5 to 13.0.

Common pitfalls

Overlooking pairs with zero differences: R’s default drops them before ranking, which is the usual convention for the Wilcoxon test and reduces the effective sample size; the sign test handles zeros under its own convention.
Reporting mean differences when the median is the correct summary given skew.
Using the rank-sum (Mann-Whitney) call on paired data by forgetting paired = TRUE.
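The last pitfall is easy to demonstrate with base R's wilcox.test(). The data below are invented so that subjects differ widely from one another while every subject improves by a small amount; they are not the sleep data from this page.

```r
# Toy data: large between-subject spread, small but consistent within-subject gain
pre  <- c(10, 20, 30, 40, 50, 60, 70, 80)
post <- pre + 1:8

# Correct: signed-rank test on the paired differences
paired_res <- wilcox.test(post, pre, paired = TRUE)

# Pitfall: omitting paired = TRUE runs the rank-sum (Mann-Whitney) test instead
unpaired_res <- wilcox.test(post, pre)

c(paired = paired_res$p.value, unpaired = unpaired_res$p.value)
```

The paired call detects the consistent within-subject gain, whereas the unpaired call treats the two columns as independent samples and the between-subject spread swamps the shift.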

Parametric vs. non-parametric alternative

Parametric counterpart: paired t-test.
Even more assumption-light: sign test.
For three or more repeated measures: Friedman test.
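As a rough sketch of two of the alternatives listed above, base R provides t.test() with paired = TRUE and friedman.test(). The matrix scores is invented toy data with one row per subject and one column per occasion; the column names are assumptions made for this example.

```r
# Toy repeated measures: 6 subjects (rows) by 3 occasions (columns)
scores <- matrix(
  c(5,  8, 11,
    7,  9, 14,
    4,  6,  9,
    9, 10, 15,
    6,  9, 12,
    8, 11, 13),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("baseline", "week4", "week8"))
)

# Parametric counterpart for two occasions: paired t-test
t_res <- t.test(scores[, "week8"], scores[, "baseline"], paired = TRUE)

# Three or more occasions: Friedman test (rows are blocks, columns are conditions)
fr_res <- friedman.test(scores)

c(t_p = t_res$p.value, friedman_p = fr_res$p.value)
```

The paired t-test assumes approximately normal differences, while the Friedman test ranks the occasions within each subject and so makes no distributional assumption about the scores themselves.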