Friedman Test

friedman
non-parametric
repeated-measures
ranks
Non-parametric repeated-measures comparison across three or more dependent measurements
Published

April 17, 2026

Research question

The Friedman test is the non-parametric counterpart of a one-way repeated-measures ANOVA. Use it when each subject provides three or more measurements and the outcome is ordinal or strongly non-normal. Biomedical example: do patient-reported dyspnoea scores (Borg scale) change across four time points after a COPD cohort starts a new inhaled bronchodilator?

Assumptions

| Assumption | How to verify in R |
|---|---|
| One group of subjects, each measured at several occasions | design |
| Outcome at least ordinal | scale level |
| No assumption of normality or sphericity | (nothing to check) |

The test converts each subject’s measurements into within-subject ranks, then tests whether the average rank per occasion differs.
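The ranking step can be sketched in base R. The toy matrix below is hypothetical (5 subjects, 3 occasions) and is not the Borg data used later; it only illustrates how within-subject ranks and per-occasion mean ranks are formed.

```r
# Hypothetical toy data: 5 subjects (rows) x 3 occasions (columns)
scores <- matrix(c(7, 5, 4,
                   6, 6, 3,
                   8, 5, 5,
                   5, 4, 2,
                   7, 6, 4),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(NULL, c("t1", "t2", "t3")))

# Rank within each subject (row); tied scores receive average ranks
within_ranks <- t(apply(scores, 1, rank))

# Average rank per occasion: the quantity the test compares
mean_ranks <- colMeans(within_ranks)
mean_ranks

# The built-in test applies the same row-wise ranking internally
friedman.test(scores)
```

Every row of `within_ranks` sums to the same total (here 1 + 2 + 3 = 6), so only the distribution of ranks across occasions carries information.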

Hypotheses

\[H_0: \text{distributions at all occasions are the same} \qquad H_1: \text{at least one differs}\]
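For reference, the test statistic behind these hypotheses can be written in its standard textbook form without tie correction. The symbols are introduced here and do not appear elsewhere in this article: \(n\) is the number of subjects, \(k\) the number of occasions, and \(R_j\) the sum of within-subject ranks for occasion \(j\).

\[Q = \frac{12}{n\,k\,(k+1)} \sum_{j=1}^{k} R_j^2 \;-\; 3\,n\,(k+1), \qquad Q \mathrel{\dot\sim} \chi^2_{k-1} \ \text{under } H_0\]

With \(k = 4\) occasions, as in the Borg example, the reference distribution has \(k - 1 = 3\) degrees of freedom. Software typically applies a tie correction, so reported values can differ slightly from this uncorrected form.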

R code

library(tidyverse)    # expand_grid, mutate, case_when
library(rstatix)      # friedman_test, friedman_effsize, wilcox_test
library(ggstatsplot)  # ggwithinstats
set.seed(42)

# 28 patients; Borg dyspnoea scale at baseline, week 2, week 4, week 8
borg <- expand_grid(id = 1:28, time = factor(c("b", "w2", "w4", "w8"),
                                             levels = c("b", "w2", "w4", "w8"))) |>
  mutate(score = case_when(
    time == "b"  ~ sample(5:8, n(), replace = TRUE),
    time == "w2" ~ sample(4:7, n(), replace = TRUE),
    time == "w4" ~ sample(3:6, n(), replace = TRUE),
    time == "w8" ~ sample(2:5, n(), replace = TRUE)
  ))

# Friedman test (requires complete cases per id)
borg |> friedman_test(score ~ time | id)

# Effect size: Kendall's W (coefficient of concordance)
borg |> friedman_effsize(score ~ time | id)

# Post-hoc pairwise Wilcoxon with Bonferroni correction
borg |> wilcox_test(score ~ time, paired = TRUE, p.adjust.method = "bonferroni")

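# Within-subject plot with the Friedman result in the subtitle.
# Note: the pairwise annotations drawn by ggwithinstats use its own defaults
# (Durbin-Conover tests with Holm adjustment), so they can differ from the
# Bonferroni-adjusted Wilcoxon results above.
ggwithinstats(data = borg, x = time, y = score, type = "nonparametric",
              xlab = "Time", ylab = "Borg dyspnoea score")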

Interpreting the output

With Friedman \(\chi^2(3) \approx 65\), \(p < .001\) and Kendall’s \(W \approx 0.77\), the null hypothesis is rejected, and the effect counts as large (\(W \ge 0.50\)). The Bonferroni-adjusted pairwise Wilcoxon tests localise the differences: week 2 differs from baseline, and weeks 4 and 8 differ from both baseline and week 2.
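To make the Bonferroni step explicit, here is a minimal base-R sketch. It simulates its own hypothetical wide-format data (the seed and the object names `wide`, `p_raw`, `p_adj` are assumptions for illustration), so its p-values will not match the results quoted above.

```r
set.seed(1)
n <- 28

# Hypothetical scores in wide form: one row per patient, one column per occasion
wide <- data.frame(
  b  = sample(5:8, n, replace = TRUE),
  w2 = sample(4:7, n, replace = TRUE),
  w4 = sample(3:6, n, replace = TRUE),
  w8 = sample(2:5, n, replace = TRUE)
)

# All 6 unordered pairs of the 4 occasions
pairs <- combn(names(wide), 2)

# Raw paired Wilcoxon signed-rank p-values
# (exact = FALSE: normal approximation, because integer scores produce ties)
p_raw <- apply(pairs, 2, function(pr) {
  wilcox.test(wide[[pr[1]]], wide[[pr[2]]], paired = TRUE, exact = FALSE)$p.value
})

# Bonferroni: multiply each p-value by the number of comparisons, capped at 1
p_adj <- p.adjust(p_raw, method = "bonferroni")

data.frame(comparison = paste(pairs[1, ], pairs[2, ], sep = " vs "),
           p_raw = signif(p_raw, 3),
           p_adj = signif(p_adj, 3))
```

With four occasions there are six pairwise comparisons, so each raw p-value is multiplied by 6. This is conservative, which is why some pairs can lose significance after adjustment.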

Effect size

Kendall’s W measures how consistently subjects rank the occasions. It ranges from 0 (no agreement between subjects) to 1 (every subject ranks the occasions identically). Cohen’s thresholds (adapted): small 0.10, medium 0.30, large 0.50.
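Kendall’s W is a rescaling of the Friedman statistic, \(W = \chi^2 / (N(k-1))\), with \(N\) subjects and \(k\) occasions. The sketch below applies this to the rounded values quoted in this article; the helper `kendall_w()` and the "negligible" label for values below 0.10 are assumptions made for illustration.

```r
# Kendall's W from the Friedman statistic: W = chi2 / (N * (k - 1))
# (hypothetical helper, not part of any package)
kendall_w <- function(chi2, n_subjects, k_occasions) {
  chi2 / (n_subjects * (k_occasions - 1))
}

# Rounded values from this article: chi2 of about 65, 28 patients, 4 occasions
w <- kendall_w(chi2 = 65, n_subjects = 28, k_occasions = 4)
round(w, 2)

# Label W using the thresholds 0.10 / 0.30 / 0.50
w_label <- cut(w, breaks = c(0, 0.10, 0.30, 0.50, 1),
               labels = c("negligible", "small", "medium", "large"),
               right = FALSE, include.lowest = TRUE)
as.character(w_label)
```

Because W divides the statistic by its maximum possible value, it can be compared across studies with different numbers of subjects, which the raw chi-squared value cannot.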

Reporting (APA 7)

Borg dyspnoea scores differed across the four assessment times, Friedman \(\chi^2(3, N = 28) = 65.4\), \(p < .001\), Kendall’s \(W = .77\). Bonferroni-adjusted pairwise Wilcoxon signed-rank tests showed that scores at week 2 were lower than at baseline, and that scores at weeks 4 and 8 were lower than at both baseline and week 2 (all adjusted \(p < .05\)).

Common pitfalls

  • Missing data: Friedman requires each subject to have a value at every time point. Drop incomplete cases or switch to a mixed model.
  • Reporting means per time point: for an ordinal outcome, report medians and IQRs instead.
  • Ties inside subjects are averaged into fractional ranks automatically; this is rarely a concern but may reduce power slightly.
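The first two pitfalls can be handled as in the following base-R sketch. The data frame `long` is a hypothetical toy example with one missing score, not the Borg data, and complete-case filtering is only one of the two options named above.

```r
# Hypothetical long-format data: 4 patients x 3 occasions, one missing score
long <- data.frame(
  id    = rep(1:4, each = 3),
  time  = factor(rep(c("b", "w2", "w4"), times = 4),
                 levels = c("b", "w2", "w4")),
  score = c(7, 5, 4,   6, NA, 3,   8, 6, 5,   5, 4, 2)
)

# Keep only patients with a non-missing score at every occasion
n_obs <- tapply(!is.na(long$score), long$id, sum)
complete_ids <- as.integer(names(n_obs)[n_obs == nlevels(long$time)])
complete <- long[long$id %in% complete_ids, ]

# Describe each occasion with median and IQR rather than the mean
aggregate(score ~ time, data = complete,
          FUN = function(x) c(median = median(x), IQR = IQR(x)))

# Friedman test on the complete cases (score by time, blocked on id)
friedman.test(score ~ time | id, data = complete)
```

Dropping patient 2 here discards two valid measurements. When many subjects are incomplete, that loss is the main argument for a mixed model.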

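Parametric vs. non-parametric alternative

The parametric counterpart is the one-way repeated-measures ANOVA, which additionally assumes approximately normal residuals and sphericity. Prefer the Friedman test when the outcome is ordinal or strongly non-normal.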

Further reading

  • Sheskin, D. J. (2020). Handbook of Parametric and Nonparametric Statistical Procedures (5th ed.).

Structure inspired by the University of Zurich Methodenberatung (methodenberatung.uzh.ch). All text, examples, R code, and reporting sentences are independently authored in English.