One-Way ANOVA

anova

one-way

tukey

welch

levene

Comparing means across three or more independent groups, with Welch’s and Brown-Forsythe adjustments and Tukey post-hoc tests

Published

April 17, 2026

Research question

One-way ANOVA tests whether the means of a continuous outcome differ across three or more independent groups. Biomedical examples: (1) Do mean HbA1c values differ across four antidiabetic regimens? (2) Does mean tumour volume after 14 days differ across three doses of an experimental compound in a xenograft model?

Assumptions

Assumption	How to verify in R
Independent observations across and within groups	Study design; not testable from the data alone
Outcome approximately normal within each group (or large \(n\))	`shapiro_test()` by group, Q-Q plots
Homogeneity of variances	`car::leveneTest()` or `rstatix::levene_test()`
No extreme outliers	boxplot per group
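
As a rough sketch of what the checks in the table do, the base-R snippet below runs Shapiro-Wilk within each group, computes a Levene-type statistic by hand (a one-way ANOVA on absolute deviations from each group median), and lists boxplot outliers. The data frame `d` and its columns `g` and `y` are made-up illustration values, not data from this page, and median centring is assumed here to match the default of `car::leveneTest()`.

```r
# Hypothetical toy data: three groups of five observations each
d <- data.frame(
  g = factor(rep(c("A", "B", "C"), each = 5)),
  y = c(4.1, 5.0, 5.6, 6.2, 7.3,
        5.2, 5.9, 6.4, 7.1, 8.0,
        6.0, 6.8, 7.5, 8.1, 9.4)
)

# Shapiro-Wilk p-value within each group
shapiro_p <- tapply(d$y, d$g, function(v) shapiro.test(v)$p.value)
shapiro_p

# Levene-type test: one-way ANOVA on absolute deviations from group medians
d$abs_dev <- abs(d$y - ave(d$y, d$g, FUN = median))
levene_tab <- anova(lm(abs_dev ~ g, data = d))
levene_tab

# Points beyond 1.5 * IQR from the hinges, per group (what a boxplot flags)
outliers <- tapply(d$y, d$g, function(v) boxplot.stats(v)$out)
outliers
```

With only five observations per group these tests have little power, so the plots in the table remain the more informative check.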

When variance homogeneity fails, use Welch’s ANOVA (`oneway.test(..., var.equal = FALSE)`) or the Brown-Forsythe F-test. When normality fails and the groups are too small to rely on large-sample robustness, use the Kruskal-Wallis test.
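
A minimal base-R sketch of two of these fallbacks, run on made-up data with deliberately unequal spread. `oneway.test()` and `kruskal.test()` ship with the stats package; the Brown-Forsythe F-test is left out because it needs an add-on package. The data frame `toy` is hypothetical.

```r
# Hypothetical toy data: group C is far more variable than A and B
toy <- data.frame(
  g = factor(rep(c("A", "B", "C"), each = 6)),
  y = c(5.0, 5.1, 4.9, 5.2, 5.0, 4.8,
        5.6, 5.5, 5.7, 5.4, 5.6, 5.8,
        3.0, 9.0, 6.5, 4.0, 8.5, 7.0)
)

# Group standard deviations make the heterogeneity visible
tapply(toy$y, toy$g, sd)

# Welch's ANOVA: no equal-variance assumption, fractional denominator df
welch <- oneway.test(y ~ g, data = toy, var.equal = FALSE)
welch

# Kruskal-Wallis: rank-based, no normality assumption
kw <- kruskal.test(y ~ g, data = toy)
kw
```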

Hypotheses

\[H_0: \mu_1 = \mu_2 = \ldots = \mu_k \qquad H_1: \text{at least one } \mu_i \text{ differs}\]
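
The statistic used to test these hypotheses is the ratio of the between-group to the within-group mean square, \(F = MS_\text{between} / MS_\text{within}\). That is the standard definition rather than something spelled out on this page. The sketch below computes it by hand on made-up numbers (the vectors `y` and `g` are illustrative only) and compares the result with `aov()`.

```r
# Hypothetical toy data: three groups of three observations
y <- c(1, 2, 3,  2, 3, 4,  4, 5, 6)
g <- factor(rep(c("A", "B", "C"), each = 3))

k <- nlevels(g)          # number of groups
n <- length(y)           # total sample size
grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((y - group_means[g])^2)

# Mean squares, F, and its upper-tail p-value
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n - k)
f_manual   <- ms_between / ms_within
p_manual   <- pf(f_manual, k - 1, n - k, lower.tail = FALSE)

# Cross-check against the built-in fit
f_aov <- summary(aov(y ~ g))[[1]][["F value"]][1]
c(manual = f_manual, aov = f_aov, p = p_manual)
```

On these numbers \(SS_\text{between} = 14\) and \(SS_\text{within} = 6\), so \(F(2, 6) = 7\).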

R code

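library(tidyverse)    # tibble(), group_by()
library(rstatix)      # shapiro_test(), anova_test(), tukey_hsd()
library(car)          # leveneTest()
library(effectsize)   # eta_squared(), omega_squared()
library(ggstatsplot)  # ggbetweenstats()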
set.seed(42)

# 40 patients per arm across four regimens; HbA1c at 24 weeks
hba1c <- tibble(
  regimen = factor(rep(c("Metformin", "SGLT2i", "GLP1-RA", "Combo"),
                        each = 40),
                   levels = c("Metformin", "SGLT2i", "GLP1-RA", "Combo")),
  hba1c   = c(rnorm(40, 7.4, 0.7), rnorm(40, 7.0, 0.7),
              rnorm(40, 6.8, 0.7), rnorm(40, 6.5, 0.7))
)

# Assumption checks
hba1c |> group_by(regimen) |> shapiro_test(hba1c)
leveneTest(hba1c ~ regimen, data = hba1c)

# Standard one-way ANOVA
aov_res <- hba1c |> anova_test(hba1c ~ regimen, detailed = TRUE)
aov_res

# Welch adjustment (used when variances differ)
oneway.test(hba1c ~ regimen, data = hba1c, var.equal = FALSE)

# Effect size: eta-squared and omega-squared (fit the model once, reuse it)
aov_fit <- aov(hba1c ~ regimen, data = hba1c)
effectsize::eta_squared(aov_fit)
effectsize::omega_squared(aov_fit)

# Tukey HSD post-hoc
hba1c |> tukey_hsd(hba1c ~ regimen)

# Visualisation with inline stats
ggbetweenstats(data = hba1c, x = regimen, y = hba1c,
               pairwise.display = "significant",
               xlab = "Regimen", ylab = "HbA1c (%)")

Interpreting the output

The omnibus test, \(F(3, 156) \approx 15.3\), \(p < .001\), leads us to reject the null hypothesis of equal means. \(\eta^2 \approx 0.23\) means that regimen accounts for about 23% of the variance in HbA1c, a large effect by Cohen’s convention. Tukey HSD identifies which pairs differ while controlling the family-wise error rate at 0.05; in the example, every pairwise difference except Metformin vs. SGLT2i reaches significance.

Effect size

Measure	Formula	Small	Medium	Large
Eta-squared \(\eta^2\)	\(SS_\text{between} / SS_\text{total}\)	0.01	0.06	0.14
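Omega-squared \(\omega^2\)	\((SS_\text{between} - df_\text{between} \, MS_\text{within}) / (SS_\text{total} + MS_\text{within})\), a less biased variant of \(\eta^2\)	0.01	0.06	0.14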
Cohen’s \(f\)	\(\sqrt{\eta^2 / (1 - \eta^2)}\)	0.10	0.25	0.40

\(\omega^2\) is preferred in small samples because \(\eta^2\) is upwardly biased.
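
The three measures can be computed directly from the ANOVA table. The sketch below uses made-up data and the common formula \(\omega^2 = (SS_\text{between} - df_\text{between} \, MS_\text{within}) / (SS_\text{total} + MS_\text{within})\); the variable names are illustrative, and in practice the `effectsize` functions used earlier do the same job.

```r
# Hypothetical toy data
y <- c(1, 2, 3,  2, 3, 4,  4, 5, 6)
g <- factor(rep(c("A", "B", "C"), each = 3))

# Pull sums of squares and degrees of freedom from the ANOVA table
tab <- summary(aov(y ~ g))[[1]]
ss_between <- tab[["Sum Sq"]][1]
ss_within  <- tab[["Sum Sq"]][2]
df_between <- tab[["Df"]][1]
ms_within  <- tab[["Mean Sq"]][2]
ss_total   <- ss_between + ss_within

eta_sq   <- ss_between / ss_total
omega_sq <- (ss_between - df_between * ms_within) / (ss_total + ms_within)
cohens_f <- sqrt(eta_sq / (1 - eta_sq))

round(c(eta_sq = eta_sq, omega_sq = omega_sq, cohens_f = cohens_f), 3)
```

With nine observations the gap is large (\(\eta^2 = 0.70\) against \(\omega^2 \approx 0.57\)), which illustrates why \(\omega^2\) is the safer choice in small samples.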

Reporting (APA 7)

HbA1c at 24 weeks differed significantly across the four regimens, F(3, 156) = 15.28, p < .001, \(\omega^2\) = .21. Tukey HSD post-hoc tests indicated that the Combo arm was lower than all other arms and that the Metformin arm was higher than the GLP1-RA and Combo arms (all adjusted p < .05); the difference between Metformin and SGLT2i was not significant.
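
Because a one-way design has a single effect, \(\eta^2\) and \(\omega^2\) can be recovered from the reported F and its degrees of freedom, which gives a quick consistency check on a results sentence. The helper names below are hypothetical; the identities follow from dividing the sums-of-squares formulas by \(MS_\text{within}\).

```r
# Hypothetical helpers: recover effect sizes from F, df1 (between), df2 (within)
eta_sq_from_f <- function(f, df1, df2) {
  (f * df1) / (f * df1 + df2)
}
omega_sq_from_f <- function(f, df1, df2) {
  n_total <- df1 + df2 + 1                      # N = (k - 1) + (N - k) + 1
  (df1 * (f - 1)) / (df1 * (f - 1) + n_total)
}

# Values taken from the reporting sentence above
round(eta_sq_from_f(15.28, 3, 156), 2)    # 0.23
round(omega_sq_from_f(15.28, 3, 156), 2)  # 0.21
```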

Common pitfalls

Running multiple uncorrected pairwise t-tests instead of an omnibus ANOVA, which inflates the family-wise error rate.
Interpreting a significant omnibus result as “all groups differ”; a significant F says only that at least one mean differs, and post-hoc tests identify the specific pairs.
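Ignoring the Type I / II / III sum-of-squares distinction when the model grows beyond one factor. With a single factor all three types give the same result, but in unbalanced designs with two or more factors they differ. rstatix uses Type II by default; car::Anova() lets you choose between Type II and Type III.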
Reporting p-values without effect sizes.
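
The first pitfall can be put in numbers. With \(k\) groups there are \(\binom{k}{2}\) pairwise comparisons, and if each were an independent test at \(\alpha = 0.05\) the chance of at least one false positive would be \(1 - (1 - \alpha)^m\). Independence is an assumption made for illustration only, since pairwise tests share data. The Bonferroni line is added as one simple correction for comparison; it is not the Tukey procedure used above.

```r
alpha <- 0.05
k <- 4                       # four regimens, as in the example
m <- choose(k, 2)            # number of pairwise comparisons
fwer <- 1 - (1 - alpha)^m    # family-wise error rate under independence
c(comparisons = m, fwer = round(fwer, 3))

# Bonferroni: test each pair at alpha / m to keep the family-wise rate <= alpha
alpha_per_test <- alpha / m
fwer_bonf <- 1 - (1 - alpha_per_test)^m
round(fwer_bonf, 3)
```

For four groups that is six comparisons and a family-wise rate of roughly 0.26 instead of the nominal 0.05.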

Parametric vs. non-parametric alternative

Non-parametric alternative: Kruskal-Wallis test.
For within-subjects designs: one-way repeated-measures ANOVA.
For two groups: independent-samples t-test.
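
For two groups, the one-way ANOVA and the pooled-variance t-test are the same test: F equals t squared and the p-values match. A small check on made-up data (the data frame `two` is hypothetical):

```r
# Hypothetical toy data: two groups of five
two <- data.frame(
  g = factor(rep(c("A", "B"), each = 5)),
  y = c(5.1, 4.8, 5.6, 5.0, 5.3,
        5.9, 6.2, 5.7, 6.4, 6.0)
)

# Pooled-variance t-test and equal-variance one-way ANOVA on the same data
t_res <- t.test(y ~ g, data = two, var.equal = TRUE)
f_res <- oneway.test(y ~ g, data = two, var.equal = TRUE)

c(t_squared = unname(t_res$statistic)^2,
  F         = unname(f_res$statistic))
c(p_t = t_res$p.value, p_F = f_res$p.value)
```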