Power for McNemar’s Test

Sample Size & Power

power

mcnemar

paired-binary

discordant

Sample size for paired binary comparisons driven by the rate of discordant pairs

Published

April 17, 2026

Introduction

Power for McNemar’s test depends on the rate of discordant pairs, not on the overall sample size alone. Concordant pairs contribute nothing to the test statistic, so studies with high agreement between the two measurements need more total pairs to yield enough discordant ones.

Prerequisites

McNemar’s test, paired binary data.

Theory

Let \(p_{10}\) be the probability of a (+, -) pair (positive on test 1, negative on test 2) and \(p_{01}\) the probability of a (-, +) pair. The null hypothesis is \(p_{10} = p_{01}\). The total proportion of discordant pairs is \(p_{\text{disc}} = p_{10} + p_{01}\), and the odds of a “+ on test 1” given that a pair is discordant (the matched-pairs odds ratio) is \(p_{10}/p_{01}\), which equals 1 under \(H_0\).
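The definitions can be made concrete by laying out the four cell probabilities of the paired 2×2 table. This is a base-R sketch. Only \(p_{10} = 0.15\) and \(p_{01} = 0.05\) come from the example in this post. The concordant cells (0.20 for (+, +) and 0.60 for (-, -)) are hypothetical fillers chosen so that the table sums to 1.

```r
# Joint probabilities for (test 1, test 2); rows = test 1, columns = test 2.
# p10 = 0.15 and p01 = 0.05 follow the example; the concordant cells
# (0.20 and 0.60) are hypothetical fillers.
probs <- matrix(c(0.20, 0.15,
                  0.05, 0.60),
                nrow = 2, byrow = TRUE,
                dimnames = list(test1 = c("+", "-"), test2 = c("+", "-")))
stopifnot(isTRUE(all.equal(sum(probs), 1)))  # cell probabilities must sum to 1

p10 <- probs["+", "-"]   # + on test 1, - on test 2
p01 <- probs["-", "+"]   # - on test 1, + on test 2
p_disc <- p10 + p01      # total discordance; the diagonal (concordant) cells never enter
psi <- p10 / p01         # odds of "+ on test 1" given a discordant pair

c(p_disc = p_disc, psi = psi)
```

Changing the concordant cells (while keeping them summing to 0.80) leaves `p_disc` and `psi` unchanged. That is the sense in which concordant pairs are non-informative.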

For a two-sided test at level \(\alpha\) with power \(1 - \beta\), the normal approximation gives the required number of pairs (concordant pairs included) as:

\[n \approx \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{p_{\text{disc}} \cdot \left(\frac{p_{10} - p_{01}}{p_{10} + p_{01}}\right)^2}.\]
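Write \(\delta = p_{10} - p_{01}\), a shorthand introduced here. The denominator is then \(p_{\text{disc}} \cdot \delta^2 / p_{\text{disc}}^2 = \delta^2 / p_{\text{disc}}\), so the formula reduces to a compact form. Here is a worked example with the planning values used in the R section (\(p_{10} = 0.15\), \(p_{01} = 0.05\), \(\alpha = 0.05\), power 0.80):

\[n \approx \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\, p_{\text{disc}}}{\delta^2} = \frac{(1.95996 + 0.84162)^2 \times 0.20}{0.10^2} \approx \frac{7.8489 \times 0.20}{0.01} \approx 156.98,\]

which rounds up to 157 pairs.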

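As discordance drops, \(n\) grows rapidly. With the ratio \(p_{10}/p_{01}\) held fixed, the squared fraction in the denominator is constant, so \(n\) is inversely proportional to \(p_{\text{disc}}\): halving the discordance rate doubles the required number of pairs.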
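The following base-R sketch needs no packages. It holds the matched-pairs odds ratio \(\psi = p_{10}/p_{01}\) at a hypothetical value of 3 and shrinks the total discordance. Each sample size comes from the formula above. The product \(n \cdot p_{\text{disc}}\) stays constant, which is the inverse scaling in action.

```r
z2 <- (qnorm(0.975) + qnorm(0.80))^2   # two-sided alpha = 0.05, power = 0.80
psi <- 3                                # hypothetical fixed ratio p10 / p01
p_disc <- c(0.40, 0.20, 0.10, 0.05)     # shrinking total discordance

# Split each discordance level into p10 and p01 so that p10 / p01 = psi
p10 <- p_disc * psi / (1 + psi)
p01 <- p_disc / (1 + psi)

# Normal-approximation formula from the Theory section
n <- z2 / (p_disc * ((p10 - p01) / p_disc)^2)

data.frame(p_disc, p10, p01,
           n_pairs = ceiling(n),          # round up to whole pairs
           n_times_p_disc = n * p_disc)   # constant: expected discordant pairs needed
```

The last column is the expected number of *discordant* pairs, which is what the test actually consumes. Only the total grows as agreement rises.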

Assumptions

Pairs are independent of one another (the two measurements within a pair may be correlated).
\(p_{10}\) and \(p_{01}\) are pre-specified, for example from pilot data.

R Implementation

library(pwrss)

# Expected p10 = 0.15, p01 = 0.05, alpha = 0.05, power = 0.80
pwrss.z.mcnemar(p10 = 0.15, p01 = 0.05,
                alpha = 0.05, power = 0.80)

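# Manual calculation (normal approximation from the Theory section)
alpha <- 0.05; power <- 0.80
p10 <- 0.15; p01 <- 0.05
p_disc <- p10 + p01                # proportion of discordant pairs
d <- (p10 - p01) / (p10 + p01)     # imbalance between the two discordant cells
n_manual <- (qnorm(1 - alpha / 2) + qnorm(power))^2 / (p_disc * d^2)
ceiling(n_manual)                  # round up to whole pairs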
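The approximation can be checked by Monte Carlo in base R. The sketch below uses the same \(p_{10}\) and \(p_{01}\) and, hypothetically, splits the remaining 0.80 of concordant probability as 0.20 (+, +) and 0.60 (-, -); that split does not affect either test. It estimates power at the 157 pairs obtained by rounding up the manual calculation. Two tests are compared: the asymptotic McNemar test (`mcnemar.test` with `correct = FALSE`) and the exact conditional binomial test on the discordant pairs (`binom.test`).

```r
set.seed(2026)                      # arbitrary seed, for reproducibility only
n <- 157                            # pairs, rounded up from the manual calculation
# Cell probabilities in the order (+,+), (+,-), (-,+), (-,-);
# the concordant split 0.20 / 0.60 is a hypothetical filler.
probs <- c(0.20, 0.15, 0.05, 0.60)
n_sim <- 2000

sims <- replicate(n_sim, {
  counts <- rmultinom(1, size = n, prob = probs)[, 1]
  n10 <- counts[2]; n01 <- counts[3]
  nd <- n10 + n01                   # number of discordant pairs in this sample
  if (nd == 0) {
    c(asym = FALSE, exact = FALSE)  # no discordant pairs: neither test can reject
  } else {
    tab <- matrix(counts, nrow = 2, byrow = TRUE)   # rows = test 1, cols = test 2
    c(asym  = unname(mcnemar.test(tab, correct = FALSE)$p.value < 0.05),
      exact = unname(binom.test(n10, nd, p = 0.5)$p.value < 0.05))
  }
})

pw <- rowMeans(sims)   # simulated power of each test
pw                     # compare with the 0.80 target
```

Expect the exact test to reject less often than the asymptotic one at this size. If the exact test is the one planned, search over `n` with this simulation rather than relying on the closed-form approximation.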

Output & Results

\(n \approx 157\) pairs are required (the manual calculation gives 156.98, rounded up). If discordance is lower (say \(p_{10} = 0.10\), \(p_{01} = 0.05\)), the required \(n\) roughly triples, to about 471 pairs.
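The compact form \(n \approx (z_{1-\alpha/2} + z_{1-\beta})^2\, p_{\text{disc}} / (p_{10} - p_{01})^2\) is algebraically identical to the Theory formula. With \((z_{0.975} + z_{0.80})^2 \approx 7.8489\), the two scenarios compare as follows:

\[n_{0.15,\,0.05} \approx \frac{7.8489 \times 0.20}{0.10^2} \approx 157.0, \qquad n_{0.10,\,0.05} \approx \frac{7.8489 \times 0.15}{0.05^2} \approx 470.9, \qquad \frac{n_{0.10,\,0.05}}{n_{0.15,\,0.05}} = \frac{0.15/0.05^2}{0.20/0.10^2} = \frac{60}{20} = 3.\]

Lowering \(p_{10}\) shrinks both the total discordance and the imbalance between the two discordant cells. That is why \(n\) grows faster than the factor \(0.20/0.15\) that the drop in \(p_{\text{disc}}\) alone would suggest.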

Interpretation

“With an expected proportion of (+, -) pairs of 0.15 and (-, +) pairs of 0.05, McNemar’s test requires 157 paired observations for 80% power at two-sided \(\alpha = 0.05\).”

Practical Tips

Plan for the total sample, not just discordant pairs; concordant pairs are expected but non-informative.
The formula is sensitive to the assumed discordance rates; sensitivity analysis is essential.
For very high agreement (concordant pairs >> discordant), large samples are needed; consider redesigning the comparison.
The exact (binomial) McNemar test is more conservative than the chi-square approximation in small samples; simulate power if exact testing is planned.
Extending to the Bowker or Stuart-Maxwell tests for multi-category paired data usually calls for simulation-based power.
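A sensitivity analysis can be as simple as tabulating the approximate \(n\) over a grid of plausible discordance rates. The sketch below uses base R only. The grid values are hypothetical planning inputs, and the formula is the Theory section's approximation rewritten as \((z_{1-\alpha/2} + z_{1-\beta})^2\, p_{\text{disc}} / (p_{10} - p_{01})^2\), which is algebraically the same.

```r
z2 <- (qnorm(0.975) + qnorm(0.80))^2   # two-sided alpha = 0.05, power = 0.80

# Hypothetical ranges of plausible planning values
p10_grid <- c(0.10, 0.15, 0.20)
p01_grid <- c(0.03, 0.05, 0.08)

# Required pairs for every (p10, p01) combination; outer() passes
# vectorized arguments, so the function body must be vectorized too.
n_grid <- outer(p10_grid, p01_grid, function(p10, p01) {
  ceiling(z2 * (p10 + p01) / (p10 - p01)^2)
})
dimnames(n_grid) <- list(p10 = p10_grid, p01 = p01_grid)

n_grid
```

Reading across a row shows how quickly \(n\) climbs as \(p_{01}\) approaches \(p_{10}\). Planning around the pessimistic end of the grid is a common hedge when pilot estimates are imprecise.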