Power for McNemar’s Test

Sample size for paired binary comparisons driven by the rate of discordant pairs
Published

April 17, 2026

Introduction

Power for McNemar’s test depends on the rate of discordant pairs, not on the overall sample size alone. Concordant pairs contribute nothing to the test statistic, so studies with high agreement need proportionally more pairs in total to yield enough discordant ones.

Prerequisites

Familiarity with McNemar’s test and paired binary data.

Theory

Let \(p_{10}\) be the probability of a (+, -) pair (positive on test 1, negative on test 2) and \(p_{01}\) the probability of a (-, +) pair. The null hypothesis is \(p_{10} = p_{01}\). The total proportion of discordant pairs is \(p_{\text{disc}} = p_{10} + p_{01}\), and the ratio \(p_{10}/p_{01}\) is the odds that a discordant pair is (+, -) rather than (-, +).

For a two-sided test at significance level \(\alpha\) with power \(1 - \beta\), the required total number of pairs \(n\) is approximately:

\[n \approx \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{p_{\text{disc}} \cdot \left(\frac{p_{10} - p_{01}}{p_{10} + p_{01}}\right)^2}.\]

For a fixed ratio \(p_{10}/p_{01}\), \(n\) is inversely proportional to \(p_{\text{disc}}\), so \(n\) grows quickly as discordance drops.
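
To make the dependence on discordance explicit, the formula can be rearranged. This is only algebra on the expression above, not an additional result:

\[n \approx \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \, (p_{10} + p_{01})}{(p_{10} - p_{01})^2}, \qquad n \cdot p_{\text{disc}} \approx \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\left(\frac{p_{10} - p_{01}}{p_{10} + p_{01}}\right)^2}.\]

The second expression is the expected number of discordant pairs needed, and it depends only on the relative difference. As an illustration, when (+, -) pairs are three times as likely as (-, +) pairs the relative difference is 0.5, so at two-sided \(\alpha = 0.05\) and 80% power about \((1.960 + 0.842)^2 / 0.25 \approx 31.4\) discordant pairs are needed whatever the discordance rate; the total \(n\) is that count divided by \(p_{\text{disc}}\).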

Assumptions

  • Independent paired observations.
  • Values of \(p_{10}\) and \(p_{01}\) specified in advance, for example from a pilot study.

R Implementation

library(pwrss)

# Expected p10 = 0.15, p01 = 0.05, alpha = 0.05, power = 0.80
# (verify the function and argument names against your installed pwrss version)
pwrss.z.mcnemar(p10 = 0.15, p01 = 0.05,
                alpha = 0.05, power = 0.80)

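# Manual calculation with the normal-approximation formula from the Theory section
p10 <- 0.15; p01 <- 0.05
alpha <- 0.05; power <- 0.80
p_disc <- p10 + p01                 # total proportion of discordant pairs
rel_diff <- (p10 - p01) / p_disc    # relative difference among discordant pairs (not an odds ratio)
n_manual <- (qnorm(1 - alpha / 2) + qnorm(power))^2 / (p_disc * rel_diff^2)
n_manual                            # unrounded value
ceiling(n_manual)                   # round up to whole pairs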

Output & Results

\(n \approx 157\) pairs required (the manual formula gives 156.98, rounded up). With smaller rates, say \(p_{10} = 0.10\) and \(p_{01} = 0.05\), both the discordance and the difference shrink, and the required \(n\) roughly triples to about 471.

Interpretation

“With an expected proportion of (+, -) pairs of 0.15 and (-, +) pairs of 0.05, McNemar’s test requires 157 paired observations for 80% power at two-sided \(\alpha = 0.05\).”
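
Because the required \(n\) moves sharply with the assumed rates, it can help to tabulate it over a few plausible scenarios. The sketch below wraps the normal-approximation formula from the Theory section in a helper; the name `mcnemar_n` and the scenario values are illustrative assumptions, not part of any package.

```r
# Hypothetical helper: normal-approximation sample size for McNemar's test,
# using the formula from the Theory section (two-sided alpha).
mcnemar_n <- function(p10, p01, alpha = 0.05, power = 0.80) {
  p_disc <- p10 + p01                      # total discordant proportion
  rel_diff <- (p10 - p01) / p_disc         # relative difference among discordant pairs
  z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
  ceiling(z_sum^2 / (p_disc * rel_diff^2)) # round up to whole pairs
}

# Illustrative scenarios (assumed values, chosen only to show the spread)
scenarios <- data.frame(
  p10 = c(0.15, 0.10, 0.20, 0.15),
  p01 = c(0.05, 0.05, 0.10, 0.10)
)
scenarios$n_pairs <- mapply(mcnemar_n, scenarios$p10, scenarios$p01)
print(scenarios)
```

Rows with the same absolute difference but higher discordance need more pairs, and halving the difference at similar discordance multiplies \(n\) several times over, which is why the pilot estimates deserve scrutiny.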

Practical Tips

  • Plan for the total sample, not just discordant pairs; concordant pairs are expected but non-informative.
  • The formula is sensitive to the assumed discordance rates; sensitivity analysis is essential.
  • For very high agreement (concordant pairs >> discordant), large samples are needed; consider redesigning the comparison.
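  • The exact McNemar test is conservative in small samples, so a sample size from the normal approximation can leave it slightly underpowered; check power by simulation if the exact test will be used.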
  • Extensions to the Bowker or Stuart-Maxwell tests for multi-category paired data generally call for simulation-based power.
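
The simulation check mentioned in the tips above can be sketched as follows. This is a minimal Monte Carlo estimate of power for the exact McNemar test, implemented as a two-sided binomial test on the discordant pairs via base R's `binom.test`; the helper name `sim_mcnemar_power`, the seed, and the number of replicates are assumptions made for illustration.

```r
# Hypothetical helper: Monte Carlo power of the exact McNemar test,
# i.e. a two-sided binomial test of the (+, -) count among discordant pairs
# against a null probability of 0.5.
sim_mcnemar_power <- function(n, p10, p01, alpha = 0.05, n_sim = 2000) {
  rejections <- replicate(n_sim, {
    # Draw counts for (+, -), (-, +), and all concordant pairs pooled together
    counts <- rmultinom(1, size = n, prob = c(p10, p01, 1 - p10 - p01))
    n10 <- counts[1]
    n_disc <- counts[1] + counts[2]
    # With no discordant pairs the test cannot reject
    if (n_disc == 0) FALSE else binom.test(n10, n_disc, p = 0.5)$p.value < alpha
  })
  mean(rejections)  # proportion of simulated studies that reject H0
}

set.seed(2026)
sim_mcnemar_power(n = 157, p10 = 0.15, p01 = 0.05)
```

If the estimated power falls short of the target, increase `n` and rerun; a larger `n_sim` tightens the Monte Carlo error at the cost of run time.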