Power for the Log-Rank Test

Sample Size & Power

power

log-rank

events

accrual

Event-based sample size for Kaplan-Meier comparisons via the log-rank test

Published

April 17, 2026

Introduction

The log-rank test is the standard non-parametric test for equality of survival functions. Its power depends on the total number of events observed across the two groups, not on the number of subjects enrolled. Event-based planning therefore fixes the required event count first, then chooses sample size, accrual, and follow-up so that the study delivers that many events.

Prerequisites

Log-rank test, Kaplan-Meier estimator.

Theory

Under the alternative \(H_1\): HR = \(\theta\), with equal allocation, two-sided level \(\alpha\), and power \(1-\beta\), the required total number of events is

\[D = \frac{4 (z_{1-\alpha/2} + z_{1-\beta})^2}{\log^2 \theta}.\]

For unequal allocation with fractions \(\pi_1, \pi_2\) (summing to 1), replace the 4 with \(1/(\pi_1 \pi_2)\).
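The two formulas above can be wrapped in one small helper. This is a minimal sketch in base R; the function name `events_required` and its argument names are illustrative and do not come from any package.

```r
# Required total events for a two-sided log-rank test.
# `events_required` is a hypothetical helper, not a package function.
# p1 is the allocation fraction of one arm; the other arm gets 1 - p1.
events_required <- function(hr, alpha = 0.05, power = 0.80, p1 = 0.5) {
  stopifnot(hr > 0, hr != 1, p1 > 0, p1 < 1)
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  z^2 / (p1 * (1 - p1) * log(hr)^2)
}

d_equal <- events_required(hr = 0.65)              # 1:1 allocation
d_2to1  <- events_required(hr = 0.65, p1 = 2 / 3)  # 2:1 allocation
c(equal = ceiling(d_equal), two_to_one = ceiling(d_2to1))
```

With \(\pi_1 = 2/3\) the factor \(1/(\pi_1 \pi_2)\) is 4.5 rather than 4, so a 2:1 design needs 12.5 % more events than a 1:1 design for the same hazard ratio, level, and power.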

The required sample size is the required number of events divided by the overall probability that a subject has an event during the study. That probability depends on the event rates, accrual time, and follow-up time.
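To make the link between events and subjects concrete, here is a rough sketch of the event probability under one simple set of assumptions: exponential survival in each arm, uniform accrual, a fixed follow-up period after accrual closes, and no dropout. The helper `p_event` is hypothetical, and the 24-month control median is an illustrative input, not a value from this article.

```r
# Probability that a subject has an event by the end of the study, ASSUMING
# exponential survival with hazard `lambda`, uniform accrual over `accrual`
# time units, and `followup` further units after accrual closes (no dropout).
p_event <- function(lambda, accrual, followup) {
  1 - (exp(-lambda * followup) - exp(-lambda * (accrual + followup))) /
    (lambda * accrual)
}

# ASSUMPTION (illustrative only): control-arm median survival of 24 months
lambda_c <- log(2) / 24
lambda_e <- 0.65 * lambda_c        # experimental arm under HR = 0.65
accrual  <- 24                     # months of accrual
followup <- 12                     # months of follow-up after accrual

p_c   <- p_event(lambda_c, accrual, followup)
p_e   <- p_event(lambda_e, accrual, followup)
p_bar <- (p_c + p_e) / 2           # overall event probability, 1:1 allocation

D <- 4 * (qnorm(0.975) + qnorm(0.80))^2 / log(0.65)^2
N <- ceiling(D / p_bar)            # total subjects needed to yield D events
c(p_control = p_c, p_experimental = p_e, total_n = N)
```

Lengthening follow-up or slowing accrual raises `p_bar` and lowers `N`, while the event target `D` stays the same.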

Assumptions

Proportional hazards (approximately).
Independent, non-informative censoring.
Pre-specified accrual rate and follow-up period.

R Implementation

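library(powerSurvEpi)  # ssizeCT.default()
library(gsDesign)      # gsDesign(), gsBoundSummary()

# Equal allocation, detect HR = 0.65 with 80 % power (two-sided alpha = 0.05)
D <- 4 * (qnorm(0.975) + qnorm(0.80))^2 / log(0.65)^2
ceiling(D)  # total events across both arms, rounded up

# Sample size per arm given the event probabilities over the study
# (k = allocation ratio; pE, pC = event probabilities in the
#  experimental and control arms)
ssizeCT.default(power = 0.80, k = 1,
                pE = 0.40, pC = 0.55,
                RR = 0.65, alpha = 0.05)

# Group-sequential log-rank plan: 3 analyses, two-sided symmetric bounds;
# alpha here is one-sided (0.025 in each tail)
gs <- gsDesign(k = 3, test.type = 2, alpha = 0.025, beta = 0.20)
gsBoundSummary(gs)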
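As a sanity check on the event formula, the power can also be estimated by simulation. The sketch below is one possible check, not part of the planning workflow itself. It assumes exponential event times and no censoring, so every simulated subject contributes exactly one event, and it uses `survdiff()` from the survival package.

```r
library(survival)  # Surv(), survdiff()

set.seed(2026)
hr <- 0.65
D  <- ceiling(4 * (qnorm(0.975) + qnorm(0.80))^2 / log(hr)^2)  # total events
D  <- D + D %% 2            # force an even count for 1:1 allocation
n_sim <- 1000

reject <- replicate(n_sim, {
  arm  <- rep(c(0, 1), each = D / 2)
  # ASSUMPTION: exponential times, control hazard 1, no censoring,
  # so the number of subjects equals the number of events.
  time <- rexp(D, rate = ifelse(arm == 1, hr, 1))
  fit  <- survdiff(Surv(time, rep(1, D)) ~ arm)
  pchisq(fit$chisq, df = 1, lower.tail = FALSE) < 0.05
})

mean(reject)  # empirical power; expected to land near the 0.80 target
```

The estimate carries Monte Carlo error of roughly 0.013 at 1000 replicates, so small deviations from 0.80 are expected.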

Output & Results

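To detect HR = 0.65 with 80 % power at two-sided \(\alpha = 0.05\), about 170 events are required in total (the formula gives \(D \approx 169.2\)). With event probabilities of 40 % and 55 % in the two arms, this corresponds to roughly 185 subjects per arm, or about 370 in total.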

Interpretation

“The study requires 170 events to detect a hazard ratio of 0.65 with 80 % power at two-sided \(\alpha = 0.05\). Assuming 24 months of accrual and 12 months of follow-up at the expected event rate, we plan to enrol 370 subjects.”

Practical Tips

Power depends on events, not subjects; plan accrual to deliver the required event count.
Non-proportional hazards can reduce log-rank power; Fleming-Harrington weighted tests can recover power when the difference is concentrated early or late in follow-up.
For interim monitoring, use alpha spending and information-based analysis timing.
Report the assumed accrual pattern and expected follow-up; they determine the event yield.
For competing risks, base power calculations on the subdistribution hazard (Gray's test or the Fine-Gray model) instead.
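The tip on alpha spending and information-based timing can be illustrated in base R. The sketch below uses the Lan-DeMets O'Brien-Fleming-type spending function, chosen here for illustration only; it is not necessarily the spending function that `gsDesign()` applies by default. The helper `obf_spend` is hypothetical, and `D_max = 170` is an assumed maximum event count that ignores the small inflation a group-sequential design adds over a fixed design.

```r
# Cumulative one-sided alpha spent at information fraction t
# (Lan-DeMets O'Brien-Fleming-type spending function).
obf_spend <- function(t, alpha = 0.025) {
  2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))
}

D_max     <- 170             # ASSUMPTION: planned maximum number of events
info_frac <- c(1, 2, 3) / 3  # three equally spaced looks

alpha_cum <- obf_spend(info_frac)
looks <- data.frame(
  info_frac  = info_frac,
  events     = ceiling(info_frac * D_max),  # analysis triggered by event count
  alpha_cum  = alpha_cum,                   # cumulative alpha spent
  alpha_incr = diff(c(0, alpha_cum))        # alpha spent at this look
)
looks
```

Each analysis is triggered when the observed event count reaches the value in `events`, not at a calendar date, which is what information-based timing means for a log-rank design. Very little alpha is spent at the first look, and the cumulative total reaches 0.025 at the final analysis.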