Simulating Survival Data
Introduction
Simulating survival data is essential for power calculations, method comparisons, and teaching. The inverse hazard method generates event times from any specified hazard function.
Prerequisites
Inverse CDF method, hazard function.
Theory
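For hazard \(h(t | x)\) with cumulative hazard \(H(t | x) = \int_0^t h(s | x)\, ds\), if \(U \sim \mathrm{Uniform}(0, 1)\), then \(T = H^{-1}(-\log U | x)\) has the desired distribution. This works because the survival function \(S(T | x) = \exp(-H(T | x))\) is itself Uniform(0, 1), so \(H(T | x) = -\log U\) can be solved for \(T\) whenever \(H\) is strictly increasing.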
For proportional hazards, \(H(t | x_i) = H_0(t) \exp(x_i^\top \beta)\), so \(T_i = H_0^{-1}\bigl(-\log(U_i) \exp(-x_i^\top \beta)\bigr)\).
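As a worked case, take a Weibull baseline in the parameterization \(h_0(t) = \lambda \gamma t^{\gamma - 1}\). This parameterization is an assumption here; it is the one simsurv documents for its Weibull option, with \(\lambda\) and \(\gamma\) corresponding to the lambdas and gammas arguments. Then \(H_0(t) = \lambda t^{\gamma}\) and \(H_0^{-1}(v) = (v / \lambda)^{1/\gamma}\), so the proportional hazards formula reduces to a closed form:

\[ T_i = \left( \frac{-\log U_i}{\lambda \exp(x_i^\top \beta)} \right)^{1/\gamma}. \]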
Censoring: simulate an independent censoring time \(C_i\), record \(\min(T_i, C_i)\) as the observed time, and set the event indicator to 1 if \(T_i \leq C_i\) and 0 otherwise.
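The steps above can be sketched in base R without any simulation package. This is a minimal illustration, assuming a Weibull baseline with \(H_0(t) = \lambda t^{\gamma}\), a single binary covariate, and exponential censoring. The censoring rate of 0.05, the sample size, and all variable names are illustrative choices rather than part of the original example.

```r
set.seed(1)
n      <- 1000
lambda <- 0.1    # Weibull scale (assumed value)
gamma  <- 1.2    # Weibull shape (assumed value)
beta   <- -0.5   # log hazard ratio for the binary covariate (assumed value)

x <- rbinom(n, 1, 0.5)   # hypothetical binary covariate
u <- runif(n)            # U ~ Uniform(0, 1)

# Inverse cumulative hazard under proportional hazards:
# T = H0^{-1}(-log(U) * exp(-x * beta)), with H0^{-1}(v) = (v / lambda)^(1 / gamma)
t_event <- (-log(u) / (lambda * exp(beta * x)))^(1 / gamma)

# Independent censoring times (hypothetical exponential distribution)
c_time <- rexp(n, rate = 0.05)

# Observed time and event indicator
time   <- pmin(t_event, c_time)
status <- as.integer(t_event <= c_time)

d_manual <- data.frame(time = time, status = status, x = x)
mean(d_manual$status)    # proportion of observed events
```

Changing the censoring rate shifts the proportion of observed events, which is one simple way to tune the simulated data toward a target censoring fraction.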
Assumptions
The specified hazard form is correct, and censoring times are independent of event times.
R Implementation
library(simsurv)
set.seed(2026)
n <- 300
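# Covariate data: one row per subject, two arms of equal size
x <- data.frame(id = 1:n, arm = factor(rep(c("A", "B"), each = n / 2)))
x$arm_num <- as.numeric(x$arm) - 1  # 0 = arm A, 1 = arm B
# Weibull baseline hazard h0(t) = lambda * gamma * t^(gamma - 1),
# with lambda = 0.1, gamma = 1.2; log hazard ratio for arm B is -0.5
sim <- simsurv(dist = "weibull", lambdas = 0.1, gammas = 1.2,
               x = x, betas = c(arm_num = -0.5),
               maxt = 10)  # administrative censoring at t = 10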
# Merge with covariates
d <- merge(sim, x, by = "id")
head(d)
# Kaplan-Meier
library(survival)
fit_km <- survfit(Surv(eventtime, status) ~ arm, data = d)
plot(fit_km)
Output & Results
Simulated dataset with event times, status, covariates; KM curves reflecting the simulated effect.
Interpretation
“Simulation confirmed the designed HR = exp(-0.5) = 0.61 between arms; Cox on simulated data recovered it accurately.”
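A sentence like the one above needs a Cox fit behind it. The following sketch regenerates data with the same settings as the R Implementation section and fits a Cox model. The object name fit_cox is an illustrative choice. With n = 300 in a single replicate the estimate carries sampling error, so the confidence interval, not only the point estimate, should be compared against the designed HR of exp(-0.5).

```r
library(simsurv)
library(survival)

set.seed(2026)
n <- 300
# Binary arm indicator: 0 = arm A, 1 = arm B
x <- data.frame(id = 1:n, arm_num = rep(c(0, 1), each = n / 2))

# Weibull baseline, log hazard ratio -0.5, administrative censoring at t = 10
sim <- simsurv(dist = "weibull", lambdas = 0.1, gammas = 1.2,
               x = x, betas = c(arm_num = -0.5),
               maxt = 10)
d <- merge(sim, x, by = "id")

# Cox proportional hazards model on the simulated data
fit_cox <- coxph(Surv(eventtime, status) ~ arm_num, data = d)

exp(coef(fit_cox))      # estimated hazard ratio
exp(confint(fit_cox))   # 95% confidence interval on the HR scale
exp(-0.5)               # designed hazard ratio, for comparison
```

Repeating this over many simulated datasets and averaging the estimates gives a more reliable check of recovery than a single fit.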
Practical Tips
- simsurv supports Weibull, Gompertz, and custom hazards.
- flexsurv::rsurv generates from fitted flexible models.
- For competing risks: simulate each cause separately, take the earliest.
- Always add censoring to match the real-study distribution.
- Monte Carlo simulation is the gold standard for power and calibration evaluation.
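The competing risks tip can be sketched in base R as follows. This is a minimal illustration under strong assumptions: two causes with constant, independent cause-specific hazards, plus uniform censoring. The rates, the censoring window, and the variable names are all hypothetical.

```r
set.seed(42)
n <- 500

# Latent event time for each cause (hypothetical constant hazards)
t1 <- rexp(n, rate = 0.10)   # cause 1
t2 <- rexp(n, rate = 0.05)   # cause 2

# Independent censoring times (hypothetical uniform window)
c_time <- runif(n, min = 0, max = 15)

# Observed time is the earliest of the two causes and censoring
t_first <- pmin(t1, t2)
time    <- pmin(t_first, c_time)

# Cause indicator: 0 = censored, 1 = cause 1, 2 = cause 2
cause <- ifelse(c_time < t_first, 0L, ifelse(t1 <= t2, 1L, 2L))

d_cr <- data.frame(time = time, cause = cause)
table(d_cr$cause)
```

Because the latent times are drawn independently, each cause-specific hazard equals the hazard used to draw that cause's latent time; dependent causes would need a different construction.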