Simulating Survival Data

Survival Analysis
simulation
inverse-hazard
Generating realistic censored time-to-event datasets for power analysis and method validation
Published

April 17, 2026

Introduction

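Simulating survival data is essential for power calculations, method comparisons, and teaching. The inverse hazard method (inverse transform sampling applied to the cumulative hazard) generates event times from any specified hazard function whose cumulative hazard can be inverted, either analytically or numerically.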

Prerequisites

Familiarity with the inverse CDF (inverse transform) sampling method and with hazard and cumulative hazard functions.

Theory

For a hazard \(h(t | x)\) with cumulative hazard \(H(t | x) = \int_0^t h(s | x)\,ds\), if \(U \sim \mathrm{Uniform}(0, 1)\), then \(T = H^{-1}(-\log U | x)\) has survival function \(S(t | x) = \exp\{-H(t | x)\}\), which is the desired distribution.
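The result follows in a few steps from the inverse CDF method. A short derivation, assuming \(H(\cdot | x)\) is continuous and strictly increasing so that its inverse exists:

\[
\begin{aligned}
P(T > t | x) &= P\bigl(H^{-1}(-\log U | x) > t\bigr) \\
             &= P\bigl(-\log U > H(t | x)\bigr) \\
             &= P\bigl(U < \exp\{-H(t | x)\}\bigr) \\
             &= \exp\{-H(t | x)\} = S(t | x).
\end{aligned}
\]

The last line uses \(P(U < u) = u\) for \(u \in [0, 1]\).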

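For proportional hazards, \(H(t | x) = H_0(t) \exp(x^\top \beta)\), so solving \(H(T_i | x_i) = -\log U_i\) gives \(T_i = H_0^{-1}\bigl(-\log(U_i) \, \exp(-x_i^\top \beta)\bigr)\).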
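When the baseline cumulative hazard has a closed-form inverse, the formula can be coded directly. The sketch below assumes a Weibull baseline parameterised as \(h_0(t) = \lambda \gamma t^{\gamma - 1}\), so that \(H_0(t) = \lambda t^{\gamma}\) and \(H_0^{-1}(u) = (u / \lambda)^{1/\gamma}\). The function name `sim_weibull_ph` and the parameter values are illustrative choices, not part of any package.

```r
# Closed-form inversion for a Weibull proportional hazards model.
# Assumed parameterisation: H0(t) = lambda * t^gamma.
sim_weibull_ph <- function(x, beta, lambda, gamma) {
  u <- runif(length(x))
  target <- -log(u) * exp(-x * beta)   # value that H0(T) must equal
  (target / lambda)^(1 / gamma)        # apply H0^{-1}
}

set.seed(1)
n <- 10000
x <- rep(c(0, 1), each = n / 2)        # binary covariate, 0 = reference group
t_event <- sim_weibull_ph(x, beta = -0.5, lambda = 0.1, gamma = 1.2)

# Sanity check: empirical vs. theoretical survival at t = 5 in the reference group
emp_surv  <- mean(t_event[x == 0] > 5)
theo_surv <- exp(-0.1 * 5^1.2)
c(empirical = emp_surv, theoretical = theo_surv)
```

The two numbers should agree up to Monte Carlo error; a large gap would point to a mismatch between the coded inverse and the intended parameterisation.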

Censoring: simulate a censoring time \(C_i\) independently of \(T_i\), record \(Y_i = \min(T_i, C_i)\) as the observed time, and set the event indicator \(\delta_i = 1\) if \(T_i \leq C_i\) and \(\delta_i = 0\) otherwise.
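The censoring step can be sketched in base R as follows. The censoring mechanism here (exponential dropout combined with an administrative cutoff) and all rates are hypothetical choices for illustration; the exponential event times are only a stand-in for draws from the inverse hazard method.

```r
set.seed(2)
n <- 1000

# Stand-in event times; in practice these come from the inverse hazard step
t_event <- rexp(n, rate = 0.2)

# Hypothetical censoring: random dropout plus a fixed end of follow-up
c_dropout <- rexp(n, rate = 0.05)      # assumed dropout rate
c_admin   <- 10                        # assumed administrative cutoff
c_time    <- pmin(c_dropout, c_admin)

obs_time <- pmin(t_event, c_time)              # observed time: min(T, C)
status   <- as.integer(t_event <= c_time)      # 1 = event, 0 = censored

mean(status == 0)                      # realised censoring proportion
```

Tuning the dropout rate and the cutoff until the realised censoring proportion matches the target study is usually a matter of a few trial runs.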

Assumptions

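The specified hazard (baseline form and covariate effects) is taken to be the true data-generating model, so conclusions apply only to that scenario. Censoring times are independent of event times.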

R Implementation

library(simsurv)

set.seed(2026)
n <- 300
x <- data.frame(id = 1:n, arm = factor(rep(c("A", "B"), each = n/2)))
x$arm_num <- as.numeric(x$arm) - 1

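# Weibull baseline hazard h0(t) = lambda * gamma * t^(gamma - 1), lambda = 0.1, gamma = 1.2
# betas: log hazard ratio for arm B vs. A; maxt = 10 censors everyone still at risk at t = 10
sim <- simsurv(dist = "weibull", lambdas = 0.1, gammas = 1.2,
               x = x, betas = c(arm_num = -0.5),
               maxt = 10)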

# Merge with covariates
d <- merge(sim, x, by = "id")
head(d)

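# Kaplan-Meier curves by arm
library(survival)
fit_km <- survfit(Surv(eventtime, status) ~ arm, data = d)
plot(fit_km, col = 1:2, xlab = "Time", ylab = "Survival probability")
legend("topright", legend = levels(d$arm), col = 1:2, lty = 1)

# Cox model: the estimated hazard ratio (B vs. A) should be near exp(-0.5)
fit_cox <- coxph(Surv(eventtime, status) ~ arm, data = d)
exp(coef(fit_cox))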

Output & Results

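The merged data frame d has one row per subject with id, eventtime, status (1 = event, 0 = censored at maxt), and the covariates. Because the log hazard ratio for arm B is negative, the Kaplan-Meier curve for arm B should lie above the curve for arm A.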

Interpretation

“Simulation confirmed the designed HR = exp(-0.5) = 0.61 between arms; Cox on simulated data recovered it accurately.”
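A single simulated dataset gives only one noisy estimate, so recovery of the designed effect is usually checked across replications. The sketch below is one way to do that using base R and the survival package; it reuses the Weibull values from the example (lambda = 0.1, gamma = 1.2, log HR = -0.5, follow-up to t = 10) but draws event times by direct inversion rather than with simsurv. The helper `sim_trial` and the choice of 200 replications are illustrative assumptions.

```r
library(survival)

# One simulated two-arm trial: Weibull PH event times, administrative censoring at maxt
sim_trial <- function(n, beta, lambda = 0.1, gamma = 1.2, maxt = 10) {
  arm <- rep(c(0, 1), each = n / 2)
  t_event <- (-log(runif(n)) * exp(-arm * beta) / lambda)^(1 / gamma)
  data.frame(arm    = arm,
             time   = pmin(t_event, maxt),
             status = as.integer(t_event <= maxt))
}

set.seed(3)
n_rep <- 200
est <- t(replicate(n_rep, {
  fit <- coxph(Surv(time, status) ~ arm, data = sim_trial(300, beta = -0.5))
  s <- summary(fit)$coefficients
  c(log_hr = s[1, "coef"], p = s[1, "Pr(>|z|)"])
}))

mean(est[, "log_hr"])        # compare with the true log HR, -0.5
exp(mean(est[, "log_hr"]))   # compare with exp(-0.5)
mean(est[, "p"] < 0.05)      # empirical power at the 5% level
```

The average estimate speaks to bias, while the rejection proportion is the Monte Carlo estimate of power for this design.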

Practical Tips

  • simsurv supports exponential, Weibull, and Gompertz baselines, as well as user-defined hazards.
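  • flexsurv can generate data from fitted flexible models through its simulate() method for flexsurvreg objects; rsurvspline() draws from spline-based survival distributions.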
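  • For competing risks: simulate a latent event time for each cause from its cause-specific hazard and take the earliest; the cause that produced it is the observed event type.
  • Add censoring that matches the censoring pattern expected in the real study (dropout rate, administrative cutoff).
  • Monte Carlo simulation is the standard way to evaluate power and calibration when no closed-form result applies: repeat the simulate-and-fit cycle many times and summarise the results.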
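The competing-risks tip can be sketched in a few lines of base R. This is a minimal illustration assuming two causes with constant, hypothetical cause-specific hazards (`h1`, `h2`) and independent latent times; with non-constant hazards the same pattern applies, with each latent time drawn by inverting its own cumulative hazard.

```r
set.seed(4)
n <- 5000

# Hypothetical constant cause-specific hazards
h1 <- 0.10
h2 <- 0.05

# Latent times by inverse hazard: H(t) = h * t, so T = -log(U) / h
t1 <- -log(runif(n)) / h1
t2 <- -log(runif(n)) / h2

time  <- pmin(t1, t2)                  # first event to occur
cause <- ifelse(t1 <= t2, 1L, 2L)      # which cause produced it

# With constant hazards, P(cause = 1) should be close to h1 / (h1 + h2)
c(empirical = mean(cause == 1), theoretical = h1 / (h1 + h2))
```

Censoring is then applied to `time` exactly as in the single-event case, with censored subjects given a cause code of 0.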