Poisson Regression
Introduction
Poisson regression models count data through a log link: \(\log E[Y] = \mathbf{X}^\top \boldsymbol{\beta}\). It is appropriate when the conditional variance of the counts approximately equals their conditional mean (equidispersion).
Prerequisites
Poisson distribution, generalised linear models.
Theory
Model: \(Y \sim \mathrm{Poisson}(\mu)\), \(\log \mu = \mathbf{X}^\top \boldsymbol{\beta}\).
Coefficients are log rate ratios: \(\exp(\beta_j)\) is the multiplicative change in the expected count (or rate) per one-unit increase in \(X_j\), holding the other covariates fixed.
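Offset for rates: \(\log E[Y] = \log(\text{exposure}) + \mathbf{X}^\top \boldsymbol{\beta}\). The offset enters with its coefficient fixed at 1, so the model describes the rate \(E[Y]/\text{exposure}\) and accommodates observations with different exposure times or population sizes.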
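The offset and rate-ratio statements can be made explicit with a short worked derivation. This is a sketch only; the symbol \(t\) for the exposure is introduced here for brevity and does not appear in the original notation. Moving the offset to the left-hand side gives a model for the rate:
\[
\log \frac{E[Y]}{t} = \mathbf{X}^\top \boldsymbol{\beta}
\quad\Longleftrightarrow\quad
E[Y] = t \, \exp\!\left(\mathbf{X}^\top \boldsymbol{\beta}\right).
\]
Comparing two observations that differ by one unit in \(X_j\) and agree on all other covariates, every term except \(\beta_j\) cancels:
\[
\frac{E[Y \mid X_j = x + 1] / t}{E[Y \mid X_j = x] / t}
= \frac{\exp\!\left(\beta_j (x + 1)\right)}{\exp\!\left(\beta_j x\right)}
= \exp(\beta_j).
\]
For example, \(\beta_j = 0.6\) corresponds to a rate ratio of \(\exp(0.6) \approx 1.82\), regardless of the exposure.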
Assumptions
- Count outcome (non-negative integers).
- Equidispersion: the conditional variance equals the conditional mean, given the covariates.
- Independent observations.
R Implementation
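library(broom)  # tidy() and glance() for model summaries

# Simulated data: event counts per patient over varying follow-up (in years)
set.seed(2026)
d <- data.frame(
  age = rnorm(300, 60, 12),
  smoker = rbinom(300, 1, 0.3),
  exposure_years = runif(300, 0.5, 5)
)
d$events <- rpois(nrow(d),
  lambda = d$exposure_years * exp(-2 + 0.02 * d$age + 0.6 * d$smoker))

# Poisson GLM with log(exposure) as offset, so coefficients describe rates
fit <- glm(events ~ age + smoker, data = d, family = poisson,
           offset = log(exposure_years))

# Rate ratios with 95% confidence intervals, then model-level fit statistics
tidy(fit, conf.int = TRUE, exponentiate = TRUE)
glance(fit)

# Check overdispersion: Pearson chi-square divided by residual degrees of freedom
dispersion <- sum(resid(fit, type = "pearson")^2) / fit$df.residual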
dispersion
Output & Results
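The output gives rate ratios with 95 % confidence intervals. The simulation above uses true rate ratios of \(\exp(0.02) \approx 1.02\) per year of age and \(\exp(0.6) \approx 1.82\) for smokers, so the estimates should fall near these values. A dispersion statistic close to 1 indicates that the Poisson variance assumption is reasonable.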
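Rate ratios are relative quantities, so it can help to also report absolute rates for a few covariate profiles. The following is a minimal, self-contained sketch rather than part of the original analysis. It re-simulates data of the same form with a different seed, and the `profiles` data frame and its values are hypothetical. The offset is written inside the formula as `offset(log(exposure_years))`, an assumption made here so that `predict()` takes the exposure from `newdata`.

```r
# Minimal sketch: predicted event rates per person-year from a Poisson GLM.
set.seed(1)
n <- 400
d <- data.frame(
  age = rnorm(n, 60, 12),
  smoker = rbinom(n, 1, 0.3),
  exposure_years = runif(n, 0.5, 5)
)
d$events <- rpois(n, d$exposure_years * exp(-2 + 0.02 * d$age + 0.6 * d$smoker))

# Offset placed in the formula so that predict() evaluates it on newdata
fit <- glm(events ~ age + smoker + offset(log(exposure_years)),
           data = d, family = poisson)

# Hypothetical profiles: 60-year-old non-smoker and smoker, 1 year of exposure
profiles <- data.frame(age = 60, smoker = c(0, 1), exposure_years = 1)
profiles$rate_per_year <- unname(predict(fit, newdata = profiles, type = "response"))
profiles

# Rate ratios with Wald 95% intervals, using base R only
rr <- exp(cbind(estimate = coef(fit), confint.default(fit)))
rr
```

Setting `exposure_years = 1` makes the predicted count equal to the rate per person-year. The ratio of the two predicted rates reproduces the exponentiated `smoker` coefficient.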
Interpretation
“After adjusting for age, smokers had an event rate 1.8 times that of non-smokers (95 % CI 1.4-2.3, p < 0.001).”
Practical Tips
- Check dispersion; if it is well above 1 (overdispersion), Poisson standard errors are too small, so use quasi-Poisson or negative binomial regression.
- Include offset for rates per person-time or area.
- Coefficients are on the log-rate scale; exponentiate for rate ratios.
- Exposure must be on the same time scale as the intended rate.
- For excess zeros (zero-inflation), consider a zero-inflated Poisson model.
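The first tip can be illustrated with a short sketch of what to do when the dispersion check fails. This is an illustration under stated assumptions, not a prescribed workflow. The counts are deliberately simulated from a negative binomial distribution (`size = 1`, a value chosen here) so that overdispersion is present, and the `MASS` package is assumed to be installed for `glm.nb()`.

```r
library(MASS)  # provides glm.nb(); assumed to be installed

set.seed(2)
n <- 500
d <- data.frame(
  age = rnorm(n, 60, 12),
  smoker = rbinom(n, 1, 0.3),
  exposure_years = runif(n, 0.5, 5)
)
mu <- d$exposure_years * exp(-2 + 0.02 * d$age + 0.6 * d$smoker)
# Overdispersed counts: negative binomial with variance mu + mu^2 / size
d$events <- rnbinom(n, size = 1, mu = mu)

f <- events ~ age + smoker + offset(log(exposure_years))

# Poisson fit and its Pearson dispersion statistic
fit_pois <- glm(f, data = d, family = poisson)
dispersion <- sum(resid(fit_pois, type = "pearson")^2) / fit_pois$df.residual
dispersion

# Two common alternatives when the dispersion is well above 1
fit_quasi <- glm(f, data = d, family = quasipoisson)  # same estimates, rescaled SEs
fit_nb <- glm.nb(f, data = d)                         # estimates a dispersion parameter theta

# Compare standard errors of the smoker coefficient across the three fits
se_smoker <- sapply(
  list(poisson = fit_pois, quasi = fit_quasi, negbin = fit_nb),
  function(m) sqrt(diag(vcov(m)))[["smoker"]]
)
se_smoker
```

Quasi-Poisson keeps the Poisson point estimates and inflates the standard errors by the square root of the dispersion estimate. The negative binomial model changes the likelihood, so its estimates can differ slightly from the Poisson ones.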