Poisson Regression

Regression & Modelling
poisson
count-regression
offset
rate
Log-linear count regression with rates and offsets
Published

April 17, 2026

Introduction

Poisson regression models count data with a log link: \(\log E[Y] = \mathbf{X}^\top \boldsymbol{\beta}\). It is appropriate when, given the covariates, the variance of the counts approximately equals their mean (equidispersion).
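A quick simulation illustrates equidispersion. For Poisson draws the sample variance tracks the sample mean. A negative binomial with the same mean has a larger variance. The mean of 4 and the negative binomial size of 2 below are arbitrary illustrative choices, not values from the analysis.

```r
# Illustrative only: mu = 4 and size = 2 are arbitrary choices
set.seed(1)
mu <- 4
y_pois <- rpois(1e5, lambda = mu)
# Negative binomial with the same mean; its variance is mu + mu^2 / size = 12
y_nb <- rnbinom(1e5, size = 2, mu = mu)

c(mean = mean(y_pois), var = var(y_pois))  # both close to 4
c(mean = mean(y_nb),   var = var(y_nb))    # mean near 4, variance near 12
```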

Prerequisites

Poisson distribution, generalised linear models.

Theory

Model: \(Y \sim \mathrm{Poisson}(\mu)\), \(\log \mu = \mathbf{X}^\top \boldsymbol{\beta}\).

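Coefficients are log-rate ratios: \(\exp(\beta_j)\) is the rate ratio, i.e. the multiplicative change in the expected count, for a one-unit increase in \(X_j\) with the other covariates held fixed.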
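A one-unit increase in \(X_j\) multiplies \(\mu\) by \(\exp(\beta_j)\). The sketch below checks this numerically on simulated data. It compares two fitted means that differ only in a binary covariate. The variables x and g and their true coefficients are illustrative assumptions.

```r
# Illustrative simulated data; true coefficients 0.3 and 0.8 are arbitrary
set.seed(42)
n <- 500
x <- rnorm(n)
g <- rbinom(n, 1, 0.5)
y <- rpois(n, lambda = exp(0.5 + 0.3 * x + 0.8 * g))
fit <- glm(y ~ x + g, family = poisson)

# Fitted means for g = 1 versus g = 0 at the same value of x
nd <- data.frame(x = c(0.7, 0.7), g = c(1, 0))
mu_hat <- predict(fit, newdata = nd, type = "response")
ratio <- unname(mu_hat[1] / mu_hat[2])

# The ratio equals exp(beta_g) whichever x is held fixed
c(ratio = ratio, exp_beta = unname(exp(coef(fit)["g"])))
```

Changing the fixed value 0.7 in nd leaves the ratio unchanged. That invariance is what makes \(\exp(\beta_j)\) a single summary of the covariate effect on the rate scale.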

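Offset for rates: \(\log E[Y] = \log(\text{exposure}) + \mathbf{X}^\top \boldsymbol{\beta}\), equivalently \(\log\big(E[Y]/\text{exposure}\big) = \mathbf{X}^\top \boldsymbol{\beta}\). The offset enters the linear predictor with its coefficient fixed at 1. The model therefore describes the event rate per unit of exposure, which accommodates observations with different follow-up times or population sizes.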
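The unit of the exposure variable sets the unit of the modelled rate. The sketch below uses simulated data in which follow-up is recorded in days; x and days are hypothetical variables. Rescaling the offset from days to years leaves the slope (the rate ratio) unchanged. It shifts the intercept by exactly \(\log(365.25)\), because the intercept absorbs any constant added to the offset.

```r
# Illustrative simulation: follow-up recorded in days (an assumption for this sketch)
set.seed(7)
n <- 400
d <- data.frame(x = rbinom(n, 1, 0.4), days = runif(n, 30, 1500))
years <- d$days / 365.25
d$y <- rpois(n, lambda = years * exp(-1 + 0.5 * d$x))  # true rate is per year

# Same model with the offset on two time scales
fit_day  <- glm(y ~ x + offset(log(days)), family = poisson, data = d)
fit_year <- glm(y ~ x + offset(log(days / 365.25)), family = poisson, data = d)

coef(fit_day)   # intercept is a log rate per day
coef(fit_year)  # intercept is a log rate per year; slope for x is identical
unname(coef(fit_year)[1] - coef(fit_day)[1])  # equals log(365.25)
```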

Assumptions

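  • Count outcome (non-negative integers).
  • Log-linear mean: \(\log \mu\) is linear in the covariates (plus any offset).
  • Equidispersion: \(\mathrm{Var}(Y \mid \mathbf{X}) = E[Y \mid \mathbf{X}]\).
  • Independent observations.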

R Implementation

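library(broom)

# Simulated cohort: event counts observed over varying follow-up (patient-years)
set.seed(2026)
d <- data.frame(
  age = rnorm(300, 60, 12),
  smoker = rbinom(300, 1, 0.3),
  exposure_years = runif(300, 0.5, 5)
)
# True rate per patient-year is exp(-2 + 0.02 * age + 0.6 * smoker);
# the expected count is that rate times the follow-up time
d$events <- rpois(nrow(d),
  lambda = d$exposure_years * exp(-2 + 0.02 * d$age + 0.6 * d$smoker))

# offset() adds log(exposure) to the linear predictor with its coefficient fixed at 1
fit <- glm(events ~ age + smoker + offset(log(exposure_years)),
           data = d, family = poisson(link = "log"))
tidy(fit, conf.int = TRUE, exponentiate = TRUE)  # rate ratios with 95 % CIs
glance(fit)                                      # deviance, AIC, residual df

# Check overdispersion: Pearson chi-square / residual df (about 1 under Poisson)
dispersion <- sum(residuals(fit, type = "pearson")^2) / fit$df.residual
dispersion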
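A common rough check goes beyond the dispersion estimate. It compares the Pearson statistic with a chi-square distribution on the residual degrees of freedom, and a small p-value suggests extra-Poisson variation. This is an approximation and can be unreliable when many expected counts are small. The sketch simulates deliberately overdispersed counts so the check has something to detect. The data are negative binomial with an arbitrary size of 1.5, and the covariate x is hypothetical.

```r
# Simulated overdispersed counts: NB with mean exp(0.5 + 0.4 * x), size = 1.5 (illustrative)
set.seed(11)
n <- 500
x <- rnorm(n)
y <- rnbinom(n, size = 1.5, mu = exp(0.5 + 0.4 * x))
fit <- glm(y ~ x, family = poisson)

# Pearson chi-square statistic and its ratio to the residual df
pearson_x2 <- sum(residuals(fit, type = "pearson")^2)
dispersion <- pearson_x2 / fit$df.residual

# Upper-tail p-value under the chi-square(df.residual) approximation
p_value <- pchisq(pearson_x2, df = fit$df.residual, lower.tail = FALSE)
c(dispersion = dispersion, p_value = p_value)
```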

Output & Results

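The tidy() table gives rate ratios (exponentiated coefficients) with 95 % confidence intervals. The exponentiated intercept is a baseline rate per patient-year (for a non-smoker at age 0) rather than a ratio. The dispersion statistic should be close to 1 if the Poisson variance assumption holds.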
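The sketch below is a minimal by-hand version of a rate ratio and its interval, using the Wald form \(\exp(\hat\beta \pm 1.96\,\mathrm{SE})\). broom's tidy() typically obtains glm intervals from confint(), which is profile-likelihood based, so its limits may differ slightly from these Wald limits. The simulated data and the true log-rate ratio of 0.6 are illustrative assumptions.

```r
# Illustrative simulated rate data (coefficients are arbitrary)
set.seed(3)
n <- 300
d <- data.frame(smoker = rbinom(n, 1, 0.3), exposure = runif(n, 0.5, 5))
d$events <- rpois(n, lambda = d$exposure * exp(-1 + 0.6 * d$smoker))
fit <- glm(events ~ smoker + offset(log(exposure)), family = poisson, data = d)

# Wald 95 % interval on the log scale, then exponentiate to the rate-ratio scale
est <- coef(summary(fit))["smoker", ]
z <- qnorm(0.975)
rr <- exp(est[["Estimate"]])
rr_ci <- exp(est[["Estimate"]] + c(-1, 1) * z * est[["Std. Error"]])

round(c(rate_ratio = rr, lower = rr_ci[1], upper = rr_ci[2]), 2)
```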

Interpretation

“After adjusting for age, smokers had an event rate 1.8 times that of non-smokers (95 % CI 1.4-2.3, p < 0.001).”

Practical Tips

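  • Check the dispersion statistic. If it is well above 1, switch to quasi-Poisson or negative binomial regression, because Poisson standard errors will be too small.
  • Include log(exposure) as an offset when modelling rates per person-time, per area, or per population.
  • Coefficients are on the log-rate scale; exponentiate for rate ratios.
  • Express exposure in the unit of the intended rate (e.g. years for a rate per person-year). Rate ratios do not depend on this choice, but the intercept does.
  • If there are more zeros than the fitted model predicts, consider a zero-inflated Poisson model.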
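The two overdispersion remedies named above can be sketched on deliberately overdispersed simulated data. The negative binomial size of 1.5 and the covariate x are illustrative assumptions. Quasi-Poisson keeps the Poisson point estimates and inflates the standard errors by the square root of the estimated dispersion. MASS::glm.nb() fits a negative binomial model, estimating its size parameter theta by maximum likelihood.

```r
library(MASS)  # provides glm.nb(); MASS ships with standard R installations

# Simulated overdispersed counts (NB, size = 1.5); values are illustrative
set.seed(11)
n <- 500
x <- rnorm(n)
y <- rnbinom(n, size = 1.5, mu = exp(0.5 + 0.4 * x))

fit_p  <- glm(y ~ x, family = poisson)
fit_qp <- glm(y ~ x, family = quasipoisson)  # same estimates, SEs scaled by sqrt(dispersion)
fit_nb <- glm.nb(y ~ x)                      # theta (NB size) estimated by ML

summary(fit_qp)$dispersion  # estimated dispersion, well above 1 here
fit_nb$theta                # estimated NB size parameter
cbind(poisson = coef(fit_p), quasi = coef(fit_qp), negbin = coef(fit_nb))

# Standard error of the slope under each model
sapply(list(poisson = fit_p, quasi = fit_qp, negbin = fit_nb),
       function(f) sqrt(vcov(f)["x", "x"]))
```

Quasi-Poisson is the smaller change when only the standard errors need correcting. The negative binomial model additionally gives a full likelihood, so AIC comparisons with other models remain available.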