Zero-Inflated Models
Introduction
When count data contain more zeros than a Poisson or negative binomial (NB) model would predict, zero-inflated models are appropriate. They treat the data as a mixture of two processes: one determines whether an observation is a structural zero or is “susceptible” to a non-zero outcome, and the other generates the count for susceptible observations, which can itself be zero.
Prerequisites
Poisson / NB regression, logistic regression.
Theory
\[P(Y = 0) = \pi + (1 - \pi) f(0), \quad P(Y = y) = (1 - \pi) f(y) \text{ for } y > 0,\]
where \(\pi\) is the structural-zero probability and \(f\) is the count distribution (Poisson or NB). Each sub-model has its own linear predictor, typically with a logit link for \(\pi\).
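The mixture formula can be checked numerically. The following is a minimal base-R sketch of the zero-inflated Poisson probability mass function; the function name `dzip` and the values of `lambda` and `pi_zero` are chosen here for illustration only.

```r
# Zero-inflated Poisson pmf: pi_zero is the structural-zero probability,
# lambda the Poisson mean of the count process.
dzip <- function(y, lambda, pi_zero) {
  ifelse(y == 0,
         pi_zero + (1 - pi_zero) * dpois(0, lambda),  # structural + sampling zeros
         (1 - pi_zero) * dpois(y, lambda))            # positive counts
}

lambda  <- 2    # illustrative value
pi_zero <- 0.3  # illustrative value

p_zero_pois <- dpois(0, lambda)          # P(Y = 0) under a plain Poisson
p_zero_zip  <- dzip(0, lambda, pi_zero)  # P(Y = 0) under the ZIP mixture

# Probabilities over the support sum to 1 (tail beyond 100 is negligible here)
total <- sum(dzip(0:100, lambda, pi_zero))
```

Comparing `p_zero_zip` with `p_zero_pois` shows how the structural-zero term raises the zero probability above what the count distribution alone implies.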
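Hurdle models differ slightly: a Bernoulli component decides 0 vs. >0, then a zero-truncated count model handles the positive counts. All zeros therefore come from the binary component, whereas a zero-inflated model splits zeros between structural zeros and zeros produced by the count distribution, so “susceptible or not” means something different in the two models.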
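For comparison with the zero-inflated mixture, the hurdle model can be written in the same notation. Here \(p_0\) is a symbol introduced for illustration (the probability of a zero from the Bernoulli part), and \(f\) is the untruncated count distribution as before:

\[P(Y = 0) = p_0, \quad P(Y = y) = (1 - p_0) \frac{f(y)}{1 - f(0)} \text{ for } y > 0.\]

The division by \(1 - f(0)\) renormalizes the count distribution over the positive integers, so the probabilities still sum to one.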
Assumptions
- Count outcome.
- Two distinct processes: one producing structural zeros, the other producing counts (which may themselves be zero).
- Independence across observations.
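Before fitting a zero-inflated model, it can help to check whether the data really contain more zeros than a standard count model implies. The sketch below uses base R only and hypothetical simulated data (30% structural zeros); it compares the observed share of zeros with the share expected under a fitted Poisson regression.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
# Hypothetical data: 30% structural zeros on top of a Poisson count process
y <- ifelse(rbinom(n, 1, 0.3) == 1, 0, rpois(n, lambda = exp(0.5 + 0.6 * x)))

# Ordinary Poisson regression that ignores zero inflation
fit_pois <- glm(y ~ x, family = poisson)

obs_zero <- mean(y == 0)                      # observed share of zeros
exp_zero <- mean(dpois(0, fitted(fit_pois)))  # share implied by the Poisson fit

c(observed = obs_zero, expected = exp_zero)
```

An observed share well above the expected share suggests excess zeros, although the gap can also reflect overdispersion, so it is a screening check rather than a formal test.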
R Implementation
library(pscl); library(glmmTMB)
set.seed(2026)
d <- data.frame(x = rnorm(300))
# Simulate with 40% structural zeros
pi_zero <- 0.4
d$y <- ifelse(rbinom(300, 1, pi_zero) == 1, 0,
              rpois(300, lambda = exp(0.5 + 0.6 * d$x)))
# Zero-inflated Poisson
fit_zip <- zeroinfl(y ~ x | x, data = d)
summary(fit_zip)
# Zero-inflated negative binomial
fit_zinb <- zeroinfl(y ~ x | x, data = d, dist = "negbin")
AIC(fit_zip, fit_zinb)
# glmmTMB version
fit_tmb <- glmmTMB(y ~ x, ziformula = ~ x, data = d, family = poisson)
Output & Results
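summary() reports two sets of coefficients: the count component (log link) and the zero-inflation component (logit link). AIC() compares the ZIP and ZINB fits; the lower value indicates the better trade-off between fit and complexity.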
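As a minimal sketch of how the two coefficient sets can be pulled apart, the block below refits the ZIP model on data simulated with the same design as in the implementation section and extracts each component with the `coef()` and `predict()` methods that pscl provides for `zeroinfl` objects. The object names `b_count`, `b_zero`, and `p_struct` are illustrative.

```r
library(pscl)

set.seed(2026)
d <- data.frame(x = rnorm(300))
# Same simulation design as above: 40% structural zeros, Poisson counts otherwise
d$y <- ifelse(rbinom(300, 1, 0.4) == 1, 0,
              rpois(300, lambda = exp(0.5 + 0.6 * d$x)))

fit_zip <- zeroinfl(y ~ x | x, data = d)

b_count <- coef(fit_zip, model = "count")  # log-scale coefficients for the count mean
b_zero  <- coef(fit_zip, model = "zero")   # logit-scale coefficients for structural zeros

# Fitted probability of being a structural zero, one value per observation
p_struct <- predict(fit_zip, type = "zero")
```

The count slope should land near the simulated value of 0.6, while the zero-inflation slope is estimated even though the simulated structural-zero probability does not depend on x.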
Interpretation
“In a ZINB model, the count component showed rate ratio 1.75 (95 % CI 1.4-2.2) per unit x; the zero-inflation component showed that a 1-unit increase in x reduced the odds of being a structural zero by 45 %.”
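Rate ratios and odds ratios of the kind quoted above come from exponentiating the coefficients. The sketch below shows one way to do this with Wald intervals. It uses a ZIP fit on the simulated data for simplicity (the same steps should apply to a ZINB fit), and the table name `ratios` is illustrative.

```r
library(pscl)

set.seed(2026)
d <- data.frame(x = rnorm(300))
d$y <- ifelse(rbinom(300, 1, 0.4) == 1, 0,
              rpois(300, lambda = exp(0.5 + 0.6 * d$x)))

fit_zip <- zeroinfl(y ~ x | x, data = d)

est <- coef(fit_zip)              # names are prefixed count_ and zero_
se  <- sqrt(diag(vcov(fit_zip)))  # Wald standard errors on the link scale
z   <- qnorm(0.975)               # multiplier for a 95% interval

# Exponentiate: count_ rows become rate ratios, zero_ rows become odds ratios
ratios <- exp(cbind(estimate = est,
                    lower    = est - z * se,
                    upper    = est + z * se))
ratios

# Percent change in the odds of a structural zero per unit increase in x
pct_change_odds <- 100 * (ratios["zero_x", "estimate"] - 1)
```

In this simulation the structural-zero probability is constant, so the odds ratio for `zero_x` is expected to be close to 1 with an interval that covers it.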
Practical Tips
- The Vuong test (pscl::vuong) can compare a zero-inflated model with its regular Poisson/NB counterpart, but the result is often not decisive.
- Separate predictors can be included in the zero-inflation and count models.
- Hurdle models (pscl::hurdle) are an alternative; interpret sub-models differently.
- Overdispersed counts plus excess zeros typically warrant ZINB.
- Interpretation is cleaner in hurdle models: “probability of having any vs. count given any”.
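The comparisons mentioned in these tips can be sketched as follows, using the same simulated design as in the implementation section. `vuong()` and `hurdle()` are pscl functions; the object names are illustrative. Note that, as described in the pscl documentation, the hurdle's binary component models the probability of a positive count, so its coefficients point in the opposite direction to the zero-inflation component of `zeroinfl()`.

```r
library(pscl)

set.seed(2026)
d <- data.frame(x = rnorm(300))
d$y <- ifelse(rbinom(300, 1, 0.4) == 1, 0,
              rpois(300, lambda = exp(0.5 + 0.6 * d$x)))

fit_pois   <- glm(y ~ x, data = d, family = poisson)  # no zero inflation
fit_zip    <- zeroinfl(y ~ x | x, data = d)           # zero-inflated Poisson
fit_hurdle <- hurdle(y ~ x | x, data = d)             # Poisson hurdle, logit zero part

# Vuong test of the zero-inflated model against the plain Poisson (prints a table)
vuong(fit_zip, fit_pois)

# Information criteria side by side
aic_tab <- AIC(fit_pois, fit_zip, fit_hurdle)
aic_tab
```

With this much simulated zero inflation, the ZIP and hurdle fits should both improve clearly on the plain Poisson; on real data the differences are often smaller and the choice rests more on which interpretation suits the question.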