Zero-Inflated Models

Regression & Modelling
zero-inflation
zip
zinb
mixture
Mixture models for count data with more zeros than a Poisson or negative binomial model predicts
Published

April 17, 2026

Introduction

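When count data contain more zeros than a Poisson or negative binomial (NB) model predicts, a zero-inflated model may be appropriate. It is a mixture of two processes. A binary process determines whether an observation can be non-zero at all (“susceptible”) or is a structural zero. A count process then generates the counts for susceptible observations, and this process also produces some zeros of its own (sampling zeros).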
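A quick, informal way to see excess zeros is to fit a plain Poisson mean and compare the observed number of zeros with the number that model predicts. The sketch below assumes no covariates. The helper name excess_zero_check() and the simulated settings (30% structural zeros, count mean 2) are hypothetical. With covariates, you would compare against the fitted model's per-observation zero probabilities instead.

```r
# Informal excess-zero check for a covariate-free Poisson model.
# excess_zero_check() is a hypothetical helper, not a package function.
excess_zero_check <- function(y) {
  lambda_hat <- mean(y)                          # Poisson MLE of the mean
  expected0  <- length(y) * dpois(0, lambda_hat) # zeros the Poisson fit predicts
  c(observed = sum(y == 0), expected = expected0)
}

set.seed(1)
n <- 500
structural <- rbinom(n, 1, 0.3) == 1              # 30% structural zeros
y <- ifelse(structural, 0, rpois(n, lambda = 2))  # count process (also yields zeros)
excess_zero_check(y)                              # observed well above expected
```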

Prerequisites

Poisson / NB regression, logistic regression.

Theory

\[P(Y = 0) = \pi + (1 - \pi) f(0), \quad P(Y = y) = (1 - \pi) f(y) \text{ for } y > 0,\]

where \(\pi\) is the structural-zero probability and \(f\) is the count distribution (Poisson or NB). Each sub-model has its own linear predictor. The zero part typically uses a logit link for \(\pi\), and the count part uses a log link for the mean of \(f\).
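The two-part formula can be checked numerically. The base-R sketch below uses a hypothetical helper dzip() and illustrative linear-predictor values: 0.5 for the count part (log link) and -0.4 for the zero part (logit link). It confirms three things: the probabilities sum to 1, \(P(Y = 0)\) exceeds the plain Poisson value, and the mean equals \((1 - \pi)\lambda\).

```r
# Zero-inflated Poisson pmf written directly from the two-part formula.
# dzip() is a hypothetical helper, not a pscl or base R function.
dzip <- function(y, lambda, pi_zero) {
  f <- dpois(y, lambda)                          # count distribution f(y)
  ifelse(y == 0, pi_zero + (1 - pi_zero) * f,    # structural + sampling zeros
                 (1 - pi_zero) * f)              # positive counts
}

lambda  <- exp(0.5)     # log link: count mean from linear predictor 0.5
pi_zero <- plogis(-0.4) # logit link: structural-zero probability from -0.4

probs <- dzip(0:100, lambda, pi_zero)           # support truncated at 100; tail is negligible
sum(probs)                                      # total probability, 1 up to rounding
c(zip = probs[1], poisson = dpois(0, lambda))   # P(Y = 0): ZIP vs plain Poisson
c(sum(0:100 * probs), (1 - pi_zero) * lambda)   # ZIP mean equals (1 - pi) * lambda
```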

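Hurdle models differ. A Bernoulli model decides 0 vs. >0, and a zero-truncated count model then generates the positive counts. Every zero therefore comes from the binary part, so a hurdle model does not separate structural from sampling zeros. It models “any vs. none” rather than “susceptible or not”.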
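In symbols, let \(p_0\) be the probability of a zero. The hurdle model is then \(P(Y = 0) = p_0\) and \(P(Y = y) = (1 - p_0) f(y) / (1 - f(0))\) for \(y > 0\). The base-R sketch below uses a hypothetical helper and illustrative values (Poisson mean 2, \(p_0 = 0.05\)). Because \(p_0\) is not tied to \(f(0)\), a hurdle model can also represent fewer zeros than the count distribution implies. A zero-inflated model with \(0 \le \pi \le 1\) cannot do this.

```r
# Hurdle pmf: binary part for 0 vs. > 0, then a zero-truncated Poisson.
# dhurdle_pois() is a hypothetical helper written from the formula above.
dhurdle_pois <- function(y, lambda, p_zero) {
  f <- dpois(y, lambda)
  ifelse(y == 0, p_zero,                             # every zero comes from the hurdle
         (1 - p_zero) * f / (1 - dpois(0, lambda)))  # zero-truncated count part
}

lambda <- 2
p_zero <- 0.05   # deliberately below dpois(0, 2), about 0.135

probs <- dhurdle_pois(0:100, lambda, p_zero)
sum(probs)                                        # 1 up to rounding
c(hurdle = probs[1], poisson = dpois(0, lambda))  # fewer zeros than plain Poisson
```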

Assumptions

  • Non-negative integer (count) outcome.
  • Two distinct data-generating processes: one producing structural zeros, and one producing counts (which can themselves be zero).
  • Independence across observations.

R Implementation

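library(pscl)     # zeroinfl(), hurdle(), vuong()
library(glmmTMB)  # glmmTMB() with a ziformula argument

set.seed(2026)
d <- data.frame(x = rnorm(300))
# Simulate with 40% structural zeros (constant pi; x affects only the count mean)
pi_zero <- 0.4
d$y <- ifelse(rbinom(300, 1, pi_zero) == 1, 0,
              rpois(300, lambda = exp(0.5 + 0.6 * d$x)))

# Zero-inflated Poisson: formula is count model | zero-inflation model
fit_zip <- zeroinfl(y ~ x | x, data = d)
summary(fit_zip)

# Zero-inflated negative binomial (adds a dispersion parameter, theta)
fit_zinb <- zeroinfl(y ~ x | x, data = d, dist = "negbin")
summary(fit_zinb)

# Lower AIC is better; the simulated counts are Poisson, so ZINB's extra
# parameter may not pay off here
AIC(fit_zip, fit_zinb)

# glmmTMB version: ziformula specifies the zero-inflation model
fit_tmb <- glmmTMB(y ~ x, ziformula = ~ x, data = d, family = poisson)
summary(fit_tmb)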

Output & Results

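summary() reports two sets of coefficients. The count part is on the log scale, so exponentiating gives rate ratios. The zero-inflation part is on the logit scale, so exponentiating gives odds ratios of being a structural zero. For ZINB, the count part also reports Log(theta), the dispersion parameter. AIC() returns one value per model, and the lower AIC indicates the better trade-off between fit and complexity.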
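The sketch below pulls out the two coefficient sets and checks whether the fitted ZIP reproduces the observed number of zeros. It re-simulates the same data so it runs on its own. It relies on pscl's coef(..., model = ) and predict(..., type = "prob"). In the "prob" matrix, column 1 holds each observation's fitted \(P(Y = 0)\).

```r
library(pscl)

set.seed(2026)
d <- data.frame(x = rnorm(300))
d$y <- ifelse(rbinom(300, 1, 0.4) == 1, 0,
              rpois(300, lambda = exp(0.5 + 0.6 * d$x)))
fit_zip <- zeroinfl(y ~ x | x, data = d)

# The two coefficient sets, each on its own link scale
coef(fit_zip, model = "count")        # log scale (exp() gives rate ratios)
coef(fit_zip, model = "zero")         # logit scale (exp() gives odds ratios)

# Fitted probability of a zero for each observation, summed over the sample
p0 <- predict(fit_zip, type = "prob")[, 1]
c(observed = sum(d$y == 0), expected = sum(p0))
```

If the expected count of zeros is far from the observed one, the zero-inflation or count specification deserves another look.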

Interpretation

Example write-up (illustrative numbers, not output from the simulation above): “In a ZINB model, the count component showed a rate ratio of 1.75 (95% CI 1.4 to 2.2) per 1-unit increase in x; the zero-inflation component showed that a 1-unit increase in x reduced the odds of being a structural zero by 45%.”
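The quantities in such a write-up are exponentiated link-scale coefficients. The base-R sketch below uses hypothetical helpers ratio_ci() (a Wald interval) and pct_change(). It back-calculates the coefficients that the illustrative figures imply and does not reproduce any real fit. With a fitted pscl model, exp(coef(fit, model = "count")) and exp(coef(fit, model = "zero")) give the rate ratios and odds ratios directly.

```r
# Hypothetical helpers: b is a link-scale coefficient, se its standard error.
ratio_ci <- function(b, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)                 # 1.96 for a 95% interval
  exp(c(estimate = b, lower = b - z * se, upper = b + z * se))
}
pct_change <- function(b) 100 * (exp(b) - 1)      # % change in rate or odds

log(1.75)              # count coefficient implied by a rate ratio of 1.75 (about 0.56)
pct_change(log(0.55))  # zero-part odds ratio of 0.55 means a 45% reduction
ratio_ci(0.5, 0.1)     # exp(0.5) with its Wald 95% interval
```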

Practical Tips

  • The Vuong test (pscl::vuong) can compare a zero-inflated model with a standard Poisson/NB model, but it is often not decisive.
  • Different predictors can be used in the zero-inflation and count parts; in pscl the formula is y ~ count_predictors | zero_predictors.
  • Hurdle models (pscl::hurdle) are an alternative, but their sub-models are interpreted differently.
  • When counts are overdispersed as well as zero-inflated, ZINB is usually preferred over ZIP.
  • Interpretation is often cleaner in hurdle models: “probability of any non-zero count” and “expected count given that it is non-zero”.
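The tips above can be sketched together with pscl. The example uses separate predictors in the two parts, runs a Vuong comparison against a plain Poisson GLM, and fits a hurdle alternative. The extra predictor z and the simulation settings are hypothetical and chosen only so that each part has its own covariate.

```r
library(pscl)

set.seed(2026)
n <- 300
d <- data.frame(x = rnorm(n), z = rnorm(n))   # z: hypothetical zero-part predictor
# Structural-zero probability depends on z; count mean depends on x
pi_zero <- plogis(-0.4 + 0.8 * d$z)
d$y <- ifelse(rbinom(n, 1, pi_zero) == 1, 0,
              rpois(n, lambda = exp(0.5 + 0.6 * d$x)))

# Different predictors in each part: count model | zero-inflation model
fit_zip <- zeroinfl(y ~ x | z, data = d)

# Standard Poisson GLM as the comparison model for the Vuong test
fit_pois <- glm(y ~ x, family = poisson, data = d)
vuong(fit_pois, fit_zip)

# Hurdle alternative: binomial zero hurdle plus zero-truncated Poisson counts
fit_hurdle <- hurdle(y ~ x | z, data = d, dist = "poisson", zero.dist = "binomial")
summary(fit_hurdle)
AIC(fit_zip, fit_hurdle)
```

In the hurdle fit, the zero part models the probability of a non-zero count, and the count part describes counts given that they are positive.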