The Multinomial Distribution

Probability Theory

multinomial

categorical

multi-class

Multi-category generalisation of the binomial with category probabilities summing to one

Published

April 17, 2026

Introduction

The multinomial distribution generalises the binomial from two categories to \(k\). It describes the counts of outcomes falling into each of the \(k\) categories across \(n\) independent trials. It underlies the analysis of categorical variables with more than two levels – disease stages, treatment responses, species counts – and sits at the heart of chi-squared goodness-of-fit tests.

Prerequisites

Binomial distribution.

Theory

Let \((X_1, \ldots, X_k)\) count outcomes across \(k\) categories in \(n\) independent trials, where the probability of outcome \(j\) on any trial is \(p_j\), and \(\sum p_j = 1\).

PMF:

\[P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} \prod_{j=1}^k p_j^{x_j},\]

for non-negative integers \(x_j\) with \(\sum_j x_j = n\). The probability is zero for any other vector of counts.
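As a sanity check, the PMF can be coded straight from the formula and compared with base R's `dmultinom()`. The helper `multinom_pmf` is a hypothetical name used only for this illustration. It works on the log scale so the factorials do not overflow for larger \(n\), and it assumes all \(p_j > 0\).

```r
# Hypothetical helper: multinomial PMF written directly from the formula,
# computed on the log scale (lfactorial) to avoid overflow.
# Assumes every p_j > 0, so that x_j * log(p_j) is finite.
multinom_pmf <- function(x, p) {
  stopifnot(length(x) == length(p), all(x >= 0), abs(sum(p) - 1) < 1e-12)
  n <- sum(x)
  log_coef <- lfactorial(n) - sum(lfactorial(x))  # log of n! / (x_1! ... x_k!)
  exp(log_coef + sum(x * log(p)))                 # times prod p_j^x_j
}

p <- c(0.40, 0.30, 0.20, 0.10)
x <- c(2, 1, 1, 0)   # n = 4 trials

multinom_pmf(x, p)        # 0.1152 = 12 * 0.4^2 * 0.3 * 0.2
dmultinom(x, prob = p)    # base R gives the same value
```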

Moments:

\(E[X_j] = n p_j\).
\(\mathrm{Var}(X_j) = n p_j (1 - p_j)\).
\(\mathrm{Cov}(X_j, X_\ell) = -n p_j p_\ell\) for \(j \neq \ell\) (negative because the counts must sum to \(n\): a larger count in one category leaves fewer trials for the others).
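The three moment formulas combine into one covariance matrix, \(n(\mathrm{diag}(p) - p p^\top)\). The sketch below builds it and checks it against a simulation; the seed and the number of replicates are arbitrary choices. The zero row sums make the negative covariances concrete: since the counts always total \(n\), each count's variance is exactly offset by its covariances with the others.

```r
set.seed(1)  # arbitrary seed for this illustration
p <- c(0.40, 0.30, 0.20, 0.10)
n <- 200

# Theoretical covariance matrix: diagonal n p_j (1 - p_j), off-diagonal -n p_j p_l
Sigma <- n * (diag(p) - outer(p, p))

# Monte Carlo check: rmultinom() returns one replicate per column,
# so transpose before cov(), which expects observations in rows
draws <- rmultinom(1e5, size = n, prob = p)
Sigma_hat <- cov(t(draws))

round(Sigma, 1)
round(Sigma_hat, 1)

# Each row sums to zero because the counts are constrained to add to n
rowSums(Sigma)
```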

Marginals: \(X_j \sim \mathrm{Binomial}(n, p_j)\). The joint distribution of a pair \((X_j, X_\ell)\) is trinomial, with probabilities \((p_j, p_\ell, 1 - p_j - p_\ell)\) for the two categories and the pooled remainder.
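The binomial marginal can be verified by brute force on a small case. The sketch assumes \(k = 3\), \(n = 5\) and \(p = (0.5, 0.3, 0.2)\), chosen only to keep the enumeration short. It sums the trinomial PMF over every value of \(x_2\) (with \(x_3 = n - x_1 - x_2\)) and compares the result with `dbinom()`.

```r
p <- c(0.5, 0.3, 0.2)   # illustrative 3-category probabilities
n <- 5

# Marginal PMF of X_1: sum the joint PMF over all x_2, with x_3 = n - x_1 - x_2
marginal_x1 <- sapply(0:n, function(x1) {
  x2 <- 0:(n - x1)
  sum(sapply(x2, function(v) dmultinom(c(x1, v, n - x1 - v), prob = p)))
})

# The summed column matches Binomial(n, p_1) exactly (up to rounding)
cbind(x1 = 0:n, summed = marginal_x1, binomial = dbinom(0:n, n, p[1]))
```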

The maximum likelihood estimate (MLE) of \((p_1, \ldots, p_k)\) is the vector of observed proportions, \(\hat{p}_j = x_j / n\).
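A quick numerical illustration, reusing the counts from the goodness-of-fit example later in this post: the log-likelihood at \(\hat{p}\) is higher than at the hypothesised \(p\) and higher than at a nearby point on the simplex. This is a spot check rather than a proof; the perturbation of size 0.02 is an arbitrary choice.

```r
obs <- c(82, 66, 34, 18)
n <- sum(obs)
p_hat <- obs / n   # MLE: observed proportions 0.41, 0.33, 0.17, 0.09

# Multinomial log-likelihood of the observed counts as a function of p
loglik <- function(p) dmultinom(obs, prob = p, log = TRUE)

loglik(p_hat)
loglik(c(0.40, 0.30, 0.20, 0.10))       # hypothesised probabilities: lower
loglik(p_hat + c(0.02, -0.02, 0, 0))    # small move along the simplex: lower
```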

Assumptions

Fixed total \(n\) trials.
Independent trials.
Constant category probabilities across trials.

R Implementation

set.seed(2026)

p <- c(0.40, 0.30, 0.20, 0.10)
n <- 200

# Single realisation
one_draw <- rmultinom(1, size = n, prob = p)
data.frame(category = seq_along(p), count = as.integer(one_draw),
           expected = n * p)

# Simulate 10,000 replicates (rmultinom returns one per column), check moments
many <- rmultinom(1e4, size = n, prob = p)
rowMeans(many); apply(many, 1, var)

# Expected variance
n * p * (1 - p)

# Chi-squared goodness-of-fit
obs <- c(82, 66, 34, 18)
chisq.test(obs, p = p)

# Multinomial likelihood of observed counts
dmultinom(obs, size = sum(obs), prob = p)

Output & Results

  category count expected
1        1    77       80
2        2    66       60
3        3    38       40
4        4    19       20

rowMeans:  80.07  60.04  39.95  19.94
apply var: 48.2   42.0   32.1   17.8

Expected var:
[1] 48.00 42.00 32.00 18.00

        Chi-squared test for given probabilities

data:  obs
X-squared = 1.75, df = 3, p-value = 0.6259

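The empirical means and variances agree with the theoretical values \(n p_j\) and \(n p_j (1 - p_j)\) to within Monte Carlo error. The chi-squared test does not reject the hypothesised category probabilities at conventional significance levels.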
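To make the test's arithmetic explicit, the Pearson statistic can be computed by hand for the same `obs` and `p` and compared with `chisq.test()`. The degrees of freedom are \(k - 1\) because the counts are constrained to sum to \(n\).

```r
obs <- c(82, 66, 34, 18)
p <- c(0.40, 0.30, 0.20, 0.10)
expected <- sum(obs) * p               # 80, 60, 40, 20

# Pearson statistic: sum over categories of (observed - expected)^2 / expected
x2 <- sum((obs - expected)^2 / expected)
df <- length(obs) - 1                  # k - 1 degrees of freedom
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

c(statistic = x2, df = df, p.value = p_value)
# statistic: 4/80 + 36/60 + 36/40 + 4/20 = 1.75
```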

Interpretation

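Reports of multinomial fits usually give the estimated category proportions \(\hat{p}_j\) together with confidence intervals. For large samples (expected counts of roughly 5 or more in every category), Pearson's chi-squared goodness-of-fit test with \(k - 1\) degrees of freedom is the standard tool for comparing observed multinomial counts to hypothesised probabilities.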
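One simple way to attach uncertainty to each \(\hat{p}_j\) is a Wald interval built from the binomial marginal, \(\mathrm{Var}(\hat{p}_j) = p_j(1 - p_j)/n\). This is a sketch of one choice among several: the intervals are per category, not a joint region for the whole vector, and they behave poorly when \(n \hat{p}_j\) is small (base R's `binom.test()` and `prop.test()` provide alternative intervals for a single proportion).

```r
obs <- c(82, 66, 34, 18)
n <- sum(obs)
p_hat <- obs / n

# Wald 95% intervals, one per category, using the Binomial(n, p_j) marginal
z <- qnorm(0.975)
se <- sqrt(p_hat * (1 - p_hat) / n)
ci <- data.frame(category = seq_along(obs), p_hat = p_hat,
                 lower = p_hat - z * se, upper = p_hat + z * se)
ci
```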

Practical Tips

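For small expected counts (below about 5), the chi-squared approximation is unreliable; use an exact multinomial test or a Monte Carlo p-value (chisq.test(..., p = p, simulate.p.value = TRUE)). Fisher’s exact test plays this role for contingency tables, not for a single vector of counts against given probabilities.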
Multinomial categories are mutually exclusive by definition; overlapping classifications require a different framework (multilabel models).
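The category counts \((X_1, \ldots, X_k)\) are a sufficient statistic for \((p_1, \ldots, p_k)\): given the counts, the order in which the \(n\) outcomes occurred carries no further information about the probabilities. Relatedly, independent Poisson counts conditioned on their total are multinomially distributed.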
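Bayesian inference for multinomial probabilities typically uses the conjugate Dirichlet prior: a \(\mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_k)\) prior combined with counts \(x_j\) gives a \(\mathrm{Dirichlet}(\alpha_1 + x_1, \ldots, \alpha_k + x_k)\) posterior.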
For regression with multinomial outcomes, use multinomial logistic regression (nnet::multinom).
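The Dirichlet update above takes only a few lines. The sketch assumes a uniform \(\mathrm{Dirichlet}(1, \ldots, 1)\) prior and reuses the `obs` counts from the goodness-of-fit example. Base R has no Dirichlet sampler, so `rdirichlet_sketch` is a hypothetical helper that uses the standard construction: independent \(\mathrm{Gamma}(\alpha_j, 1)\) draws divided by their sum are \(\mathrm{Dirichlet}(\alpha)\) distributed.

```r
set.seed(2026)
obs <- c(82, 66, 34, 18)
alpha_prior <- rep(1, length(obs))   # assumption: uniform Dirichlet(1, ..., 1) prior

# Conjugate update: posterior is Dirichlet(alpha_prior + obs)
alpha_post <- alpha_prior + obs
post_mean <- alpha_post / sum(alpha_post)

# Hypothetical helper: each row is one Dirichlet(alpha) draw, formed by
# normalising independent Gamma(alpha_j, 1) variates
rdirichlet_sketch <- function(m, alpha) {
  g <- matrix(rgamma(m * length(alpha), shape = alpha),
              nrow = m, byrow = TRUE)   # column j uses shape alpha_j
  g / rowSums(g)
}

draws <- rdirichlet_sketch(1e4, alpha_post)

# Posterior means and central 95% credible intervals per category
rbind(post_mean = post_mean,
      lower = apply(draws, 2, quantile, probs = 0.025),
      upper = apply(draws, 2, quantile, probs = 0.975))
```

With this flat prior the posterior mean \((x_j + 1)/(n + k)\) sits close to the MLE \(x_j / n\); a more concentrated prior would pull the estimates further toward the prior proportions.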