The Bernoulli Distribution
Introduction
The Bernoulli distribution models a single binary trial: success vs. failure, disease vs. healthy, click vs. no click. It is the simplest non-trivial distribution and the building block of the binomial, geometric, and negative binomial distributions.
Prerequisites
Random variables and basic probability.
Theory
A random variable \(X\) is Bernoulli(\(p\)) if it takes values in \(\{0, 1\}\) with
\[P(X = 1) = p, \qquad P(X = 0) = 1 - p.\]
PMF: \(p_X(x) = p^x (1 - p)^{1 - x}\) for \(x \in \{0, 1\}\).
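The PMF formula can be checked numerically against R's built-in dbinom with size = 1; this is an illustrative sketch, not output from the article.

```r
p <- 0.3
# Bernoulli PMF written out as p^x (1 - p)^(1 - x)
pmf <- function(x) p^x * (1 - p)^(1 - x)
c(pmf(0), pmf(1))                    # 0.7, 0.3
c(dbinom(0, 1, p), dbinom(1, 1, p))  # dbinom with size = 1 agrees
```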
Moments:
- \(E[X] = p\).
- \(\mathrm{Var}(X) = p(1 - p)\), maximised at \(p = 0.5\) (value 0.25).
- \(E[X^k] = p\) for every \(k \geq 1\) (since \(X^k = X\)).
The sum of \(n\) iid Bernoulli\((p)\) variables is Binomial\((n, p)\).
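The Bernoulli-to-binomial connection can be verified by simulation: summing rows of independent Bernoulli draws should reproduce the binomial mean \(np\) and variance \(np(1-p)\). The constants below are illustrative choices, not values from the article.

```r
set.seed(1)
p <- 0.3; n <- 12; reps <- 1e5
# Each row holds n iid Bernoulli(p) draws; row sums are Binomial(n, p)
sums <- rowSums(matrix(rbinom(n * reps, size = 1, prob = p), nrow = reps))
c(empirical_mean = mean(sums), theoretical = n * p)          # both near 3.6
c(empirical_var = var(sums), theoretical = n * p * (1 - p))  # both near 2.52
```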
MLE of \(p\) from an iid sample \(X_1, \ldots, X_n\): \(\hat{p} = \bar{X}\), the sample proportion.
Assumptions
Independence across trials and a common probability \(p\). Violations of either produce overdispersion (if \(p\) varies) or correlation (if trials are dependent).
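The overdispersion mechanism can be sketched by letting \(p\) vary across groups: grouped counts then show variance well above the common-\(p\) binomial value. The Beta parameters below are hypothetical, chosen so the mean success probability is 0.3.

```r
set.seed(42)
m <- 2000; n_per <- 20
# Heterogeneous success probabilities across groups, E[p] = 3/(3+7) = 0.3
p_i <- rbeta(m, 3, 7)
y <- rbinom(m, size = n_per, prob = p_i)
n_per * 0.3 * 0.7  # binomial variance if p were a common 0.3: 4.2
var(y)             # noticeably larger: overdispersion from varying p
```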
R Implementation
set.seed(2026)
p <- 0.3
n <- 10000
# Sample from Bernoulli(p) via rbinom with size = 1
X <- rbinom(n, size = 1, prob = p)
# Sample mean is the MLE of p
c(p_hat = mean(X), p_true = p)
# Sample variance approximates p(1 - p)
c(sample_var = var(X), theoretical = p * (1 - p))
# 95% Wald confidence interval for p
se_p <- sqrt(mean(X) * (1 - mean(X)) / n)
c(lower = mean(X) - 1.96 * se_p, upper = mean(X) + 1.96 * se_p)
# Wilson (better for small p or small n)
binom::binom.confint(sum(X), n, conf.level = 0.95, methods = "wilson")
Output & Results
p_hat p_true
0.3002 0.3000
sample_var theoretical
0.2101 0.2100
lower upper
0.2912 0.3092
method x n mean lower upper
Wilson 3002 10000 0.3002 0.2912 0.3094
Empirical mean is \(0.3002 \approx p\); empirical variance is \(0.2101 \approx p(1-p) = 0.21\); Wald and Wilson CIs agree in this range.
Interpretation
In applied work, the Bernoulli is the underlying distribution of every binary outcome: adverse event, responder status, screen-positive. Reporting usually aggregates to proportions across \(n\) independent Bernoullis – i.e., to the binomial framework.
Practical Tips
- For small samples or extreme \(p\) (near 0 or 1), use Wilson or Clopper-Pearson CIs instead of Wald.
- Overdispersion (variance exceeding \(p(1-p)\)) in grouped data signals unobserved heterogeneity; consider beta-binomial or mixed-effects logistic regression.
- rbinom(n, size = 1, prob = p) is the direct way to simulate Bernoullis; sample(0:1, n, replace = TRUE, prob = c(1 - p, p)) is equivalent.
- The Bernoulli likelihood is the foundation of logistic regression; recovering \(p\) from regression coefficients requires inverting the logit link.
- Two dependent Bernoullis require a specification of their joint distribution – usually via a copula or a 2x2 joint PMF.
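The joint-PMF approach to dependent Bernoullis can be sketched directly: sample cells of a 2x2 table and recover the marginals and correlation. The cell probabilities below are illustrative, chosen so that \(P(X=1, Y=1)\) exceeds the independent product and the pair is positively correlated.

```r
set.seed(7)
# Joint PMF over (X, Y) in {0,1}^2; cells sum to 1 (hypothetical values)
joint <- c(`00` = 0.5, `01` = 0.1, `10` = 0.1, `11` = 0.3)
reps <- 1e5
# Sample cell labels, then split each label into the two coordinates
cells <- sample(names(joint), reps, replace = TRUE, prob = joint)
X <- as.integer(substr(cells, 1, 1))
Y <- as.integer(substr(cells, 2, 2))
c(p_x = mean(X), p_y = mean(Y))  # marginals near 0.4 each
cor(X, Y)                        # positive, unlike independent Bernoullis
```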