The Bernoulli Distribution

Probability Theory
bernoulli
binary
trial
The distribution of a single binary trial with success probability p
Published

April 17, 2026

Introduction

The Bernoulli distribution models a single binary trial: success vs. failure, disease vs. healthy, click vs. no click. It is the simplest non-trivial distribution and the building block of the binomial, geometric, and negative binomial distributions.

Prerequisites

Random variables and basic probability.

Theory

A random variable \(X\) is Bernoulli(\(p\)) if it takes values in \(\{0, 1\}\) with

\[P(X = 1) = p, \qquad P(X = 0) = 1 - p.\]

PMF: \(p_X(x) = p^x (1 - p)^{1 - x}\) for \(x \in \{0, 1\}\).
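
Since Bernoulli(\(p\)) is Binomial(\(1, p\)), R's dbinom with size = 1 evaluates this PMF directly. A quick check, using an illustrative p = 0.3:

```r
p <- 0.3
x <- c(0, 1)

# dbinom(x, size = 1, prob = p) is the Bernoulli PMF
dbinom(x, size = 1, prob = p)

# Agrees with the closed form p^x (1 - p)^(1 - x) for x in {0, 1}
p^x * (1 - p)^(1 - x)
```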

Moments:

  • \(E[X] = p\).
  • \(\mathrm{Var}(X) = p(1 - p)\), maximised at \(p = 0.5\) (value 0.25).
  • \(E[X^k] = p\) for every \(k \geq 1\) (since \(X^k = X\)).

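The moment identities are easy to confirm by simulation; a sketch, with an illustrative p = 0.4:

```r
set.seed(1)
p <- 0.4
X <- rbinom(1e5, size = 1, prob = p)

mean(X)    # close to E[X] = p = 0.4
var(X)     # close to p * (1 - p) = 0.24
mean(X^3)  # also close to p, since X^k = X for a binary X
```
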
The sum of \(n\) iid Bernoulli\((p)\) variables is Binomial\((n, p)\).
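
This additivity can be verified empirically: summing Bernoulli draws row-wise reproduces the Binomial PMF. A sketch, with illustrative n and p:

```r
set.seed(2)
n <- 5; p <- 0.3; reps <- 1e5

# Each row holds n iid Bernoulli(p) trials; row sums should be Binomial(n, p)
trials <- matrix(rbinom(n * reps, size = 1, prob = p), nrow = reps)
S <- rowSums(trials)

# Empirical frequencies of S vs. the Binomial(n, p) PMF
emp  <- as.vector(table(factor(S, levels = 0:n))) / reps
theo <- dbinom(0:n, size = n, prob = p)
round(rbind(empirical = emp, theoretical = theo), 3)
```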

MLE of \(p\) from an iid sample \(X_1, \ldots, X_n\): \(\hat{p} = \bar{X}\), the sample proportion.

Assumptions

Independence across trials and a common success probability \(p\). Violating the common-\(p\) assumption produces overdispersion; violating independence produces correlated trials.
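
The overdispersion mechanism can be illustrated by letting \(p\) vary across groups; the sketch below draws group-level probabilities from a Beta(2, 4) distribution chosen purely for illustration:

```r
set.seed(3)
groups <- 2000; trials_per_group <- 20

# Heterogeneous p across groups: Beta(2, 4) has mean 1/3
p_i <- rbeta(groups, 2, 4)
successes <- rbinom(groups, size = trials_per_group, prob = p_i)

p_bar <- mean(successes) / trials_per_group

# Variance implied by a common p vs. the observed (overdispersed) variance
c(binomial_var = trials_per_group * p_bar * (1 - p_bar),
  observed_var = var(successes))
```

The observed variance clearly exceeds the binomial benchmark, which is the signature of unobserved heterogeneity in \(p\).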

R Implementation

set.seed(2026)

p <- 0.3
n <- 10000

# Sample from Bernoulli(p) via rbinom with size = 1
X <- rbinom(n, size = 1, prob = p)

# Sample mean is the MLE of p
c(p_hat = mean(X), p_true = p)

# Sample variance approximates p(1 - p)
c(sample_var = var(X), theoretical = p * (1 - p))

# 95% Wald confidence interval for p
se_p <- sqrt(mean(X) * (1 - mean(X)) / n)
c(lower = mean(X) - 1.96 * se_p, upper = mean(X) + 1.96 * se_p)

# Wilson interval (better for small p or small n); requires the 'binom' package
binom::binom.confint(sum(X), n, conf.level = 0.95, methods = "wilson")

Output & Results

    p_hat    p_true
   0.3002     0.3000

sample_var theoretical
   0.2101      0.2100

   lower    upper
  0.2912   0.3092

  method    x     n   mean  lower  upper
  wilson 3002 10000 0.3002 0.2912 0.3094

Empirical mean is \(0.3002 \approx p\); empirical variance is \(0.2101 \approx p(1-p) = 0.21\). With \(n = 10000\) and \(p\) well inside \((0, 1)\), the Wald and Wilson intervals nearly coincide.

Interpretation

In applied work, the Bernoulli is the underlying distribution of every binary outcome: adverse event, responder status, screen-positive. Reporting usually aggregates to proportions across \(n\) independent Bernoullis – i.e., to the binomial framework.

Practical Tips

  • For small samples or extreme \(p\) (near 0 or 1), use Wilson or Clopper-Pearson CIs instead of Wald.
  • Overdispersion (variance exceeding \(p(1-p)\)) in grouped data signals unobserved heterogeneity; consider beta-binomial or mixed-effects logistic regression.
  • rbinom(n, size = 1, prob = p) is the direct way to simulate Bernoullis; sample(0:1, n, replace = TRUE, prob = c(1 - p, p)) is equivalent.
  • The Bernoulli likelihood is the foundation of logistic regression; MLE of \(p\) from regression coefficients requires the logit link.
  • Two dependent Bernoullis require a specification of their joint distribution – usually via a copula or a 2x2 joint PMF.
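
As a sketch of the last point, a pair of correlated Bernoullis can be specified directly through a 2x2 joint PMF. The marginals and correlation below are illustrative:

```r
set.seed(4)
p1 <- 0.3; p2 <- 0.3; rho <- 0.5

# P(X = 1, Y = 1) from the target correlation: Cov = rho * sqrt(p1 q1 p2 q2)
p11 <- p1 * p2 + rho * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
joint <- c(`00` = 1 - p1 - p2 + p11, `01` = p2 - p11,
           `10` = p1 - p11, `11` = p11)

# Sample pairs from the joint PMF and check the empirical correlation
draws <- sample(names(joint), 1e5, replace = TRUE, prob = joint)
X <- as.integer(substr(draws, 1, 1)); Y <- as.integer(substr(draws, 2, 2))
c(cor_target = rho, cor_empirical = cor(X, Y))
```

Note that not every (p1, p2, rho) triple is feasible: the four cell probabilities must all be non-negative, which bounds the attainable correlation.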