Big-Op and Little-op Notation

Statistical Foundations · big-op · little-op · asymptotic · order

Stochastic order symbols for describing rates of convergence and stochastic magnitudes
Published April 17, 2026

Introduction

Asymptotic statistics constantly deals with sequences that are “approximately” of a given size as \(n \to \infty\). Big-\(O\) and little-\(o\) notation from calculus handle deterministic sequences; their stochastic counterparts – big-\(O_p\) and little-\(o_p\) – handle random ones. These symbols compress pages of tedious bookkeeping into a few letters and are essential vocabulary for reading any modern statistics paper.

Prerequisites

The reader should understand convergence in probability and the deterministic big-\(O\) and little-\(o\) notation.

Theory

Big-\(O_p\) (stochastic boundedness). A sequence \(Y_n\) is \(O_p(r_n)\) if for every \(\varepsilon > 0\) there exist \(M, N\) such that

\[P(|Y_n / r_n| \leq M) \geq 1 - \varepsilon \quad \text{for all } n \geq N.\]

Equivalently, \(Y_n / r_n\) is stochastically bounded (uniformly tight): with probability arbitrarily close to one, it eventually stays within a fixed bound rather than drifting off to infinity.

Little-\(o_p\) (negligibility). A sequence \(Y_n\) is \(o_p(r_n)\) if \(Y_n / r_n \xrightarrow{P} 0\).
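As a quick simulation check (a minimal sketch using standard normal data, where \(\mu = 0\)): consistency of the sample mean says \(\bar{X}_n = o_p(1)\), so the probability that \(|\bar{X}_n|\) exceeds any fixed \(\varepsilon\) should vanish as \(n\) grows.

```r
# Sketch: the sample mean of N(0,1) data is o_p(1).
# P(|xbar_n| > eps) should shrink toward 0 as n grows.
set.seed(1)
eps <- 0.1
for (n in c(10, 100, 1000, 10000)) {
  p_hat <- mean(replicate(2000, abs(mean(rnorm(n))) > eps))
  cat(sprintf("n = %5d   P(|xbar| > %.1f) ~= %.3f\n", n, eps, p_hat))
}
```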

Standard asymptotic rates.

  • \(\bar{X}_n - \mu = O_p(n^{-1/2})\) by the CLT.
  • \(\hat{\sigma}_n^2 - \sigma^2 = O_p(n^{-1/2})\) by the CLT applied to the \(X_i^2\), provided \(E[X_i^4] < \infty\).
  • If \(\hat{\theta}_n\) is consistent, \(\hat{\theta}_n - \theta = o_p(1)\).
  • In regression, \(\hat{\beta}_n - \beta = O_p(n^{-1/2})\) for OLS under standard conditions.
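The OLS rate in the last bullet can be checked by simulation. This is a sketch only: the true slope of 2 and the standard normal design and errors are arbitrary choices for illustration.

```r
# Sketch: O_p(n^{-1/2}) rate for the OLS slope in y = 2x + e.
# sqrt(n) * (betahat - beta) should have a stable spread as n grows,
# here near sigma / sd(x) = 1.
set.seed(2)
spread <- sapply(c(50, 500, 5000), function(n) {
  err <- replicate(500, {
    x <- rnorm(n)
    y <- 2 * x + rnorm(n)
    coef(lm(y ~ x))[2] - 2     # slope error
  })
  sd(sqrt(n) * err)
})
round(spread, 2)
```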

Algebraic rules.

  • \(O_p(1) + O_p(1) = O_p(1)\) (sum of bounded remains bounded).
  • \(o_p(1) + o_p(1) = o_p(1)\).
  • \(O_p(a_n) \cdot O_p(b_n) = O_p(a_n b_n)\).
  • \(O_p(1) \cdot o_p(1) = o_p(1)\) (bounded times negligible is negligible).
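The last rule is easy to see numerically (a minimal sketch): multiply a single \(N(0,1)\) draw, which is \(O_p(1)\), by a sample mean of \(n\) standard normals, which is \(o_p(1)\); the product shrinks to zero in probability.

```r
# Sketch of O_p(1) * o_p(1) = o_p(1): a N(0,1) draw (stochastically
# bounded) times a sample mean that shrinks to 0 in probability.
set.seed(3)
for (n in c(10, 1000, 100000)) {
  prod <- replicate(200, rnorm(1) * mean(rnorm(n)))
  cat(sprintf("n = %6d   mean |product| = %.4f\n", n, mean(abs(prod))))
}
```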

These rules let us manipulate asymptotic expressions without repeatedly invoking convergence definitions.

Delta-method expansion. If \(\hat{\theta}_n - \theta = O_p(n^{-1/2})\) and \(g\) is twice continuously differentiable at \(\theta\),

\[g(\hat{\theta}_n) - g(\theta) = g'(\theta)(\hat{\theta}_n - \theta) + O_p(n^{-1}).\]

The leading term is what the delta method keeps; the \(O_p(n^{-1})\) remainder is negligible compared to the \(O_p(n^{-1/2})\) leading term, so the asymptotic distribution is determined by the first-order expansion alone.
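A concrete instance of the remainder rate (a sketch with \(g(x) = x^2\), \(\mu = 1\), and standard normal data): here the remainder is exactly \((\bar{X}_n - \mu)^2\), and \(n(\bar{X}_n - \mu)^2\) converges to a \(\chi^2_1\) variable, so its mean should hover near 1 at every \(n\) – the signature of an \(O_p(n^{-1})\) term.

```r
# Sketch: for g(x) = x^2 and theta = mu = 1, the delta-method remainder
# g(xbar) - g(mu) - g'(mu)(xbar - mu) = (xbar - mu)^2 should be O_p(1/n),
# i.e. n * remainder stays stochastically bounded (mean near 1 here).
set.seed(4)
for (n in c(100, 1000, 10000)) {
  rem <- replicate(2000, (mean(rnorm(n, mean = 1)) - 1)^2)
  cat(sprintf("n = %5d   mean n*remainder = %.3f\n", n, mean(n * rem)))
}
```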

Assumptions

Stochastic order symbols are defined under the usual probability-space setup; the definitions are in-probability statements, not almost-sure ones, so they apply even when sample-path behaviour is complicated.

R Implementation

set.seed(2026)

n_vals <- round(10^seq(1, 4, by = 0.5))  # n from 10 to 10,000 on a log grid
reps <- 1000

sim <- sapply(n_vals, function(n) {
  # |xbar - mu| with mu = 0 for standard normal data
  dev <- replicate(reps, abs(mean(rnorm(n)) - 0))
  mean(dev * sqrt(n))
})

# If xbar - mu = O_p(n^-1/2), sqrt(n) * (xbar - mu) should stay bounded
data.frame(n = n_vals, mean_abs_sqrt_n = sim)

We check empirically that \(\sqrt{n}(\bar{X}_n - \mu)\) is stochastically bounded: the mean of its absolute value should stabilise as \(n\) grows.

Output & Results

      n mean_abs_sqrt_n
1    10           0.829
2    32           0.808
3   100           0.794
4   316           0.801
5  1000           0.797
6  3162           0.798
7 10000           0.797

The product stabilises near \(\sqrt{2/\pi} \approx 0.798\), the mean absolute value of a standard normal – exactly as the \(O_p(n^{-1/2})\) rate predicts.

Interpretation

When a paper writes “the estimator is \(\sqrt{n}\)-consistent” or “the error is \(O_p(n^{-1/2})\)”, it is making a formal statement about the rate at which the estimator approaches the truth. Different estimators can have different rates (parametric vs. non-parametric, standard vs. super-efficient), and these rates directly affect sample-size planning.

Practical Tips

  • \(\sqrt{n}\)-consistent is the gold standard for parametric estimators; non-parametric estimators often have slower rates like \(n^{-1/3}\) or \(n^{-2/5}\).
  • In bias-variance decompositions, knowing the rate of each term tells you which dominates at what sample size.
  • When the leading term in a Taylor expansion is \(O_p(n^{-1/2})\) and the remainder is \(O_p(n^{-1})\), the remainder is asymptotically negligible relative to the leading term, so the delta method applies.
  • Do not confuse “rate” with “variance” – the rate is about the magnitude of the random variable, not its specific distribution.
  • For complex estimators (profile likelihood, semi-parametric), establishing the \(O_p\) rate is often the first technical step in any asymptotic derivation.