Big-\(O_p\) and Little-\(o_p\) Notation
Introduction
Asymptotic statistics constantly deals with sequences that are “approximately” of a given size as \(n \to \infty\). Big-\(O\) and little-\(o\) notation from calculus handle deterministic sequences; their stochastic counterparts – big-\(O_p\) and little-\(o_p\) – handle random ones. These symbols compress pages of tedious bookkeeping into a few letters and are essential for reading any modern statistics paper.
Prerequisites
The reader should understand convergence in probability and deterministic big-\(O\), little-\(o\) notation.
Theory
Big-\(O_p\) (stochastic boundedness). A sequence \(Y_n\) is \(O_p(r_n)\) if for every \(\varepsilon > 0\) there exist \(M, N\) such that
\[P(|Y_n / r_n| \leq M) \geq 1 - \varepsilon \quad \text{for all } n \geq N.\]
Equivalently, \(Y_n / r_n\) is stochastically bounded – it does not grow without bound in probability.
Little-\(o_p\) (negligibility). A sequence \(Y_n\) is \(o_p(r_n)\) if \(Y_n / r_n \xrightarrow{P} 0\).
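Both definitions are easy to probe by simulation. A minimal sketch, assuming \(X_i \sim N(0,1)\) so that \(\mu = 0\): \(\bar{X}_n\) should be \(o_p(1)\) (its exceedance probability vanishes), while \(\sqrt{n}\,\bar{X}_n\) should be \(O_p(1)\) (its quantiles stabilise rather than diverge).
set.seed(1)
eps <- 0.1
for (n in c(10, 100, 1000, 10000)) {
  xbar <- replicate(2000, mean(rnorm(n)))
  # o_p(1): P(|xbar| > eps) should shrink to 0 as n grows
  # O_p(1): the 95% quantile of sqrt(n)*|xbar| should stabilise
  cat(sprintf("n = %5d  P(|xbar| > %.1f) = %.3f  q95 of sqrt(n)|xbar| = %.2f\n",
              n, eps, mean(abs(xbar) > eps), quantile(sqrt(n) * abs(xbar), 0.95)))
}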
Standard asymptotic rates.
- \(\bar{X}_n - \mu = O_p(n^{-1/2})\) by the CLT.
- \(\hat{\sigma}_n^2 - \sigma^2 = O_p(n^{-1/2})\) by the CLT applied to the squared deviations, assuming finite fourth moments.
- If \(\hat{\theta}_n\) is consistent, \(\hat{\theta}_n - \theta = o_p(1)\).
- In regression, \(\hat{\beta}_n - \beta = O_p(n^{-1/2})\) for OLS under standard conditions; a simulation check of this rate appears after the list.
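A rough check of the OLS rate, under a hypothetical model \(y = 1 + 2x + \varepsilon\) with standard normal noise: if \(\hat{\beta}_n - \beta = O_p(n^{-1/2})\), the spread of \(\sqrt{n}(\hat{\beta}_n - \beta)\) should stabilise rather than grow.
set.seed(2)
for (n in c(50, 500, 5000)) {
  err <- replicate(500, {
    x <- rnorm(n)
    y <- 1 + 2 * x + rnorm(n)
    coef(lm(y ~ x))[2] - 2   # slope error, true slope = 2
  })
  # sqrt(n) * error should have a stable spread under the O_p(n^-1/2) rate
  cat(sprintf("n = %4d  sd of sqrt(n)*(betahat - beta) = %.3f\n", n, sd(sqrt(n) * err)))
}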
Algebraic rules.
- \(O_p(1) + O_p(1) = O_p(1)\) (sum of bounded remains bounded).
- \(o_p(1) + o_p(1) = o_p(1)\).
- \(O_p(a_n) \cdot O_p(b_n) = O_p(a_n b_n)\).
- \(O_p(1) \cdot o_p(1) = o_p(1)\) (bounded times negligible is negligible).
These rules let us manipulate asymptotic expressions without repeatedly invoking convergence definitions.
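The product rule is the one most worth internalising, and it too can be seen numerically. A small sketch, assuming \(Z \sim N(0,1)\) (an \(O_p(1)\) sequence) multiplied by \(\bar{X}_n\) (an \(o_p(1)\) sequence): the product's exceedance probability should vanish.
set.seed(3)
for (n in c(10, 100, 1000, 10000)) {
  prod_n <- replicate(2000, rnorm(1) * mean(rnorm(n)))  # O_p(1) * o_p(1)
  # negligibility: this probability should shrink to 0
  cat(sprintf("n = %5d  P(|product| > 0.1) = %.3f\n", n, mean(abs(prod_n) > 0.1)))
}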
Delta-method expansion. If \(\hat{\theta}_n - \theta = O_p(n^{-1/2})\) and \(g\) is twice continuously differentiable at \(\theta\),
\[g(\hat{\theta}_n) - g(\theta) = g'(\theta)(\hat{\theta}_n - \theta) + O_p(n^{-1}).\]
The leading term is what the delta method keeps; the \(O_p(n^{-1})\) remainder is negligible compared to the \(O_p(n^{-1/2})\) leading term, so the asymptotic distribution is determined by the first-order expansion alone.
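The remainder rate can be checked directly. A sketch under assumed choices \(g = \exp\), \(\theta = 1\), and \(\hat{\theta}_n = \bar{X}_n\) for \(X_i \sim N(1,1)\): scaling the remainder by \(n\) should leave it bounded (near \(e/2 \approx 1.36\), the second-order Taylor coefficient times the mean of a \(\chi^2_1\) variable).
set.seed(4)
theta <- 1
for (n in c(50, 500, 5000)) {
  rem <- replicate(2000, {
    thetahat <- mean(rnorm(n, mean = theta))   # thetahat - theta = O_p(n^-1/2)
    # remainder after removing the linear (delta-method) term
    exp(thetahat) - exp(theta) - exp(theta) * (thetahat - theta)
  })
  # n * remainder should stay bounded if the remainder is O_p(1/n)
  cat(sprintf("n = %4d  mean |n * remainder| = %.3f\n", n, mean(abs(n * rem))))
}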
Assumptions
Stochastic order symbols are defined under the usual probability-space setup; the concepts are distributional, not almost-sure, so they work even when pointwise behaviour is complicated.
R Implementation
We check empirically that \(\sqrt{n}(\bar{X}_n - \mu)\) is stochastically bounded: the mean of its absolute value should stabilise as \(n\) grows.
set.seed(2026)
n_vals <- round(10^seq(1, 4, by = 0.5))   # n from 10 to 10000 on a log grid
reps <- 1000
sim <- sapply(n_vals, function(n) {
  # |xbar - mu| for X_i ~ N(0, 1), so mu = 0
  dev <- replicate(reps, abs(mean(rnorm(n)) - 0))
  # If xbar - mu = O_p(n^-1/2), sqrt(n) * |xbar - mu| should be bounded
  mean(dev * sqrt(n))
})
data.frame(n = n_vals, mean_abs_sqrt_n = sim)
Output & Results
n mean_abs_sqrt_n
1 10 0.829
2 32 0.808
3 100 0.794
4 316 0.801
5 1000 0.797
6 3162 0.798
7 10000 0.797
The scaled mean absolute deviation stabilises near \(\sqrt{2/\pi} \approx 0.798\), the mean absolute value of a standard normal variable – exactly the behaviour the \(O_p(n^{-1/2})\) rate predicts.
Interpretation
When a paper writes “the estimator is \(\sqrt{n}\)-consistent” or “the error is \(O_p(n^{-1/2})\)”, it is making a formal statement about the rate at which the estimator approaches the truth. Different estimators can have different rates (parametric vs. non-parametric, standard vs. super-efficient), and these rates directly affect sample-size planning.
Practical Tips
- \(\sqrt{n}\)-consistent is the gold standard for parametric estimators; non-parametric estimators often have slower rates like \(n^{-1/3}\) or \(n^{-2/5}\).
- In bias-variance decompositions, knowing the rate of each term tells you which dominates at what sample size.
- When the leading term in a Taylor expansion is \(O_p(n^{-1/2})\) and the remainder is \(O_p(n^{-1})\), the remainder is asymptotically negligible relative to the leading term, so first-order asymptotics (the delta method) apply.
- Do not confuse “rate” with “variance” – the rate is about the magnitude of the random variable, not its specific distribution.
- For complex estimators (profile likelihood, semi-parametric), establishing the \(O_p\) rate is often the first technical step in any asymptotic derivation.