Robust Regression
Introduction
Robust regression estimators down-weight observations with large residuals, reducing the influence of outliers. The two main flavours are M-estimators (Huber, bisquare) and MM-estimators (high breakdown combined with high efficiency).
Prerequisites
OLS, leverage / influence.
Theory
M-estimators minimise \(\sum_i \rho(r_i / s)\), where \(\rho\) is a robust loss (quadratic for small residuals, growing more slowly or bounded for large ones) and \(s\) is a robust scale estimate. Huber’s \(\rho\) switches from quadratic to linear at 1.345 scale units (the tuning constant giving 95 % efficiency at normal errors); the bisquare weight falls to zero beyond its cutoff (4.685 by convention), so extreme residuals are ignored entirely.
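As a small illustration, Huber’s loss and the weight it implies can be written directly. The function names huber_rho and huber_weight are illustrative, not part of MASS or robustbase:

huber_rho <- function(u, k = 1.345) {
  # Quadratic for |u| <= k, linear beyond: large residuals are penalised
  # linearly rather than quadratically
  ifelse(abs(u) <= k, u^2 / 2, k * abs(u) - k^2 / 2)
}
huber_weight <- function(u, k = 1.345) {
  # Implied IWLS weight psi(u)/u: 1 inside the cut, k/|u| outside,
  # so outliers are down-weighted but never dropped entirely
  pmin(1, k / abs(u))
}
huber_weight(c(0.5, 1.345, 5))  # 1.000 1.000 0.269

A residual five scale units out keeps only about a quarter of its weight; under the bisquare, by contrast, its weight would reach exactly zero beyond the cutoff.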
MM-estimators combine a high-breakdown initial fit (S-estimator) with a high-efficiency M-estimator, giving breakdown 50 % and 95 % efficiency at normal errors.
Assumptions
Outliers are assumed to lie in the outcome, not in the predictors; when predictors contain outliers (high-leverage points), prefer leverage-aware, high-breakdown estimators.
R Implementation
library(MASS)
library(robustbase)

# Simulated data with 10% outliers
set.seed(2026)
x <- rnorm(100)
y <- 2 + 1.5 * x + rnorm(100)
out <- sample(100, 10)            # pick the contaminated observations once
y[out] <- y[out] + rnorm(10, 0, 10)

fit_ols <- lm(y ~ x)
fit_rlm <- rlm(y ~ x)                   # Huber M-estimator
fit_mm  <- lmrob(y ~ x, method = "MM")  # MM-estimator
rbind(ols = coef(fit_ols), rlm = coef(fit_rlm), mm = coef(fit_mm))
Output & Results
The OLS slope is pulled toward the outliers; the RLM and MM slopes stay close to the true value of 1.5.
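Continuing from the fits above, the robustness weights show which observations each estimator treated as outliers (rlm stores its final IWLS weights in $w; lmrob stores per-observation robustness weights in $rweights):

# Small weights mark the points the MM-estimator discounted
w_mm <- fit_mm$rweights
head(order(w_mm), 10)   # indices of the 10 most down-weighted observations

# The Huber fit down-weights the same points, but never all the way to zero
head(sort(fit_rlm$w), 5)

If the simulation’s contaminated indices reappear at the top of this ordering, the estimator is doing its job.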
Interpretation
“Robust MM-regression gave slope 1.48 (SE 0.11), closer to the true 1.5 than OLS’s estimate (1.35, SE 0.16) affected by 10 % contamination.”
Practical Tips
- MM-estimators (lmrob) are the modern default for robust regression.
- Report both OLS and robust fits; large differences indicate influential outliers.
- Robust regression is not a substitute for understanding the source of outliers.
- For outliers in predictors (high leverage), MM-estimators still resist but with less margin.
- Robust standard errors (sandwich estimators) address heteroscedasticity but not outliers; two distinct fixes.
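To make the last point concrete, here is a sketch contrasting the two fixes, assuming the sandwich and lmtest packages are installed and reusing fit_ols and fit_mm from above:

library(sandwich)
library(lmtest)

# Heteroscedasticity-consistent SEs: the OLS coefficients are unchanged,
# only their standard errors are corrected -- the outlier-pulled slope stays
coeftest(fit_ols, vcov = vcovHC(fit_ols, type = "HC3"))

# A robust estimator changes the coefficient estimates themselves
coef(fit_mm)

Sandwich SEs fix inference under heteroscedasticity; robust regression fixes the point estimates under contamination. The two can be needed together, but neither substitutes for the other.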