Robust Regression

Regression & Modelling
robust
m-estimator
huber
mm-estimator
M-estimators and MM-estimators for regression resistant to outliers and heavy-tailed errors
Published

April 17, 2026

Introduction

Robust regression estimators down-weight observations with large residuals, which reduces the influence of outliers on the fit. There are two main flavours: M-estimators (Huber, bisquare) and MM-estimators (high breakdown point combined with high efficiency).

Prerequisites

Ordinary least squares (OLS) regression; leverage and influence diagnostics.

Theory

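M-estimators minimise \(\sum_i \rho(r_i / s)\), where \(r_i\) is the \(i\)-th residual, \(s\) is a robust scale estimate, and \(\rho\) is a loss that is quadratic for small residuals and grows more slowly for large ones. Huber’s \(\rho\) switches from quadratic to linear at \(k = 1.345\) scaled residuals, so large residuals keep a bounded influence; Tukey’s bisquare \(\rho\) becomes constant beyond a cutoff (commonly \(c = 4.685\)), so observations beyond it receive zero weight. Both constants give roughly 95 % efficiency under normal errors.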
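The weighting idea can be made concrete with a small base-R sketch. It is illustrative only: `huber_weight`, `bisquare_weight` and `irls_m` are hypothetical helper names, the data are simulated for this example, and the scale is re-estimated with `mad()` at each step, which is one common choice rather than exactly what any particular package does.

```r
# Huber weight: 1 for |u| <= k, then k/|u| (rho is quadratic, then linear)
huber_weight <- function(u, k = 1.345) pmin(1, k / abs(u))

# Tukey bisquare weight: decays smoothly and is exactly 0 for |u| >= cutoff
bisquare_weight <- function(u, cutoff = 4.685) {
  ifelse(abs(u) < cutoff, (1 - (u / cutoff)^2)^2, 0)
}

# Hypothetical helper: M-estimation by iteratively reweighted least squares.
# X is a design matrix that already contains the intercept column.
irls_m <- function(X, y, weight_fun = huber_weight, max_iter = 50, tol = 1e-8) {
  beta <- qr.solve(X, y)                        # ordinary least-squares start
  for (i in seq_len(max_iter)) {
    r <- drop(y - X %*% beta)                   # current residuals
    s <- mad(r)                                 # robust scale (assumed non-zero)
    w <- weight_fun(r / s)                      # weights from scaled residuals
    beta_new <- qr.solve(sqrt(w) * X, sqrt(w) * y)  # weighted least squares
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  list(coefficients = beta, scale = s, weights = w)
}

# Illustrative data: 10% of the outcomes shifted upward by 15
set.seed(1)
x <- rnorm(200)
y <- 2 + 1.5 * x + rnorm(200)
idx <- 1:20
y[idx] <- y[idx] + 15
X <- cbind(intercept = 1, x = x)

fit <- irls_m(X, y)
fit$coefficients                 # Huber M-estimate of intercept and slope
summary(fit$weights[idx])        # shifted points receive small weights
```

Passing `weight_fun = bisquare_weight` gives a redescending fit instead, but that loss is non-convex, so the result depends on the starting values. That dependence is the motivation for the robust start used by MM-estimators.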

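MM-estimators start from a high-breakdown initial fit (an S-estimator), which also supplies the residual scale, and then refine it with a high-efficiency M-step. With the usual tuning, the result has a breakdown point of up to 50 % and 95 % efficiency under normal errors.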
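A rough way to see the high-breakdown property is to contaminate a large share of the outcomes and compare OLS with `lmrob()`. The sketch below uses simulated data invented for this illustration (30 % of outcomes shifted by 20); the specific numbers are assumptions, not results from this article.

```r
library(robustbase)

set.seed(1)                                   # illustrative data only
x <- rnorm(100)
y <- 2 + 1.5 * x + rnorm(100)
bad <- 1:30
y[bad] <- y[bad] + 20                         # 30% gross outliers in the outcome

fit_ols <- lm(y ~ x)
fit_mm  <- lmrob(y ~ x, method = "MM")

coef(fit_ols)                                 # intercept dragged toward the shifted points
coef(fit_mm)                                  # expected to stay near (2, 1.5)
fit_mm$scale                                  # robust estimate of the residual scale

# Robustness weights of the shifted points (near zero means effectively dropped)
w_bad <- weights(fit_mm, type = "robustness")[bad]
range(w_bad)
```

With this much contamination a Huber M-fit started from OLS would generally be affected as well; the S-step is what lets the MM-fit ignore the shifted group.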

Assumptions

Plain M-estimators assume that outliers occur in the outcome, not in the predictors. When high-leverage points are present, use a leverage-aware robust estimator such as an MM-estimator.
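The difference between outcome outliers and leverage points can be checked with a small simulation. This is a sketch on made-up data: ten points are moved far out in `x` and well off the true line, and the Huber fit from `rlm()` is compared with `lmrob()`.

```r
library(MASS)
library(robustbase)

set.seed(1)                                   # illustrative data only
x <- rnorm(100)
y <- 2 + 1.5 * x + rnorm(100)

# Ten high-leverage points: far out in x and far below the true line
x[1:10] <- 10
y[1:10] <- rnorm(10)                          # the true line would give about 17 here

fit_huber <- rlm(y ~ x)                       # Huber M-estimator (rlm default)
fit_mm    <- lmrob(y ~ x)                     # MM-estimator (lmrob default)

rbind(huber = coef(fit_huber), mm = coef(fit_mm))
```

In this setup the Huber slope is expected to be pulled toward the leverage points, because their residuals look small once the line has tilted toward them, while the MM slope should remain close to 1.5.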

R Implementation

library(MASS); library(robustbase)

# Simulated data with 10% outliers
set.seed(2026)
x <- rnorm(100)
y <- 2 + 1.5 * x + rnorm(100)
idx <- sample(100, 10)                     # draw the contaminated indices once
y[idx] <- y[idx] + rnorm(10, 0, 10)        # add heavy noise to those same points

fit_ols <- lm(y ~ x)
fit_rlm <- rlm(y ~ x)                      # Huber M-estimator
fit_mm  <- lmrob(y ~ x, method = "MM")     # MM-estimator

rbind(ols = coef(fit_ols), rlm = coef(fit_rlm), mm = coef(fit_mm))

Output & Results

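The OLS estimates can be pulled away from the true values (intercept 2, slope 1.5) by the contaminated points, and their standard errors are inflated. The RLM and MM estimates typically stay closer to the truth.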
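Beyond comparing coefficients, it can help to look at the weights each robust fit assigned. The sketch below rebuilds a similar simulation so that it runs on its own; the 0.5 cut on the weights is an arbitrary threshold chosen for illustration, and the flagged set may include a few clean points with large residuals.

```r
library(MASS)
library(robustbase)

set.seed(2026)
x <- rnorm(100)
y <- 2 + 1.5 * x + rnorm(100)
idx <- sample(100, 10)                        # contaminated observations
y[idx] <- y[idx] + rnorm(10, 0, 10)

fit_ols <- lm(y ~ x)
fit_rlm <- rlm(y ~ x)                         # Huber M-estimator
fit_mm  <- lmrob(y ~ x, method = "MM")        # MM-estimator

# Coefficients side by side
coefs <- rbind(ols = coef(fit_ols), rlm = coef(fit_rlm), mm = coef(fit_mm))
round(coefs, 3)

# Final weights: 1 means fully trusted, values near 0 mean effectively ignored
w_rlm <- fit_rlm$w
w_mm  <- weights(fit_mm, type = "robustness")

flagged <- which(w_mm < 0.5)                  # arbitrary illustrative threshold
flagged                                       # observations the MM-fit down-weighted
sort(idx)                                     # observations that were contaminated
```

Some contaminated points receive only a small perturbation and keep a high weight, so the two index sets need not match exactly.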

Interpretation

“Robust MM-regression gave slope 1.48 (SE 0.11), closer to the true 1.5 than OLS’s estimate (1.35, SE 0.16) affected by 10 % contamination.”

Practical Tips

  • MM-estimators (lmrob) are the modern default for robust regression.
  • Report both OLS and robust fits; large differences indicate influential outliers.
  • Robust regression is not a substitute for understanding the source of outliers.
  • For outliers in predictors (high leverage), MM-estimators still resist, though with less margin than for outcome outliers; plain M-estimators such as the Huber fit from rlm do not.
  • Robust standard errors (sandwich estimators) address heteroscedasticity but not outliers; the two are fixes for different problems.