Outliers

Categories: foundations, outliers, iqr, grubbs, mahalanobis

Detecting univariate and multivariate outliers with boxplots, IQR fences, Grubbs’ test, and Mahalanobis distance

Published April 17, 2026

Research question

Outliers distort means, inflate variances, and bias regression slopes. Two scenarios: (1) In a pharmacokinetics study, a plasma-concentration value of 120 ng/mL appears among values ranging from 2 to 15 ng/mL; is this a transcription error or a genuine extreme responder? (2) In a cardiovascular risk model with 10 predictors, does any patient exert disproportionate leverage on the fitted coefficients?

Assumptions

Outlier detection is a diagnostic rather than a test; it identifies candidate observations for closer inspection.

Method                 Works for                      Assumption
IQR fence              Univariate, any distribution   Data roughly unimodal
Z-score / 3-sigma      Univariate                     Approximately normal
Grubbs’ test           Univariate, small n            Normality (!)
Mahalanobis distance   Multivariate                   Approximately multivariate normal
Cook’s distance        Regression residuals           Linear model assumptions
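The two univariate rules in the table take only a few lines of base R. The helper names `flag_iqr()` and `flag_z()` are illustrative, not from any package:

```r
# Tukey fences: k = 1.5 gives the mild fence, k = 3 the extreme fence
flag_iqr <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75))
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# 3-sigma rule: distance from the mean in standard deviations
flag_z <- function(x, k = 3) abs(x - mean(x)) / sd(x) > k

set.seed(1)
x <- c(rnorm(30), 10)   # one planted extreme value at position 31
which(flag_iqr(x))      # the planted point is flagged (others may be too)
which(flag_z(x))
```

Note that the z-score rule uses a mean and standard deviation that are themselves inflated by the outlier, which is why the IQR fence is usually preferred as a first pass.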

Hypotheses

For Grubbs’ test of a single outlier:

\[H_0: \text{no outlier} \qquad H_1: \text{the extreme observation is an outlier}\]
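The two-sided Grubbs statistic compares the single most extreme deviation to the sample standard deviation:

\[G = \frac{\max_i \lvert x_i - \bar{x} \rvert}{s}\]

\(H_0\) is rejected when \(G\) exceeds a critical value derived from the t distribution, which `outliers::grubbs.test()` converts to a p-value.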

R code

library(tidyverse)
library(rstatix)
library(outliers)

set.seed(42)

# Scenario 1: plasma concentrations with one suspected extreme
pk <- tibble(
  subject = 1:18,
  conc_ng_ml = c(rnorm(17, mean = 8, sd = 3), 120)  # last value implausible
)

# Univariate fences
pk |> identify_outliers(conc_ng_ml)

# Visual check
pk |>
  ggplot(aes(y = conc_ng_ml)) +
  geom_boxplot(fill = "#2A9D8F", outlier.colour = "#F4A261", outlier.size = 3) +
  labs(y = "Concentration (ng/mL)") +
  theme_minimal()

# Grubbs' test for the single most extreme value
grubbs.test(pk$conc_ng_ml)

# Scenario 2: multivariate outliers via Mahalanobis distance
set.seed(99)
cv_risk <- tibble(
  age      = round(rnorm(80, 60, 10)),
  bmi      = round(rnorm(80, 27, 4), 1),
  sbp      = round(rnorm(80, 132, 16)),
  ldl      = round(rnorm(80, 3.3, 0.9), 2),
  crp      = round(rlnorm(80, log(3), 0.4), 2)
)

md2 <- mahalanobis(cv_risk,
                   center = colMeans(cv_risk),
                   cov    = cov(cv_risk))
cutoff <- qchisq(0.975, df = ncol(cv_risk))

cv_risk |>
  mutate(md2 = md2, outlier = md2 > cutoff) |>
  filter(outlier)

The rstatix::identify_outliers() function uses Tukey’s 1.5 * IQR and 3 * IQR fences to flag mild and extreme outliers, respectively. outliers::grubbs.test() returns a p-value for the most extreme value under the normality assumption. For multivariate normal data, the squared Mahalanobis distance is approximately chi-squared-distributed with degrees of freedom equal to the number of variables, which justifies the qchisq() cutoff above.
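A quick base-R simulation confirms the chi-squared calibration: independent standard normals are multivariate normal with identity covariance, so the 0.975 cutoff should flag roughly 2.5 % of clean observations.

```r
set.seed(1)
n <- 10000; p <- 5
z  <- matrix(rnorm(n * p), ncol = p)        # clean multivariate normal sample
d2 <- mahalanobis(z, colMeans(z), cov(z))   # squared distances
mean(d2 > qchisq(0.975, df = p))            # close to 0.025 by construction
```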

Interpreting the output

Scenario 1: identify_outliers() flags the 120 ng/mL point as an extreme outlier; Grubbs’ test gives \(G = 4.02\), \(p < .001\). A sensible action is to inspect the source: a factor-10 transcription error (12.0 vs. 120) is by far the most common cause of such extremes.

Scenario 2: Mahalanobis distance flags rows with \(d^2 > \chi^2_{0.975, 5} = 12.83\). Such rows are candidates for exclusion or for a sensitivity analysis that reruns the regression without them.

Effect size

In regression, the influence of an outlier is quantified by its leverage \(h_i\) and by Cook’s distance \(D_i\). Common rules of thumb flag \(D_i > 1\) or \(D_i > 4/n\) as influential.
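A minimal sketch with simulated data (variable names are illustrative): fit a linear model, extract Cook’s distances with the base-R cooks.distance(), and apply the 4/n rule of thumb.

```r
set.seed(7)
n <- 50
x <- rnorm(n)
y <- 2 * x + rnorm(n)
x[n] <- 6; y[n] <- -10        # plant one high-leverage, high-residual point

fit <- lm(y ~ x)
cd  <- cooks.distance(fit)
which(cd > 4 / n)             # the planted point dominates the fit
```

plot(fit, which = 4) draws the same distances as an index plot, which is often the quickest visual check.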

Reporting (APA 7)

One plasma-concentration value (120 ng/mL) exceeded three standard deviations and was flagged by Grubbs’ test (G = 4.02, p < .001). The record was reviewed: a decimal-point transcription error was confirmed and the value corrected to 12.0 ng/mL before analysis.

Common pitfalls

  • Automatic removal of outliers without inspection is data manipulation; always investigate.
  • The 1.5 * IQR fence will flag points even in perfectly normal data (about 0.7 % of observations).
  • Grubbs’ test assumes normality; it is not valid for heavily skewed data.
  • Mahalanobis distance is sensitive to its own outliers (the sample covariance is affected); use robust estimators (MCD via robustbase) for contaminated data.
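The last pitfall can be illustrated in base R. A proper robust fit would use robustbase::covMcd(); the one-step reweighting below is only a crude sketch of the same idea, with made-up simulated data:

```r
set.seed(3)
clean <- matrix(rnorm(95 * 3), ncol = 3)
bad   <- matrix(rnorm(5 * 3, mean = 6), ncol = 3)   # 5 % contamination
x     <- rbind(clean, bad)

# Classical distances: the contaminated mean/covariance partially mask the outliers
d2_raw <- mahalanobis(x, colMeans(x), cov(x))

# One-step reweighting (crude stand-in for robustbase::covMcd): drop points
# beyond the cutoff, re-estimate location/scatter, recompute distances
cutoff <- qchisq(0.975, df = ncol(x))
keep   <- d2_raw <= cutoff
d2_rw  <- mahalanobis(x, colMeans(x[keep, ]), cov(x[keep, ]))
sum(d2_rw[96:100] > cutoff)   # contaminated rows now stand out clearly
```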

Parametric vs. non-parametric alternative

When outliers cannot be verified or removed, switch to rank-based tests (Mann-Whitney, Kruskal-Wallis, Spearman) or to robust regression (MASS::rlm, robustbase::lmrob). These procedures down-weight extreme values automatically.
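A small sketch of the robust-regression route, with simulated data: MASS::rlm() fits a Huber M-estimate that down-weights the gross outlier, while OLS is dragged toward it.

```r
library(MASS)                 # ships with standard R installations

set.seed(11)
x <- rnorm(40)
y <- 1 + 2 * x + rnorm(40, sd = 0.5)
y[40] <- y[40] + 15           # one gross outlier in the response

coef(lm(y ~ x))               # OLS intercept/slope pulled by the outlier
coef(rlm(y ~ x))              # Huber M-estimate stays near (1, 2)
rlm(y ~ x)$w[40]              # the outlier receives a weight well below 1
```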

Structure inspired by the University of Zurich Methodenberatung (methodenberatung.uzh.ch). All text, examples, R code, and reporting sentences are independently authored in English.