Outliers

Categories: foundations, outliers, iqr, grubbs, mahalanobis

Detecting univariate and multivariate outliers with boxplots, IQR fences, Grubbs’ test, and Mahalanobis distance

Published April 17, 2026

Research question

Outliers distort means, inflate variances, and bias regression slopes. Two scenarios: (1) In a pharmacokinetics study, a plasma-concentration value of 120 ng/mL appears among values ranging from 2 to 15 ng/mL; is this a transcription error or a genuine extreme responder? (2) In a cardiovascular risk model with 10 predictors, does any patient exert disproportionate leverage on the fitted coefficients?

Assumptions

Outlier detection is a diagnostic rather than a test; it identifies candidate observations for closer inspection.

Method                 Works for                      Assumption
IQR fence              Univariate, any distribution   Data roughly unimodal
Z-score / 3-sigma      Univariate                     Approximately normal
Grubbs’ test           Univariate, small n            Normality (!)
Mahalanobis distance   Multivariate                   Approximately multivariate normal
Cook’s distance        Regression residuals           Linear model assumptions
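The two univariate rules in the table take only a few lines of base R. The helper names `flag_iqr()` and `flag_z()` are illustrative, not from any package:

```r
# Tukey fences: k = 1.5 gives the mild fence, k = 3 the extreme fence
flag_iqr <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75))
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# 3-sigma rule: distance from the mean in standard deviations
flag_z <- function(x, k = 3) abs(x - mean(x)) / sd(x) > k

set.seed(1)
x <- c(rnorm(30), 10)   # one planted extreme value at position 31
which(flag_iqr(x))      # the planted point is flagged (others may be too)
which(flag_z(x))
```

Note that the z-score rule uses a mean and standard deviation that are themselves inflated by the outlier, which is why the IQR fence is usually preferred as a first pass.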

Hypotheses

For Grubbs’ test of a single outlier:

\[H_0: \text{no outlier} \qquad H_1: \text{the extreme observation is an outlier}\]
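The two-sided Grubbs statistic compares the single most extreme deviation to the sample standard deviation:

\[G = \frac{\max_i \lvert x_i - \bar{x} \rvert}{s}\]

\(H_0\) is rejected when \(G\) exceeds a critical value derived from the t distribution, which `outliers::grubbs.test()` converts to a p-value.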

R code

library(tidyverse)
library(rstatix)
library(outliers)

set.seed(42)

# Scenario 1: plasma concentrations with one suspected extreme
pk <- tibble(
  subject = 1:18,
  conc_ng_ml = c(rnorm(17, mean = 8, sd = 3), 120)  # last value implausible
)

# Univariate fences
pk |> identify_outliers(conc_ng_ml)

# Visual check
pk |>
  ggplot(aes(y = conc_ng_ml)) +
  geom_boxplot(fill = "#2A9D8F", outlier.colour = "#F4A261", outlier.size = 3) +
  labs(y = "Concentration (ng/mL)") +
  theme_minimal()

# Grubbs' test for the single most extreme value
grubbs.test(pk$conc_ng_ml)

# Scenario 2: multivariate outliers via Mahalanobis distance
set.seed(99)
cv_risk <- tibble(
  age      = round(rnorm(80, 60, 10)),
  bmi      = round(rnorm(80, 27, 4), 1),
  sbp      = round(rnorm(80, 132, 16)),
  ldl      = round(rnorm(80, 3.3, 0.9), 2),
  crp      = round(rlnorm(80, log(3), 0.4), 2)
)

md2 <- mahalanobis(cv_risk,
                   center = colMeans(cv_risk),
                   cov    = cov(cv_risk))
cutoff <- qchisq(0.975, df = ncol(cv_risk))

cv_risk |>
  mutate(md2 = md2, outlier = md2 > cutoff) |>
  filter(outlier)

The rstatix::identify_outliers() function uses Tukey’s 1.5 * IQR and 3 * IQR fences to flag mild and extreme outliers, respectively. outliers::grubbs.test() returns a p-value for the most extreme value under the normality assumption. For multivariate normal data, the squared Mahalanobis distance is approximately chi-squared-distributed with degrees of freedom equal to the number of variables, which justifies the qchisq() cutoff above.
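A quick base-R simulation confirms the chi-squared calibration: independent standard normals are multivariate normal with identity covariance, so the 0.975 cutoff should flag roughly 2.5 % of clean observations.

```r
set.seed(1)
n <- 10000; p <- 5
z  <- matrix(rnorm(n * p), ncol = p)        # clean multivariate normal sample
d2 <- mahalanobis(z, colMeans(z), cov(z))   # squared distances
mean(d2 > qchisq(0.975, df = p))            # close to 0.025 by construction
```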

Interpreting the output

Scenario 1: identify_outliers() flags the 120 ng/mL point as an extreme outlier; Grubbs’ test gives \(G = 4.02\), \(p < .001\). A sensible action is to inspect the source: a factor-10 transcription error (12.0 vs. 120) is by far the most common cause of such extremes.

Scenario 2: Mahalanobis distance flags rows with \(d^2 > \chi^2_{0.975, 5} = 12.83\). Such rows are candidates for exclusion or for a sensitivity analysis that reruns the regression without them.

Effect size

In regression, the influence of an outlier is quantified by its leverage \(h_i\) and by Cook’s distance \(D_i\). Common rules of thumb flag \(D_i > 1\) or \(D_i > 4/n\) as influential.
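A minimal sketch with simulated data (variable names are illustrative): fit a linear model, extract Cook’s distances with the base-R cooks.distance(), and apply the 4/n rule of thumb.

```r
set.seed(7)
n <- 50
x <- rnorm(n)
y <- 2 * x + rnorm(n)
x[n] <- 6; y[n] <- -10        # plant one high-leverage, high-residual point

fit <- lm(y ~ x)
cd  <- cooks.distance(fit)
which(cd > 4 / n)             # the planted point dominates the fit
```

plot(fit, which = 4) draws the same distances as an index plot, which is often the quickest visual check.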

Reporting (APA 7)

One plasma-concentration value (120 ng/mL) exceeded three standard deviations and was flagged by Grubbs’ test (G = 4.02, p < .001). The record was reviewed: a decimal-point transcription error was confirmed and the value corrected to 12.0 ng/mL before analysis.

Common pitfalls

  • Automatic removal of outliers without inspection is data manipulation; always investigate.
  • The 1.5 * IQR fence will flag points even in perfectly normal data (about 0.7 % of observations).
  • Grubbs’ test assumes normality; it is not valid for heavily skewed data.
  • Mahalanobis distance is sensitive to its own outliers (the sample covariance is affected); use robust estimators (MCD via robustbase) for contaminated data.
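The last pitfall can be illustrated in base R. A proper robust fit would use robustbase::covMcd(); the one-step reweighting below is only a crude sketch of the same idea, with made-up simulated data:

```r
set.seed(3)
clean <- matrix(rnorm(95 * 3), ncol = 3)
bad   <- matrix(rnorm(5 * 3, mean = 6), ncol = 3)   # 5 % contamination
x     <- rbind(clean, bad)

# Classical distances: the contaminated mean/covariance partially mask the outliers
d2_raw <- mahalanobis(x, colMeans(x), cov(x))

# One-step reweighting (crude stand-in for robustbase::covMcd): drop points
# beyond the cutoff, re-estimate location/scatter, recompute distances
cutoff <- qchisq(0.975, df = ncol(x))
keep   <- d2_raw <= cutoff
d2_rw  <- mahalanobis(x, colMeans(x[keep, ]), cov(x[keep, ]))
sum(d2_rw[96:100] > cutoff)   # contaminated rows now stand out clearly
```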

Parametric vs. non-parametric alternative

When outliers cannot be verified or removed, switch to rank-based tests (Mann-Whitney, Kruskal-Wallis, Spearman) or to robust regression (MASS::rlm, robustbase::lmrob). These procedures down-weight extreme values automatically.
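A small sketch of the robust-regression route, with simulated data: MASS::rlm() fits a Huber M-estimate that down-weights the gross outlier, while OLS is dragged toward it.

```r
library(MASS)                 # ships with standard R installations

set.seed(11)
x <- rnorm(40)
y <- 1 + 2 * x + rnorm(40, sd = 0.5)
y[40] <- y[40] + 15           # one gross outlier in the response

coef(lm(y ~ x))               # OLS intercept/slope pulled by the outlier
coef(rlm(y ~ x))              # Huber M-estimate stays near (1, 2)
rlm(y ~ x)$w[40]              # the outlier receives a weight well below 1
```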

Structure inspired by the University of Zurich Methodenberatung (methodenberatung.uzh.ch). All text, examples, R code, and reporting sentences are independently authored in English.