Kernel PCA

Multivariate Statistics
kernel-pca
non-linear
rbf
Non-linear dimensionality reduction via PCA in an implicit feature space
Published

April 17, 2026

Introduction

Kernel PCA performs PCA in a high-dimensional feature space implicitly defined by a kernel function. It captures non-linear structure that linear PCA misses, which makes it useful for data lying on curved manifolds.

Prerequisites

PCA, kernel methods.

Theory

A kernel \(K(x, y) = \langle \phi(x), \phi(y) \rangle\) implicitly maps inputs into a feature space. Eigendecompose the centred Gram matrix; project data onto the top eigenvectors.
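The centring step deserves to be written out, since it is done on the Gram matrix rather than the data. With \(K_{ij} = K(x_i, x_j)\) and \(\mathbf{1}_n\) the \(n \times n\) matrix whose entries are all \(1/n\), the centred Gram matrix and the resulting projections are:

\[
\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n
\]

\[
y_k(x_i) = \sum_{j=1}^{n} \alpha_{kj}\, \tilde{K}_{ij},
\]

where \(\alpha_k\) is the \(k\)-th eigenvector of \(\tilde{K}\) with eigenvalue \(\lambda_k\), scaled so that \(\lambda_k\, \alpha_k^\top \alpha_k = 1\).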

Common kernels:

  • RBF (Gaussian): \(\exp(-\gamma \|x - y\|^2)\).
  • Polynomial: \((x^\top y + c)^d\).
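As a sanity check on the RBF definition above, the Gram matrix can be computed directly in base R; this is an illustrative sketch (the function name `rbf_gram` and the \(\gamma\) value are arbitrary), not how kernlab does it internally:

```r
# Pairwise RBF (Gaussian) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
rbf_gram <- function(X, gamma = 1) {
  sq <- rowSums(X^2)
  D2 <- outer(sq, sq, "+") - 2 * tcrossprod(X)  # squared Euclidean distances
  exp(-gamma * pmax(D2, 0))                     # clamp tiny negatives from rounding
}

set.seed(1)
X <- matrix(rnorm(20), nrow = 10, ncol = 2)
K <- rbf_gram(X, gamma = 0.5)
```

The result is symmetric with ones on the diagonal, as any RBF Gram matrix must be.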

Assumptions

Kernel encodes a meaningful similarity.

R Implementation

library(kernlab)

# Data on two concentric circles (linear PCA fails)
set.seed(2026)
theta <- runif(200, 0, 2 * pi)
r <- c(rep(1, 100), rep(3, 100))
X <- cbind(r * cos(theta), r * sin(theta)) + matrix(rnorm(400, 0, 0.1), 200)
cls <- factor(r)  # ring membership, used only for colouring

# Linear PCA: a rotation of the plane, so the rings stay nested
pca <- prcomp(X)
plot(pca$x[, 1:2], col = cls, main = "Linear PCA")

# Kernel PCA with an RBF kernel ("kpc" rather than "kpca", to avoid masking the function)
kpc <- kpca(X, kernel = "rbfdot", kpar = list(sigma = 0.1), features = 2)
plot(rotated(kpc), col = cls, main = "Kernel PCA (RBF)")
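New observations can be projected onto fitted kernel principal components without refitting, via kernlab's predict method for kpca objects. A minimal self-contained sketch (the data here are illustrative, not the two-circle example):

```r
library(kernlab)

set.seed(1)
X_train <- matrix(rnorm(100), ncol = 2)
kp <- kpca(X_train, kernel = "rbfdot", kpar = list(sigma = 0.1), features = 2)

# Project unseen points onto the two fitted components
X_new <- matrix(rnorm(10), ncol = 2)
scores <- predict(kp, X_new)  # 5 x 2 matrix of component scores
```

This is the usual route for using kernel PCA as a preprocessing step in a train/test workflow.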

Output & Results

Two-circle data: linear PCA fails to separate; kernel PCA with an RBF kernel does.

Interpretation

“Kernel PCA with RBF (\(\sigma\) = 0.1) separated the two concentric rings that linear PCA could not, by exploiting non-linear geometry.”

Practical Tips

  • Kernel choice matters; the RBF kernel is a reasonable default.
  • The hyperparameter \(\sigma\) (or \(\gamma\)) is usually tuned by cross-validation on a downstream task.
  • Kernel PCA scales poorly with large \(n\) (the Gram matrix is \(n \times n\)); the Nyström approximation helps.
  • Modern alternatives for non-linear embeddings include UMAP, t-SNE, and autoencoders.
  • Kernel PCA is often used for visualisation rather than feature engineering, but its components can also feed downstream supervised learning, especially when the original dimension \(p\) is small.
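The Nyström idea mentioned in the tips can be sketched in a few lines: approximate the full \(n \times n\) Gram matrix from \(m\) landmark points, so only an \(n \times m\) block is ever computed. Everything here (the helper `rbf_kernel`, the data, \(m\)) is illustrative, not part of kernlab's API:

```r
# Cross-kernel between two point sets: exp(-gamma * ||a_i - b_j||^2)
rbf_kernel <- function(A, B, gamma = 1) {
  D2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  exp(-gamma * pmax(D2, 0))
}

set.seed(1)
n <- 400; m <- 50
X <- matrix(rnorm(2 * n), ncol = 2)

idx <- sample(n, m)                            # landmark indices
C <- rbf_kernel(X, X[idx, , drop = FALSE])     # n x m cross-kernel
W <- C[idx, ]                                  # m x m landmark Gram matrix

# Rank-m approximation K ~ C W^{-1} C'; small jitter keeps W invertible
K_approx <- C %*% solve(W + diag(1e-8, m), t(C))

K_exact <- rbf_kernel(X, X)
err <- norm(K_exact - K_approx, "F") / norm(K_exact, "F")
```

With well-spread landmarks the relative Frobenius error `err` is typically small, and eigendecomposing the \(m \times m\) matrix replaces the \(O(n^3)\) cost of the full problem.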