Kernel PCA
Multivariate Statistics
kernel-pca
non-linear
rbf
Non-linear dimensionality reduction via PCA in an implicit feature space
Introduction
Kernel PCA performs PCA in a high-dimensional feature space implicitly defined by a kernel function. It captures non-linear structure that linear PCA misses, which makes it useful for data lying on curved manifolds.
Prerequisites
PCA, kernel methods.
Theory
A kernel \(K(x, y) = \langle \phi(x), \phi(y) \rangle\) is an inner product in an implicitly defined feature space. Kernel PCA eigendecomposes the centred Gram matrix and projects the data onto the top eigenvectors, which is equivalent to running PCA on \(\phi(x)\) without ever computing \(\phi\) explicitly (the "kernel trick").
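Concretely, centring must happen in feature space, which reduces to the standard double-centring of the Gram matrix \(K_{ij} = K(x_i, x_j)\):
\[
\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n, \qquad (\mathbf{1}_n)_{ij} = \frac{1}{n}.
\]
If \(\tilde{K} v_k = \lambda_k v_k\) with \(\|v_k\| = 1\), the scores of the training points on component \(k\) are the entries of \(\sqrt{\lambda_k}\, v_k\).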
Common kernels (computed directly in the sketch after this list):
- RBF (Gaussian): \(\exp(-\gamma \|x - y\|^2)\).
- Polynomial: \((x^\top y + c)^d\).
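A minimal sketch of computing these Gram matrices with kernlab; the toy matrix x and the hyperparameter values here are illustrative, not part of the worked example below:
library(kernlab)
x <- matrix(rnorm(20), nrow = 10)  # 10 points in R^2
K_rbf <- kernelMatrix(rbfdot(sigma = 0.5), x)  # entries exp(-sigma * ||x_i - x_j||^2)
K_poly <- kernelMatrix(polydot(degree = 2, offset = 1), x)  # entries (x_i' x_j + 1)^2
dim(K_rbf)  # 10 x 10 Gram matrix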
Assumptions
The chosen kernel encodes a meaningful similarity for the data.
R Implementation
library(kernlab)
# Data on two concentric circles (linear PCA fails)
set.seed(2026)
theta <- runif(200, 0, 2 * pi)
r <- c(rep(1, 100), rep(3, 100))
X <- cbind(r * cos(theta), r * sin(theta)) + matrix(rnorm(400, 0, 0.1), nrow = 200)
grp <- factor(r)  # ring membership (avoids shadowing base::class)
# Linear PCA
pca <- prcomp(X)
plot(pca$x[, 1:2], col = grp, main = "Linear PCA")
# Kernel PCA with RBF
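# Note: kernlab's rbfdot uses exp(-sigma * ||x - y||^2), so sigma here plays
# the role of gamma in the Theory section.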
kpc <- kpca(X, kernel = "rbfdot", kpar = list(sigma = 0.1), features = 2)
plot(rotated(kpc), col = grp, main = "Kernel PCA (RBF)")
Output & Results
On the two-circle data, linear PCA cannot separate the rings, since no linear projection undoes the concentric geometry, while kernel PCA with an RBF kernel maps them to well-separated clusters.
Interpretation
“Kernel PCA with RBF (\(\sigma\) = 0.1) separated the two concentric rings that linear PCA could not, by exploiting non-linear geometry.”
Practical Tips
- Kernel choice matters; RBF is a reasonable default.
- Hyperparameter \(\sigma\) (or \(\gamma\)) is usually tuned by cross-validation on the downstream task; kernlab's sigest() gives a quick heuristic starting range (see the sketch after this list).
- Kernel PCA scales poorly for large \(n\): the Gram matrix is \(n \times n\) and its eigendecomposition is \(O(n^3)\); the Nyström approximation helps.
- Modern alternatives for non-linear embeddings: UMAP, t-SNE, autoencoders.
- KPCA is mostly used for visualisation rather than feature engineering, but downstream supervised learning on kernel PC scores can help when \(p\) is very small; projecting new observations is shown in the sketch below.
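A minimal sketch of both ideas, assuming the two-circle matrix X fitted above: sigest() provides heuristic low/mid/high candidates for the RBF sigma, and kernlab's predict() embeds held-out points into the fitted kernel PC space. The new-data generation here is illustrative.
library(kernlab)
sig <- sigest(X)  # heuristic low / mid / high candidates for sigma
kpc2 <- kpca(X, kernel = "rbfdot", kpar = list(sigma = sig[2]), features = 2)
# Hypothetical held-out points on the same two-circle geometry
theta_new <- runif(20, 0, 2 * pi)
r_new <- rep(c(1, 3), each = 10)
X_new <- cbind(r_new * cos(theta_new), r_new * sin(theta_new))
scores_new <- predict(kpc2, X_new)  # rows = new points, columns = kernel PCs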