Finding Marker Genes
Introduction
Marker genes are genes differentially expressed in one cluster relative to the others; they drive cell-type annotation and functional interpretation. Seurat’s FindMarkers and FindAllMarkers run standard DE tests per cluster: Wilcoxon and MAST on log-normalised or SCT data, and DESeq2 on raw counts.
Prerequisites
Clustered Seurat object; normalised expression.
Theory
For each cluster vs all others (or vs a specified comparison cluster), a per-gene test produces a log-fold change, a p-value, and the percentage of cells expressing the gene in each group. Wilcoxon (rank-sum) is the default; MAST (a hurdle model) models zero-inflation explicitly and DESeq2 fits a negative binomial model to counts, but both are slower.
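To make the one-vs-rest comparison concrete, here is a minimal base-R sketch of the per-gene statistics. It is not Seurat's implementation: the toy matrix, the helper name `one_vs_rest`, and the simplified fold-change formula (log2 ratio of group means with a pseudocount of 1) are all assumptions made for illustration.

```r
set.seed(1)
# Toy counts: 5 genes x 60 cells in three clusters (assumed data)
counts <- matrix(rpois(5 * 60, lambda = 1), nrow = 5,
                 dimnames = list(paste0("g", 1:5), paste0("c", 1:60)))
cluster <- rep(c("A", "B", "C"), each = 20)
# Make g1 a marker of cluster A
counts["g1", cluster == "A"] <- counts["g1", cluster == "A"] + 5
expr <- log1p(counts)

# Hypothetical helper: test every gene in `target` against all other cells
one_vs_rest <- function(expr, cluster, target) {
  in_target <- cluster == target
  res <- lapply(rownames(expr), function(g) {
    x <- expr[g, in_target]
    y <- expr[g, !in_target]
    data.frame(
      gene   = g,
      # exact = FALSE uses the normal approximation, which tolerates ties
      p_val  = wilcox.test(x, y, exact = FALSE)$p.value,
      # Simplified fold change; Seurat computes avg_log2FC differently
      log2FC = log2(mean(x) + 1) - log2(mean(y) + 1),
      pct.1  = mean(x > 0),  # fraction expressing in the target cluster
      pct.2  = mean(y > 0),  # fraction expressing in the reference group
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, res)
}

res_A <- one_vs_rest(expr, cluster, "A")
print(res_A[order(res_A$p_val), ])
```

The injected gene g1 should rank first for cluster A, with a positive fold change and a higher pct.1 than pct.2.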
Marker criteria: \(\log_2 \text{FC} > 0.5\), adjusted p < 0.05, expressed in > 25 % of cells in the target cluster.
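The criteria above can be applied as a simple filter on a marker table. The sketch below uses a small hand-made data frame (hypothetical values) that borrows the column names FindAllMarkers returns; it only illustrates the thresholds and is not output from a real analysis.

```r
# Hypothetical marker table with FindAllMarkers-style column names
markers <- data.frame(
  cluster    = c("0", "0", "1", "1"),
  gene       = c("g1", "g2", "g3", "g4"),
  avg_log2FC = c(1.8, 0.3, 0.9, 0.7),
  p_val_adj  = c(1e-10, 1e-4, 0.20, 1e-3),
  pct.1      = c(0.90, 0.60, 0.50, 0.10),
  stringsAsFactors = FALSE
)

# log2FC > 0.5, adjusted p < 0.05, expressed in > 25 % of target-cluster cells
keep <- markers$avg_log2FC > 0.5 &
  markers$p_val_adj < 0.05 &
  markers$pct.1 > 0.25
strong_markers <- markers[keep, ]
print(strong_markers)
```

Here only g1 passes: g2 fails the fold-change cut-off, g3 the adjusted p-value, and g4 the expression fraction.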
Assumptions
Cells within a cluster are reasonably homogeneous; cells in the “other” group are a valid reference. The multiple-testing correction is applied across genes within each comparison; it does not also account for the number of clusters tested.
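The effect of correcting across genes only, rather than across genes and clusters, can be seen with base R's p.adjust. Seurat documents p_val_adj as a Bonferroni correction using all genes in the dataset; the raw p-values, gene count, and cluster count below are made-up numbers for illustration.

```r
p_raw <- c(g1 = 1e-6, g2 = 1e-4, g3 = 0.03)  # illustrative raw p-values
n_genes <- 200     # genes in the dataset (assumed)
n_clusters <- 5    # clusters tested (assumed)

# Correction across genes only, within one cluster comparison
p_genes <- p.adjust(p_raw, method = "bonferroni", n = n_genes)

# Stricter correction that also counts one comparison per cluster
p_genes_clusters <- p.adjust(p_raw, method = "bonferroni",
                             n = n_genes * n_clusters)

print(rbind(p_genes, p_genes_clusters))
```

In this toy case g2 stays below 0.05 when only genes are counted (0.02) but not once clusters are counted as well (0.1), which is why borderline markers deserve a second look when many clusters are tested.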
R Implementation
library(Seurat)
set.seed(2026)
# Simulate Poisson noise for 200 genes (rows) x 300 cells (columns)
counts <- matrix(rpois(300 * 200, lambda = 2), 200, 300)
rownames(counts) <- paste0("g", 1:200)
colnames(counts) <- paste0("c", 1:300)
# Inject a marker pattern: raise genes g1-g10 in cells c1-c50
counts[1:10, 1:50] <- counts[1:10, 1:50] + 10
so <- CreateSeuratObject(counts)
so <- NormalizeData(so, verbose = FALSE)
so <- FindVariableFeatures(so, nfeatures = 100, verbose = FALSE)
so <- ScaleData(so, verbose = FALSE)
so <- RunPCA(so, npcs = 10, verbose = FALSE)
so <- FindNeighbors(so, dims = 1:10, verbose = FALSE)
so <- FindClusters(so, resolution = 0.5, verbose = FALSE)
markers_all <- FindAllMarkers(so, only.pos = TRUE, min.pct = 0.25,
                              logfc.threshold = 0.25, verbose = FALSE)
head(markers_all[, c("cluster", "gene", "avg_log2FC", "p_val_adj", "pct.1")])
Output & Results
A data frame of markers per cluster: gene, average log2FC, adjusted p-value, fraction expressing in target vs reference.
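A common next step is to keep the strongest few markers per cluster. The base-R sketch below does this on a small stand-in table (hypothetical values, same column names as the FindAllMarkers output); `top_n_markers` is an illustrative helper, not a Seurat function.

```r
# Stand-in for a FindAllMarkers result (hypothetical values)
markers_all <- data.frame(
  cluster    = c("0", "0", "0", "1", "1", "1"),
  gene       = c("g1", "g2", "g3", "g4", "g5", "g6"),
  avg_log2FC = c(2.1, 0.8, 1.5, 0.6, 1.9, 1.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top n genes per cluster, ranked by avg_log2FC
top_n_markers <- function(tab, n = 2) {
  per_cluster <- lapply(split(tab, tab$cluster), function(d) {
    head(d[order(d$avg_log2FC, decreasing = TRUE), ], n)
  })
  out <- do.call(rbind, per_cluster)
  rownames(out) <- NULL
  out
}

top2 <- top_n_markers(markers_all, n = 2)
print(top2)
```

Ranking by avg_log2FC is one reasonable choice; ranking by adjusted p-value or by the difference between the two expression fractions are alternatives.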
Interpretation
“Cluster 2’s top markers (CD8A, GZMB, PRF1) identify cytotoxic T cells; cluster 5’s markers (MS4A1, CD79A) identify B cells.”
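This kind of lookup can be sketched as an overlap score between each cluster's top markers and known marker sets. The gene sets below are the ones named in the example sentence; the per-cluster marker lists are hypothetical, and the scoring rule (fraction of reference genes recovered) is an assumption chosen for simplicity.

```r
# Known marker sets (genes taken from the example above)
known <- list(
  "Cytotoxic T cell" = c("CD8A", "GZMB", "PRF1"),
  "B cell"           = c("MS4A1", "CD79A")
)

# Hypothetical top markers found for two clusters
cluster_markers <- list(
  "2" = c("CD8A", "GZMB", "PRF1"),
  "5" = c("MS4A1", "CD79A")
)

# Rows: clusters; columns: cell types; value: fraction of reference genes found
overlap <- sapply(known, function(ref) {
  sapply(cluster_markers, function(found) mean(ref %in% found))
})

# Assign each cluster the cell type with the highest overlap
best <- colnames(overlap)[apply(overlap, 1, which.max)]
names(best) <- rownames(overlap)
print(overlap)
print(best)
```

With real data the overlaps are rarely this clean, so a table like `overlap` is best read as a starting point for manual annotation.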
Practical Tips
- Use only.pos = TRUE to retain only up-regulated markers; down-regulated markers are less interpretable in scRNA-seq.
- min.pct = 0.25 skips genes detected in fewer than 25 % of cells in both groups, which acts as a quality gate and speeds up testing.
- MAST (test.use = "MAST") gives more conservative p-values and handles zero-inflation; prefer it for publication.
- Always annotate clusters after marker identification, not before; known-marker lookup avoids circular reasoning.
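- Be cautious with DE p-values between clusters produced by clustering the same data: the clusters were defined from the expression values being tested, so this is a form of double-dipping and inflates significance. Treat such p-values as a ranking aid rather than calibrated evidence.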
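The double-dipping effect in the last tip can be seen in a small base-R simulation. The sketch below clusters pure noise with k-means (a stand-in for graph-based clustering, chosen here only for simplicity) and compares the fraction of "significant" genes against a random split of the same cells; all sizes and names are assumptions.

```r
set.seed(42)
n_cells <- 200
n_genes <- 20
# Pure Gaussian noise: no true groups exist (cells in rows, genes in columns)
noise <- matrix(rnorm(n_cells * n_genes), nrow = n_cells)

# Fraction of genes with Wilcoxon p < 0.05 between two groups of cells
frac_significant <- function(x, groups) {
  p <- apply(x, 2, function(g) {
    wilcox.test(g[groups == 1], g[groups == 2])$p.value
  })
  mean(p < 0.05)
}

# Groups derived from the data being tested (double-dipping)
km <- kmeans(noise, centers = 2, nstart = 10)$cluster
# Groups chosen independently of the data
random_split <- sample(rep(1:2, length.out = n_cells))

frac_clustered <- frac_significant(noise, km)
frac_random <- frac_significant(noise, random_split)
cat("clustered:", frac_clustered, " random:", frac_random, "\n")
```

Because k-means partitions the cells along directions of high variance, many genes differ between the resulting groups even though the data contain no structure, whereas the random split should flag roughly 5 % of genes.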