Counting Reads with featureCounts

Bioinformatics

featurecounts

htseq

gene-counts

Assigning reads to genes or genomic features from a BAM file

Published

April 17, 2026

Introduction

After alignment, reads must be summarised into per-gene (or per-feature) counts for differential expression analysis. featureCounts (in Rsubread) and HTSeq are the standard tools.

Prerequisites

BAM files, genome annotation (GTF).

Theory

Each read is assigned to a feature (gene, exon) it overlaps. Key settings:

Strand-specificity: strandSpecific = 0 | 1 | 2 (unstranded / stranded / reverse-stranded).
Multi-mappers: usually counted once or distributed.
Primary-only: recommended.
Fractional vs. integer counting.

Assumptions

Matching annotation to genome reference.

R Implementation

library(Rsubread)

# Build counts from BAM + GTF
# counts <- featureCounts(files = c("s1.bam", "s2.bam"),
#                         annot.ext = "annotation.gtf",
#                         isGTFAnnotationFile = TRUE,
#                         GTF.featureType = "exon",
#                         GTF.attrType = "gene_id",
#                         strandSpecific = 2,
#                         isPairedEnd = TRUE)

# counts$counts: gene x sample matrix

Output & Results

Gene-by-sample counts matrix + feature lengths + summary of assigned vs ambiguous reads.

Interpretation

“featureCounts assigned 82 % of reads to genes; 8 % were ambiguous (multi-gene overlap) and 10 % were unmapped to features.”

Practical Tips

Report assignment statistics; low assignment rates flag alignment or annotation issues.
Strand-specificity must match library prep protocol; wrong setting halves counts.
Use primaryOnly = TRUE for ChIP / ATAC-seq.
For transcript-level counts, use pseudoalignment tools.
SummarizedExperiment objects bundle counts + metadata cleanly.