Counting Reads with featureCounts

Bioinformatics
featurecounts
htseq
gene-counts
Assigning reads to genes or genomic features from a BAM file
Published

April 17, 2026

Introduction

After alignment, reads must be summarised into per-gene (or per-feature) counts for differential expression analysis. featureCounts (in Rsubread) and HTSeq are the standard tools.

Prerequisites

BAM files, genome annotation (GTF).

Theory

Each read is assigned to a feature (gene, exon) it overlaps. Key settings:

  • Strand-specificity: strandSpecific = 0 | 1 | 2 (unstranded / stranded / reverse-stranded).
  • Multi-mappers: usually counted once or distributed.
  • Primary-only: recommended.
  • Fractional vs. integer counting.

Assumptions

Matching annotation to genome reference.

R Implementation

library(Rsubread)

# Build counts from BAM + GTF
# counts <- featureCounts(files = c("s1.bam", "s2.bam"),
#                         annot.ext = "annotation.gtf",
#                         isGTFAnnotationFile = TRUE,
#                         GTF.featureType = "exon",
#                         GTF.attrType = "gene_id",
#                         strandSpecific = 2,
#                         isPairedEnd = TRUE)

# counts$counts: gene x sample matrix

Output & Results

Gene-by-sample counts matrix + feature lengths + summary of assigned vs ambiguous reads.

Interpretation

“featureCounts assigned 82 % of reads to genes; 8 % were ambiguous (multi-gene overlap) and 10 % were unmapped to features.”

Practical Tips

  • Report assignment statistics; low assignment rates flag alignment or annotation issues.
  • Strand-specificity must match library prep protocol; wrong setting halves counts.
  • Use primaryOnly = TRUE for ChIP / ATAC-seq.
  • For transcript-level counts, use pseudoalignment tools.
  • SummarizedExperiment objects bundle counts + metadata cleanly.