Transcript-to-Gene Summarisation

Bioinformatics
tximport
transcript
gene-level
Aggregating transcript-level estimates to gene level for standard differential expression
Published

April 17, 2026

Introduction

Pseudoalignment and lightweight-mapping tools (kallisto, salmon) produce transcript-level estimated counts and abundances. For gene-level differential expression, these estimates are aggregated to genes using a transcript-to-gene map. tximport performs the aggregation and returns output ready for DESeq2, edgeR, or limma-voom.

Prerequisites

Transcript-level quantification output (e.g., salmon quant.sf files) and a gene annotation that matches the reference transcriptome.

Theory

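Gene-level count \(= \sum_{\text{transcripts in gene}} \text{transcript count}\). tximport also computes an abundance-weighted average transcript length per gene, which downstream tools can use as an offset to correct for differences in isoform composition between samples. Quantification uncertainty is handled separately, through inferential replicates (bootstrap or Gibbs samples) when the quantifier produced them.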
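The summation can be illustrated on a toy matrix with base R alone. This is a minimal sketch of the idea, not tximport's internal code; the transcript IDs, gene IDs, and counts are invented for illustration.

```r
# Toy transcript-level counts: 5 transcripts x 2 samples (made-up values)
tx_counts <- matrix(
  c(10, 0, 5, 20, 3,
    12, 1, 4, 18, 6),
  ncol = 2,
  dimnames = list(paste0("tx", 1:5), c("sampleA", "sampleB"))
)

# Hypothetical transcript-to-gene map: transcript ID first, gene ID second
tx2gene <- data.frame(
  TXNAME = paste0("tx", 1:5),
  GENEID = c("geneA", "geneA", "geneB", "geneB", "geneB"),
  stringsAsFactors = FALSE
)

# Look up the gene for each row, then sum the rows that share a gene
gene_of_tx <- tx2gene$GENEID[match(rownames(tx_counts), tx2gene$TXNAME)]
gene_counts <- rowsum(tx_counts, group = gene_of_tx)
print(gene_counts)
```

Using match() on the row names, rather than assuming the map is already in row order, keeps the lookup correct even when the map lists transcripts in a different order than the count matrix.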

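Direct transcript-level DE is also possible (e.g., swish from the fishpond package). DRIMSeq addresses a related but different question, differential transcript usage.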

Assumptions

Transcript-to-gene map consistent with the annotation used for quantification.
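One way to test this assumption before importing is to check how many quantified transcript IDs appear in the map. The sketch below uses invented IDs and a hypothetical map that lacks version suffixes; with real data, the IDs would come from the Name column of a quant.sf file.

```r
# Transcript IDs as they might appear in a quantification file (invented)
quant_ids <- c("ENST0001.2", "ENST0002.1", "ENST0003.5", "ENST0004.1")

# Hypothetical map from a different annotation build:
# no version suffixes, and one transcript absent
tx2gene <- data.frame(
  TXNAME = c("ENST0001", "ENST0002", "ENST0003"),
  GENEID = c("ENSG01", "ENSG01", "ENSG02"),
  stringsAsFactors = FALSE
)

# Fraction of quantified transcripts found in the map, as-is
matched_raw <- mean(quant_ids %in% tx2gene$TXNAME)

# Same check after dropping a trailing ".<digits>" version suffix
stripped_ids <- sub("\\.[0-9]+$", "", quant_ids)
matched_stripped <- mean(stripped_ids %in% tx2gene$TXNAME)

# Transcripts still unmatched point to a genuine annotation mismatch
missing_ids <- quant_ids[!stripped_ids %in% tx2gene$TXNAME]
print(c(raw = matched_raw, stripped = matched_stripped))
print(missing_ids)
```

If only the version suffix differs, tximport's ignoreTxVersion argument may be enough. Transcripts that remain unmatched after stripping usually mean the map and the index come from different annotation releases.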

R Implementation

library(tximport); library(tximeta)

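# Example: salmon output, assuming one directory per sample under salmon_out/
files <- list.files("salmon_out", pattern = "quant\\.sf$", recursive = TRUE, full.names = TRUE)
names(files) <- basename(dirname(files))  # sample IDs become the column names

# Two-column map: transcript ID first, gene ID second
tx2gene <- read.table("tx2gene.tsv", header = TRUE)

txi <- tximport(files, type = "salmon", tx2gene = tx2gene,
                countsFromAbundance = "lengthScaledTPM")

# txi$counts is a gene x sample matrix
head(txi$counts)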

Output & Results

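A list with counts, abundance (TPM), and length, plus a countsFromAbundance entry recording the setting used. The length element is a gene x sample matrix of average transcript lengths weighted by transcript abundance; when countsFromAbundance = "no", it supplies the offset used downstream.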
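To make the role of the length element concrete, a gene's average transcript length can be computed as the abundance-weighted mean of its isoforms' effective lengths. The sketch below does this for a single toy gene in base R; all values are invented, and the code illustrates the idea rather than reproducing tximport's implementation.

```r
# One gene with two isoforms measured in two samples (all values invented)
tpm <- matrix(c(90, 10,
                10, 90),
              nrow = 2,
              dimnames = list(c("short_iso", "long_iso"), c("s1", "s2")))
eff_len <- matrix(c(500, 2000,
                    500, 2000),
                  nrow = 2,
                  dimnames = dimnames(tpm))

# Abundance-weighted mean length of the gene in each sample
gene_len <- colSums(tpm * eff_len) / colSums(tpm)
print(gene_len)
```

Both samples have the same total TPM for the gene, yet the average length differs because the dominant isoform switches. A longer average length yields more reads at equal abundance, which is why a per-sample length offset matters for count-based models.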

Interpretation

“tximport aggregated 190,000 transcripts to 22,000 genes; gene-level expression ready for DESeq2 analysis.”

Practical Tips

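  • countsFromAbundance = "lengthScaledTPM" gives counts that can be passed to DE tools directly, without a length offset (e.g., limma-voom). With the default "no", supply the length matrix as an offset instead, for example via DESeqDataSetFromTximport in DESeq2.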
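  • For transcriptomes it recognises (e.g., GENCODE, Ensembl), tximeta can identify the reference from the salmon index checksum and attach matching annotation, so the tx-to-gene map does not have to be built by hand.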
  • For differential transcript usage (DTU), use DRIMSeq or DEXSeq.
  • Always check that the tx-to-gene map matches the index used.
  • Ensure sample order in files matches metadata.
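The sample-order tip can be enforced in code rather than checked by eye. The following sketch assumes each quant.sf sits in a directory named after its sample; the paths and the coldata table are hypothetical, and no files are read.

```r
# Hypothetical sample table and quant.sf paths (nothing is read from disk)
coldata <- data.frame(
  sample = c("ctrl_1", "ctrl_2", "treat_1"),
  condition = c("ctrl", "ctrl", "treat"),
  stringsAsFactors = FALSE
)
files <- c("salmon_out/treat_1/quant.sf",
           "salmon_out/ctrl_1/quant.sf",
           "salmon_out/ctrl_2/quant.sf")

# Name each path by its parent directory, assumed to be the sample ID
names(files) <- basename(dirname(files))

# Fail early if any sample lacks a file, then reorder to match the metadata
stopifnot(all(coldata$sample %in% names(files)))
files <- files[coldata$sample]

# Columns of the imported matrices will now follow the metadata row order
stopifnot(identical(names(files), coldata$sample))
print(files)
```

Indexing by name instead of relying on list.files() ordering avoids silent mislabelling when directory names sort differently from the rows of the sample table.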