Transcript-to-Gene Summarisation
Introduction
Pseudoalignment tools (salmon, kallisto) produce transcript-level counts. For gene-level differential expression, transcripts are aggregated to genes using a transcript-to-gene map. tximport handles this and produces output ready for DESeq2 / edgeR / limma-voom.
Prerequisites
Transcript quantification, gene annotation.
Theory
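For gene \(g\) in sample \(j\), the gene-level count is the sum over the gene's transcripts: \(C_{gj} = \sum_{t \in g} c_{tj}\), where \(c_{tj}\) is the estimated count of transcript \(t\). tximport also summarises effective transcript lengths to the gene level, so downstream tools can use them as offsets that correct for sample-specific changes in transcript length. When the quantifier was run with inferential replicates, these can be imported as well to carry quantification uncertainty forward.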
Direct transcript-level DE is also possible (swish, DRIMSeq).
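The gene-level sum described in this section can be illustrated with a small base-R sketch. The transcript names, gene names, and counts below are invented for illustration, and `rowsum()` stands in for the aggregation step only; it does not reproduce tximport's length handling.

```r
# Toy transcript-level counts: 4 transcripts x 2 samples (made-up numbers).
# matrix() fills column by column, so the first four values are sampleA.
tx_counts <- matrix(
  c(10, 5, 0, 20,
    12, 3, 1, 18),
  nrow = 4,
  dimnames = list(c("tx1", "tx2", "tx3", "tx4"), c("sampleA", "sampleB"))
)

# Hypothetical transcript-to-gene map (transcript ID, gene ID)
tx2gene <- data.frame(
  TXNAME = c("tx1", "tx2", "tx3", "tx4"),
  GENEID = c("geneA", "geneA", "geneB", "geneB")
)

# Look up the gene of each row, then sum rows that share a gene
gene_ids <- tx2gene$GENEID[match(rownames(tx_counts), tx2gene$TXNAME)]
gene_counts <- rowsum(tx_counts, group = gene_ids)
gene_counts
```

Each row of `gene_counts` is one gene and each column one sample, which is the same shape as the counts matrix returned by tximport.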
Assumptions
Transcript-to-gene map consistent with the annotation used for quantification.
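One way to test this assumption before importing is to measure how many quantified transcript IDs appear in the map. The helper `check_tx2gene()` below is a hypothetical function written for this sketch, not part of tximport, and the IDs are invented. It assumes the first column of the map holds transcript IDs.

```r
# Hypothetical helper: what fraction of quantified transcript IDs are in the map?
check_tx2gene <- function(tx_ids, tx2gene, ignore_version = FALSE) {
  map_ids <- as.character(tx2gene[[1]])  # assumes column 1 = transcript ID
  if (ignore_version) {
    # Strip a trailing ".<digits>" version suffix from both sides
    tx_ids  <- sub("\\.[0-9]+$", "", tx_ids)
    map_ids <- sub("\\.[0-9]+$", "", map_ids)
  }
  mapped <- tx_ids %in% map_ids
  list(frac_mapped = mean(mapped), unmapped = tx_ids[!mapped])
}

# Made-up IDs: the quantification carries version suffixes, the map does not
tx_ids  <- c("txA.1", "txB.2", "txC.1")
tx2gene <- data.frame(TXNAME = c("txA", "txB"), GENEID = c("gene1", "gene2"))

strict  <- check_tx2gene(tx_ids, tx2gene)
relaxed <- check_tx2gene(tx_ids, tx2gene, ignore_version = TRUE)
```

A low mapped fraction that recovers once version suffixes are ignored points to an ID-format mismatch rather than a wrong annotation; tximport's `ignoreTxVersion` argument addresses that case. Transcripts that remain unmapped suggest the map and the index come from different annotation releases.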
R Implementation
library(tximport); library(tximeta)
# Example: salmon outputs
# files <- list.files("salmon_out", pattern = "quant\\.sf$", recursive = TRUE, full.names = TRUE)
# tx2gene <- read.table("tx2gene.tsv", header = TRUE)
#
# txi <- tximport(files, type = "salmon", tx2gene = tx2gene,
# countsFromAbundance = "lengthScaledTPM")
#
# txi$counts: gene x sample matrix
Output & Results
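A list with counts, abundance (TPM), and length matrices (genes x samples), plus a countsFromAbundance entry recording the setting used. The length matrix holds the average effective transcript length per gene and sample, which downstream tools use to build offsets.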
Interpretation
“tximport aggregated 190,000 transcripts to 22,000 genes; gene-level expression ready for DESeq2 analysis.”
Practical Tips
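- countsFromAbundance = "lengthScaledTPM" is a common choice when the counts are passed on without a length offset, for example to limma-voom. With the default "no", pass the txi list to DESeqDataSetFromTximport (DESeq2) or build an offset from txi$length (edgeR).
- Use tximeta to fetch the tx-to-gene map automatically from the salmon index.
- For differential transcript usage (DTU), use DRIMSeq or DEXSeq.
- Always check that the tx-to-gene map matches the index used.
- Ensure sample order in files matches metadata.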
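The sample-order tip can be enforced in code rather than checked by eye. The sketch below assumes each quant.sf file sits in a directory named after its sample and that the sample sheet has a `sample` column; the paths and the `coldata` table are hypothetical.

```r
# Hypothetical quant.sf paths, in the order list.files() might return them
files <- c("salmon_out/s3/quant.sf",
           "salmon_out/s1/quant.sf",
           "salmon_out/s2/quant.sf")

# Hypothetical sample sheet; its row order defines the desired sample order
coldata <- data.frame(
  sample    = c("s1", "s2", "s3"),
  condition = c("ctrl", "ctrl", "trt")
)

# Name each file by its parent directory, then reorder by the sample sheet
names(files) <- basename(dirname(files))
files <- files[coldata$sample]

# Fail early if a sample has no file or the order still differs
stopifnot(!anyNA(files), identical(names(files), coldata$sample))
```

Naming the `files` vector also means the columns of the imported matrices carry sample names, which makes later mismatches easier to spot.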