Alignment with BWA and Bowtie
Bioinformatics
bwa
bowtie2
alignment
bwt
Fast read alignment to a reference genome via Burrows-Wheeler Transform indexing
Introduction
BWA-MEM and Bowtie2 are the standard short-read aligners for DNA. Both index the reference with an FM-index, a compressed structure built on the Burrows-Wheeler Transform (BWT), so that exact seed matches can be found quickly and then extended into approximate alignments. Production-scale alignment is usually run at the command line; R wrappers such as Rbowtie2 exist for convenience.
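To make the indexing idea concrete, here is a minimal base-R sketch of a BWT built from sorted rotations, plus a backward search that counts exact occurrences of a pattern. The function names (bwt_transform, count_matches) and the toy reference are illustrative assumptions. Real aligners build the index with far more memory-efficient algorithms and sampled rank tables.

```r
# Toy Burrows-Wheeler transform via sorted rotations (fine for short strings only).
bwt_transform <- function(text) {
  s <- paste0(text, "$")                       # "$" terminates the text and sorts first
  n <- nchar(s)
  rotations <- vapply(seq_len(n), function(i) {
    paste0(substr(s, i, n), substr(s, 1, i - 1))
  }, character(1))
  rotations <- sort(rotations, method = "radix")  # radix sort uses C-locale byte order
  paste(substr(rotations, n, n), collapse = "")   # last column of the sorted rotations
}

# Count exact occurrences of `pattern` using backward search on the BWT.
count_matches <- function(bwt, pattern) {
  last_col  <- strsplit(bwt, "")[[1]]
  first_col <- sort(last_col, method = "radix")
  lo <- 1
  hi <- length(last_col)
  for (ch in rev(strsplit(pattern, "")[[1]])) {
    first <- match(ch, first_col)              # first row whose rotation starts with ch
    if (is.na(first)) return(0L)
    rank_before <- sum(last_col[seq_len(lo - 1)] == ch)
    rank_upto   <- sum(last_col[seq_len(hi)] == ch)
    lo <- first + rank_before
    hi <- first + rank_upto - 1
    if (lo > hi) return(0L)
  }
  as.integer(hi - lo + 1)
}

toy_bwt <- bwt_transform("ACAACG")   # "GC$AAAC"
count_matches(toy_bwt, "AC")         # 2
count_matches(toy_bwt, "GA")         # 0
```

Each pattern character is processed once, so the search cost depends on the read length rather than the reference length. That property is what makes BWT-based seeding fast.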
Prerequisites
FASTQ files, reference genome.
Theory
BWA-MEM: seeds local alignments with maximal exact matches, extends via Smith-Waterman. Default for whole-genome DNA sequencing.
Bowtie2: end-to-end or local alignment. Common choice for ChIP-seq, ATAC-seq.
Both output SAM / BAM.
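The Smith-Waterman extension step mentioned for BWA-MEM can be sketched as a small dynamic-programming routine in base R. This is a simplified illustration, not BWA-MEM's implementation: it uses a linear gap penalty for brevity (production aligners typically use affine gaps and banded matrices), and the function name and scoring values are assumptions chosen for the example.

```r
# Best local alignment score between a read and a reference window.
# Simplification: linear gap penalty; scoring values are illustrative.
smith_waterman_score <- function(read, ref, match = 1, mismatch = -4, gap = -6) {
  a <- strsplit(read, "")[[1]]
  b <- strsplit(ref, "")[[1]]
  # H[i + 1, j + 1] holds the best score of a local alignment ending at a[i], b[j]
  H <- matrix(0, nrow = length(a) + 1, ncol = length(b) + 1)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      s <- if (a[i] == b[j]) match else mismatch
      H[i + 1, j + 1] <- max(0,                   # restart: local alignment may begin anywhere
                             H[i, j] + s,         # match or mismatch
                             H[i, j + 1] + gap,   # gap in the reference
                             H[i + 1, j] + gap)   # gap in the read
    }
  }
  max(H)
}

smith_waterman_score("ACGT", "TTACGTTT")  # 4: the read matches exactly inside the window
smith_waterman_score("ACGT", "ACCT")      # 2: best local stretch is "AC"
```

The zero floor in the recurrence is what makes the alignment local: a poorly matching prefix is dropped instead of dragging the score down.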
Assumptions
Short reads (<~300 bp); appropriate reference.
R Implementation
library(Rbowtie2); library(Rsamtools)
# Build index (one-off)
# bowtie2_build(references = "reference.fa", bt2Index = "idx")
# Align
# bowtie2(bt2Index = "idx", samOutput = "aln.sam",
# seq1 = "reads.fastq", overwrite = TRUE)
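# Convert SAM to a sorted BAM (asBam() also writes the .bai index by default)
# asBam("aln.sam", destination = "aln_sorted")
# indexBam("aln_sorted.bam")  # only needed if asBam() was run with indexDestination = FALSE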
# Read a short segment
# scanBam("aln_sorted.bam", param = ScanBamParam(which = GRanges("chr1", IRanges(1, 1000))))
Output & Results
BAM file with aligned reads; flagstat summary (total, mapped, duplicates).
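As a rough illustration of where the flagstat numbers come from, the sketch below derives total, mapped, and duplicate counts from SAM FLAG values in base R. The helper name flag_summary and the example flags are made up for illustration. With Rsamtools, a flag vector could be obtained with scanBam() and ScanBamParam(what = "flag"). The summary counts every record, including secondary and supplementary alignments.

```r
# Summarise SAM FLAG values: bit 0x4 (4) = read unmapped, bit 0x400 (1024) = duplicate.
flag_summary <- function(flags) {
  flags <- as.integer(flags)
  has_bit <- function(bit) bitwAnd(flags, bit) != 0L
  c(total      = length(flags),
    mapped     = sum(!has_bit(4L)),
    duplicates = sum(has_bit(1024L)))
}

# Hypothetical flags: two mapped reads, three unmapped, two mapped duplicates
example_flags <- c(0, 16, 4, 1024, 1040, 77, 141)
flag_summary(example_flags)
# total = 7, mapped = 4, duplicates = 2
```

The alignment rate usually quoted in a report is mapped / total from this summary.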
Interpretation
“BWA-MEM achieved 97 % alignment rate on a 30x human WGS library, with <1 % ambiguously mapped reads.”
Practical Tips
- Use minimap2 for long reads (PacBio, Nanopore).
- Report alignment statistics (flagstat) after alignment.
- Index BAM files for downstream tools.
- Deduplicate after alignment (Picard MarkDuplicates, samtools markdup).
- For paired-end, specify both mate files.
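For the paired-end tip, a hedged sketch of a wrapper around Rbowtie2 is shown below. It assumes the installed Rbowtie2 version exposes bowtie2() with the seq1 and seq2 arguments, as in the single-end call in the R Implementation section. The wrapper name align_paired and all file names are hypothetical.

```r
# Align paired-end reads, failing early if either mate file is missing.
# Assumes Rbowtie2::bowtie2() accepts seq1/seq2; paths are placeholders.
align_paired <- function(index, mate1, mate2, sam_out) {
  mates <- c(mate1, mate2)
  missing_files <- mates[!file.exists(mates)]
  if (length(missing_files) > 0) {
    stop("Missing FASTQ file(s): ", paste(missing_files, collapse = ", "))
  }
  Rbowtie2::bowtie2(bt2Index = index, samOutput = sam_out,
                    seq1 = mate1, seq2 = mate2, overwrite = TRUE)
}

# Usage (not run): both mates must come from the same library, in the same read order
# align_paired("idx", "reads_R1.fastq", "reads_R2.fastq", "aln_pe.sam")
```

Checking for both files up front avoids a silent single-end run when the second mate path is mistyped.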