Bioinformatics
Modern biology generates data at scales that demand computational methods. Bioinformatics is the discipline that turns raw sequencing reads, mass spectra, or single-cell measurements into the biological knowledge a manuscript can claim. This area walks through the canonical analyses that a biostatistician or bioinformatician is asked to perform.
Every tutorial assumes familiarity with R but not with the specific biology of each application; the underlying research question is explained at a level that makes the statistical choices transparent. Code examples use Bioconductor packages throughout, and where pipeline-level tools (e.g., STAR, salmon, GATK) are unavoidable, their place in the workflow is clearly indicated.
Topics covered
- Sequence fundamentals: FASTA/FASTQ, quality scores, adapters, QC with ShortRead and Rfastp
- Pairwise and multiple sequence alignment with Biostrings and msa
- Read alignment strategies: spliced (STAR, HISAT2) vs. pseudo-alignment (salmon, kallisto)
- Bulk RNA-seq differential expression with DESeq2, edgeR, and limma-voom
- Multiple testing and shrinkage in high-dimensional contexts
- Gene set enrichment analysis: over-representation, GSEA, GSVA
- Variant calling, VCF manipulation, and variant annotation
- Phylogenetic inference with ape and phangorn, and visualisation with ggtree
- Proteomics quantification and differential abundance (MSstats, limma)
- Single-cell RNA-seq with Seurat and Bioconductor (SingleCellExperiment, scran)
- Cell-type annotation, trajectory inference, and differential composition
- Microbiome analysis: amplicon sequencing (DADA2), metagenomics, diversity metrics
Workflows are presented end-to-end, from raw counts to the manuscript figure.
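The quality scores listed under sequence fundamentals are stored in FASTQ files as one ASCII character per base. As a minimal base-R sketch, assuming the Phred+33 encoding used by current Illumina instruments, each character can be decoded to a Phred score Q and then converted to an error probability P = 10^(-Q/10). The helpers phred33_to_scores() and phred_to_error_prob() are hypothetical names written for illustration only.

```r
# Minimal sketch: decode a Phred+33 quality string in base R.
# Assumes Phred+33 encoding; the helper names below are hypothetical.

phred33_to_scores <- function(qual) {
  # utf8ToInt() returns the integer code point of each character;
  # subtracting 33 removes the Phred+33 offset.
  utf8ToInt(qual) - 33L
}

phred_to_error_prob <- function(q) {
  # Invert Q = -10 * log10(P) to get the per-base error probability.
  10^(-q / 10)
}

qual <- "II5+#"                # toy 5-base quality string
q <- phred33_to_scores(qual)   # 40 40 20 10 2
p <- phred_to_error_prob(q)    # 1e-04 1e-04 1e-02 1e-01 ~0.63
print(q)
print(p)
```

For whole files, reading with ShortRead::readFastq() and summarising quality across reads is preferable to decoding strings by hand; the sketch only shows what a single quality character means.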
Tutorials
- ATAC-Seq Analysis: Chromatin accessibility profiling via ATAC-seq peak calling and differential analysis
- Alignment with BWA and Bowtie: Fast read alignment to a reference genome via Burrows-Wheeler Transform indexing
- Batch Correction with ComBat: Removing known batch effects from expression data using empirical-Bayes methods
- Bulk RNA-seq Differential Expression with DESeq2: A complete DESeq2 workflow, from count matrix through normalisation, dispersion estimation, Wald testing, and LFC shrinkage
- Cell-Type Annotation: Assigning cell types to clusters using reference-based and manual methods
- ChIP-Seq Analysis: Transcription-factor binding and histone modification profiling from ChIP-seq data
- Copy Number Variation Analysis: Inferring genomic copy-number changes from sequencing or array data
- Counting Reads with featureCounts: Assigning reads to genes or genomic features from a BAM file
- DNA Methylation Analysis: Differentially methylated positions and regions from bisulfite or array data
- Differential Expression with edgeR: Negative binomial GLM-based differential expression with exactTest and the quasi-likelihood F-test
- Differential Expression with limma-voom: Transforming counts to log-CPM values with precision weights for linear modelling
- Drug-Target Interaction Mining: Integrating bioassay databases for drug-target identification
- FASTQ Quality Control: Inspecting sequencing read quality via Phred scores and FastQC-style reports
- Finding Marker Genes: Identifying cluster-specific genes for cell-type annotation
- GSEA Preranked Analysis: Enrichment of gene sets in a ranked list without a significance cutoff
- GSVA Single-Sample Enrichment: Per-sample pathway scores for downstream sample-level analysis
- Gene Annotation with biomaRt: Programmatic queries against Ensembl for IDs, coordinates, and annotations
- Gene Ontology Enrichment: Over-representation of GO terms in a differentially expressed gene set
- Heatmaps for RNA-seq: Visualising expression patterns across genes and samples with clustering
- Integration with Harmony: Removing batch effects in scRNA-seq while preserving biological variation
- KEGG Pathway Enrichment: Over-representation and visualisation against KEGG metabolic/signalling pathways
- MA Plots: Log-fold-change versus mean-expression plots for differential expression diagnostics
- Metagenomic Profiling: Whole-metagenome taxonomic and functional profiling with MetaPhlAn and Kraken
- Microbiome Analysis with DADA2: Amplicon sequence variant (ASV) inference from 16S/ITS amplicon data
- Microbiome Diversity Metrics: Alpha and beta diversity measures for community comparison
- Multi-Omics Integration: Joint analysis of transcriptomic, genomic, and epigenomic layers
- Multiple Sequence Alignment: Aligning multiple sequences via ClustalW, MUSCLE, or T-Coffee from R
- PCA of RNA-seq Samples: Using PCA on variance-stabilised counts to check sample structure and batches
- Phylogenetic Trees with ape: Building and interpreting phylogenetic trees using distance-based and maximum-likelihood methods
- Population Genetics Basics: Allele frequencies, Hardy-Weinberg equilibrium, and F-statistics
- Protein Structure Prediction: From sequence to 3-D structure with AlphaFold and RoseTTAFold
- Proteomics with MSstats: Differential protein abundance from label-free or labelled mass-spectrometry data
- Pseudoalignment with Salmon and Kallisto: Fast transcript quantification without full alignment
- RNA-seq Normalisation: Making samples comparable with TMM, RLE, upper-quartile, and CPM
- Read Trimming and Adapter Removal: Removing low-quality bases and sequencing adapters before alignment
- STAR (Spliced Alignment): Fast splice-aware RNA-seq aligner capable of detecting novel junctions
- Sequence Alignment Overview: Global vs. local alignment, scoring matrices, and the core algorithms
- Single-Cell with Seurat: End-to-end workflow for single-cell RNA-seq analysis in Seurat
- Spatial Transcriptomics: Spatially resolved expression analysis with Visium and related platforms
- Structural Variant Detection: Detecting deletions, duplications, inversions, and translocations from WGS
- Surrogate Variable Analysis (SVA): Detecting and adjusting for unknown sources of heterogeneity in expression data
- Trajectory Analysis: Inferring developmental and activation trajectories from scRNA-seq
- Transcript-to-Gene Summarisation: Aggregating transcript-level estimates to gene level for standard differential expression
- VCF Manipulation: Reading, filtering, and subsetting VCF files in R and on the command line
- Variant Annotation with VEP: Predicting functional consequences of variants using Ensembl VEP
- Variant Calling with GATK: Short-variant detection from aligned BAMs via the GATK Best Practices pipeline
- Volcano Plots: Visualising significance versus effect size across genes
- scRNA Clustering: Graph-based clustering of cells via Louvain or Leiden community detection
- scRNA Normalisation: Library-size normalisation and variance stabilisation for scRNA-seq
- scRNA QC and Filtering: Removing low-quality cells before downstream scRNA-seq analysis