Bioinformatics

Computational analysis of biological data: sequence analysis, RNA-seq, variant calling, enrichment, phylogenetics, proteomics, and single-cell methods

Modern biology generates data at scales that demand computational methods. Bioinformatics is the discipline that turns raw sequencing reads, mass spectra, or single-cell measurements into the biological knowledge a manuscript can claim. This area walks through the canonical analyses that a biostatistician or bioinformatician is asked to perform.

Every tutorial assumes familiarity with R but not with the specific biology of each application; the underlying research question is explained in enough detail to make the statistical choices transparent. Code examples rely mainly on Bioconductor packages, supplemented by CRAN packages such as Seurat and ape; where pipeline-level tools (e.g., STAR, salmon, GATK) are unavoidable, their place in the workflow is clearly indicated.

Topics covered

  • Sequence fundamentals: FASTA/FASTQ, quality scores, adapters, QC with ShortRead and Rfastp
  • Pairwise and multiple sequence alignment with Biostrings and msa
  • Read alignment strategies: splice-aware alignment (STAR, HISAT2) vs. pseudoalignment (salmon, kallisto)
  • Bulk RNA-seq differential expression with DESeq2, edgeR, and limma-voom
  • Multiple testing and shrinkage in high-dimensional contexts
  • Gene set enrichment analysis: over-representation, GSEA, GSVA
  • Variant calling, VCF manipulation, and variant annotation
  • Phylogenetic inference with ape, phangorn, and visualisation with ggtree
  • Proteomics quantification and differential abundance (MSstats, limma)
  • Single-cell RNA-seq with Seurat and Bioconductor (SingleCellExperiment, scran)
  • Cell-type annotation, trajectory inference, and differential composition
  • Microbiome analysis: amplicon sequencing (DADA2), metagenomics, diversity metrics

Workflows are presented end-to-end, from raw data to the manuscript figure.

Tutorials

TUTORIAL

ATAC-Seq Analysis

Chromatin accessibility profiling via ATAC-seq peak calling and differential analysis

TUTORIAL

Alignment with BWA and Bowtie

Fast read alignment to a reference genome via Burrows-Wheeler Transform indexing

TUTORIAL

Batch Correction with ComBat

Removing known batch effects from expression data using empirical-Bayes methods

TUTORIAL

Bulk RNA-seq Differential Expression with DESeq2

A complete DESeq2 workflow: from the count matrix through normalisation, dispersion estimation, and Wald testing to LFC shrinkage

TUTORIAL

Cell-Type Annotation

Assigning cell types to clusters using reference-based and manual methods

TUTORIAL

ChIP-Seq Analysis

Transcription-factor binding and histone modification profiling from ChIP-seq data

TUTORIAL

Copy Number Variation Analysis

Inferring genomic copy-number changes from sequencing or array data

TUTORIAL

Counting Reads with featureCounts

Assigning reads to genes or genomic features from a BAM file

TUTORIAL

DNA Methylation Analysis

Differentially methylated positions and regions from bisulfite or array data

TUTORIAL

Differential Expression with edgeR

Negative-binomial differential expression with the classic exact test (exactTest) and the GLM quasi-likelihood F-test

TUTORIAL

Differential Expression with limma-voom

Transforming counts for linear modelling with precision weights

TUTORIAL

Drug-Target Interaction Mining

Integrating bioassay databases for drug-target identification

TUTORIAL

FASTQ Quality Control

Inspecting sequencing read quality via Phred scores and FastQC-style reports

TUTORIAL

Finding Marker Genes

Identifying cluster-specific genes for cell-type annotation

TUTORIAL

GSEA Preranked Analysis

Enrichment of gene sets in a ranked list without a significance cutoff

TUTORIAL

GSVA Single-Sample Enrichment

Per-sample pathway scores for downstream sample-level analysis

TUTORIAL

Gene Annotation with biomaRt

Programmatic queries against Ensembl for IDs, coordinates, and annotations

TUTORIAL

Gene Ontology Enrichment

Over-representation of GO terms in a differentially-expressed gene set

TUTORIAL

Heatmaps for RNA-seq

Visualising expression patterns across genes and samples with clustering

TUTORIAL

Integration with Harmony

Removing batch effects in scRNA-seq while preserving biological variation

TUTORIAL

KEGG Pathway Enrichment

Over-representation and visualisation against KEGG metabolic/signalling pathways

TUTORIAL

MA Plots

Log-ratio (M) versus mean-average (A) plots for differential expression diagnostics

TUTORIAL

Metagenomic Profiling

Whole-metagenome taxonomic and functional profiling with MetaPhlAn and Kraken

TUTORIAL

Microbiome Analysis with DADA2

Amplicon sequence variant (ASV) inference from 16S/ITS amplicon data

TUTORIAL

Microbiome Diversity Metrics

Alpha and beta diversity measures for community comparison

TUTORIAL

Multi-Omics Integration

Joint analysis of transcriptomic, genomic, and epigenomic layers

TUTORIAL

Multiple Sequence Alignment

Aligning multiple sequences via ClustalW, MUSCLE, or T-Coffee from R

TUTORIAL

PCA of RNA-seq Samples

Using PCA on variance-stabilised counts to check sample structure and batches

TUTORIAL

Phylogenetic Trees with ape

Building and interpreting phylogenetic trees using distance-based and maximum-likelihood methods

TUTORIAL

Population Genetics Basics

Allele frequencies, Hardy-Weinberg equilibrium, and F-statistics

TUTORIAL

Protein Structure Prediction

From sequence to 3-D structure with AlphaFold and RoseTTAFold

TUTORIAL

Proteomics with MSstats

Differential protein abundance from label-free or labelled mass-spectrometry data

TUTORIAL

Pseudoalignment with Salmon and Kallisto

Fast transcript quantification without full alignment

TUTORIAL

RNA-seq Normalisation

TMM, RLE, upper-quartile, and CPM: making samples comparable

TUTORIAL

Read Trimming and Adapter Removal

Removing low-quality bases and sequencing adapters before alignment

TUTORIAL

STAR: Spliced Alignment

Fast splice-aware RNA-seq aligner capable of detecting novel junctions

TUTORIAL

Sequence Alignment: Overview

Global vs local alignment, scoring matrices, and the core algorithms

TUTORIAL

Single-Cell with Seurat

End-to-end workflow for single-cell RNA-seq analysis in Seurat

TUTORIAL

Spatial Transcriptomics

Spatially resolved expression analysis with Visium and related platforms

TUTORIAL

Structural Variant Detection

Detecting deletions, duplications, inversions, and translocations from WGS

TUTORIAL

Surrogate Variable Analysis (SVA)

Detecting and adjusting for unknown sources of heterogeneity in expression data

TUTORIAL

Trajectory Analysis

Inferring developmental and activation trajectories from scRNA-seq

TUTORIAL

Transcript-to-Gene Summarisation

Aggregating transcript-level estimates to gene level for standard differential expression

TUTORIAL

VCF Manipulation

Reading, filtering, and subsetting VCF files in R and on the command line

TUTORIAL

Variant Annotation with VEP

Predicting functional consequences of variants using Ensembl VEP

TUTORIAL

Variant Calling with GATK

Short-variant detection from aligned BAMs via the GATK Best Practices pipeline

TUTORIAL

Volcano Plots

Visualising significance versus effect size across genes

TUTORIAL

scRNA Clustering

Graph-based clustering of cells via Louvain or Leiden community detection

TUTORIAL

scRNA Normalisation

Library-size normalisation and variance stabilisation for scRNA-seq

TUTORIAL

scRNA QC and Filtering

Removing low-quality cells before downstream scRNA-seq analysis