All Tutorials

Complete index of every tutorial, grouped by topic area. Use the links below to jump to a topic landing page or directly to an individual tutorial.

Statistical Foundations

Descriptive Statistics

  • Coefficient of Variation – Scale-free dispersion: the ratio of the standard deviation to the mean
  • Contingency Tables – r × c tables of counts, their expected values under independence, and standardised residuals
  • Cross-Tabulations – Two-way and higher-dimensional tables for joint distributions of categorical variables
  • Descriptive vs Inferential Statistics – The distinction between summarising the sample and making claims about the population
  • Five-Number Summary – Min, Q1, median, Q3, max: the compact distribution summary that powers the boxplot
  • Frequency Tables – Counting categorical and discrete-numeric variables: absolute, relative, and cumulative frequencies
  • Kurtosis – Quantifying tail heaviness of a distribution, and interpreting excess kurtosis
  • Mean Absolute Deviation – Average absolute deviation from the mean or median: a robust alternative to the standard deviation
  • Measures of Central Tendency – Mean, median, mode, and their robust cousins – when each is appropriate and how to compute them in R
  • Measures of Dispersion – Quantifying how spread out the data are: variance, SD, MAD, IQR, and range
  • Measures of Shape – Skewness and kurtosis: summarising the asymmetry and tail heaviness of a distribution beyond location and spread
  • Outlier Detection Rules – Univariate rules: 1.5·IQR fences, z-score thresholds, Hampel identifier, and Grubbs’ test
  • Percentile Ranks – Converting a raw value to its position in the sample distribution, expressed as a percentage
  • Quantiles and Percentiles – Dividing a distribution by cumulative probability: the median, quartiles, percentiles, and R’s nine quantile definitions
  • Robust Statistics: An Overview – Estimators designed to remain close to the truth under contamination, covering the key concepts of breakdown point, influence function, and efficiency
  • Skewness – Measuring the asymmetry of a distribution: sample estimators, interpretation, and typical values in applied data
  • Standardisation and z-Scores – Centring and scaling a variable, and the robust-z alternative
  • Stem-and-Leaf Plots – A text-based compact display of small samples that preserves every data value
  • Summary by Group – Split-apply-combine pattern for computing summaries within subgroups of the data
  • The Geometric Mean – Log-scale averaging for multiplicative data: ratios, growth rates, titres
  • The Harmonic Mean – Reciprocal-averaged summary for rates, speeds, and F1-style composite scores
  • The Interquartile Range – Width of the central 50% of the distribution: robust, interpretable, and the basis of Tukey fences
  • The Weighted Mean – Averaging with unequal weights: survey inclusion probabilities, meta-analysis pooling, and more
  • Trimmed and Winsorised Means – Robust location estimators that compromise between the mean and the median
  • Variance and Standard Deviation – Squared and unsquared dispersion, the n vs n-1 divisor, and why the sample SD is biased
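Several of the summaries listed above have one-line implementations in base R. A minimal sketch (the sample `x` is made up for illustration; it is not data from any tutorial):

```r
# Illustrative sample, chosen to include one high outlier
x <- c(2.1, 3.5, 3.9, 4.4, 5.0, 5.2, 6.8, 7.3, 9.0, 21.5)

fivenum(x)            # five-number summary: min, Q1, median, Q3, max
sd(x) / mean(x)       # coefficient of variation (SD / mean)
IQR(x)                # interquartile range, width of the central 50%
mad(x)                # median absolute deviation, scaled for normal consistency
quantile(x, type = 7) # R's default quantile definition, one of its nine types
```

Comparing `mean(x)` with `median(x)` on this sample shows why the robust summaries exist: the single outlier pulls the mean upward while leaving the median essentially untouched.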

Probability Theory

Inferential Statistics

  • Anderson-Darling Test – A distribution goodness-of-fit test with emphasis on the tails
  • Bartlett’s Test – Test of equal variances across groups, most powerful under normality but very sensitive to its violation
  • Benjamini-Hochberg FDR – Controlling the expected proportion of false discoveries among rejected hypotheses
  • Benjamini-Yekutieli FDR – Dependence-robust false discovery rate control at the cost of conservatism
  • Bonferroni Correction – Controlling family-wise error rate by dividing alpha across m tests
  • Bootstrap Confidence Intervals – Percentile, basic, BCa, and studentised bootstrap CIs
  • Chi-Squared Goodness-of-Fit Test – Testing whether observed category frequencies match an expected distribution
  • Chi-Squared Test of Independence – Testing association between two categorical variables via observed vs expected counts under independence
  • Cochran-Mantel-Haenszel Test – Stratified analysis of 2×2 tables across levels of a confounding variable
  • Effect Sizes: Overview – Why effect sizes matter and which measures apply to which tests
  • Equivalence Testing with TOST – Two one-sided tests for establishing practical equivalence within a margin
  • Fisher’s Exact Test – Exact test for 2×2 (and r × c) contingency tables, based on the hypergeometric distribution
  • Friedman Test – Non-parametric test for three or more paired / repeated measures
  • Holm’s Step-Down Correction – Sequential Bonferroni with uniformly greater power; controls FWER
  • Kendall’s Tau – Rank correlation based on concordant vs discordant pairs; robust to ties and small samples
  • Kolmogorov-Smirnov Test – CDF-based goodness-of-fit for one-sample and two-sample comparisons
  • Kruskal-Wallis Test – Non-parametric one-way ANOVA based on ranks across three or more groups
  • Levene’s Test of Variances – Robust test of variance homogeneity across groups
  • Mann-Whitney U Test – Non-parametric two-sample test based on ranks; alternative to the independent t-test
  • McNemar’s Test – Paired / matched binary data: comparing two measurements on the same units, using discordant pairs only
  • Mixed ANOVA – Designs combining between-subjects and within-subjects factors, with the interaction as key
  • Multiple Comparisons: Overview – Family-wise error vs false discovery rate and when each applies
  • Null and Alternative Hypotheses – Formulating H0 and H1, choosing one- vs two-sided tests, and avoiding post-hoc reformulation
  • One-Proportion Test – Testing whether an observed proportion differs from a pre-specified reference; exact and score-based methods
  • One-Sample t-Test – Testing whether a sample mean equals a pre-specified reference value
  • One-Sample z-Test – Comparing a sample mean to a reference when the population variance is known – rare in practice but pedagogically useful
  • One-Way ANOVA – Comparing means across three or more independent groups via the F-test
  • P-Values Explained – Definition, correct interpretation, and the most common misinterpretations of the p-value
  • Paired t-Test – Comparing two dependent measurements by applying a one-sample t-test to their differences
  • Pearson Correlation Test – Testing whether the Pearson correlation between two continuous variables is non-zero
  • Permutation Tests – Exact or Monte Carlo p-values via resampling under exchangeability
  • Post-Hoc Tests with Tukey HSD – Pairwise comparisons after ANOVA with family-wise error control
  • Repeated-Measures ANOVA – Within-subjects analysis of variance with sphericity checks and corrections
  • Shapiro-Wilk Normality Test – The most powerful commonly used test for normality in small to moderate samples
  • Spearman Rank Correlation Test – Non-parametric correlation test for monotonic association between two variables
  • The Bootstrap: Introduction – Resampling with replacement to estimate standard errors and sampling distributions
  • The Jackknife – Leave-one-out resampling for bias and variance estimation
  • The Runs Test – Testing randomness of a binary or dichotomised sequence via run lengths
  • The Sign Test – Testing a median or paired difference using only the direction of each observation
  • Two-Proportion Test – Comparing two independent proportions; z-test and chi-squared equivalence, with risk difference and relative risk
  • Two-Sample t-Test – Comparing means between two independent groups with Student’s and Welch’s t-tests, including assumption checks and effect sizes
  • Two-Way ANOVA – Two between-subjects factors: main effects and interaction
  • Type I and Type II Errors – Rejecting a true null (alpha) and failing to reject a false null (beta); power and the trade-off
  • Welch’s t-Test – Two-sample t-test that does not assume equal variances; R’s default
  • Wilcoxon Signed-Rank Test – Non-parametric paired-sample test based on signed ranks

Sample Size & Power

Data Visualisation

  • Aesthetics and Geoms – The two ggplot2 building blocks: what data to map (aesthetics) and how to draw it (geoms)
  • Annotations and Labels – Adding text, arrows, rectangles, and repelling labels to ggplot2 figures
  • Bar Charts – Counts per category and summarised values per category
  • Bland-Altman Plots – Mean-vs-difference plot for comparing two measurement methods
  • Boxplots – Five-number summary display: Tukey boxplots with whiskers and outlier rules
  • Bubble Plots – Scatter plots with a third variable encoded by point size
  • Colour Palettes – Discrete, continuous, perceptually uniform, and diverging palettes for ggplot2
  • Colour-Blind-Safe Plots – Designing plots that remain readable to viewers with colour-vision deficiencies
  • Contour Plots – Isolines of a 2D scalar field or density
  • Correlation Heatmaps – Visualising a correlation matrix with colour-encoded cells and significance markers
  • Density Plots – Smooth distribution display via kernel density estimation
  • Dot Plots – Cleveland dot plots for labelled point values and Wilkinson dot plots for compact distributional displays
  • Facets and Panels – Creating small multiples via facet_wrap and facet_grid
  • Forest Plots (Visualisation) – Point estimates and confidence intervals stacked across studies or subgroups
  • Funnel Plots (Visualisation) – Study effect vs precision plot used to detect publication bias in meta-analysis
  • Heatmaps – Matrix displays with colour-encoded cell values, optionally with row/column ordering
  • Hexbin Plots – Binning the 2D plane into hexagons and colouring by count, for large bivariate data
  • Histograms – Univariate distribution display via binned frequencies or densities
  • Interactive Plots with ggiraph – Interactive SVG-based ggplot extensions with per-element hover, click, and selection
  • Interactive Plots with plotly – Converting ggplot objects to interactive HTML plots via ggplotly()
  • Line Plots – Connecting ordered observations with lines for time-series and trajectory displays
  • Pairs Plots – Scatterplot matrices for exploring pairwise relationships among several continuous variables
  • Patchwork: Multi-Plot Composition – Composing multiple ggplot objects into a single figure with the patchwork package
  • ROC Curves – Sensitivity vs 1-specificity plots for diagnostic tests and binary classifiers
  • Raincloud Plots – Half-violin + jittered raw points + boxplot: a comprehensive distribution display
  • Ridge Plots – Stacked density curves across groups, also called ‘joy plots’
  • Saving and Exporting Figures – Exporting ggplot figures to PDF, PNG, SVG, and TIFF with the right dimensions and DPI
  • Scales and Coordinates – Controlling how data values are mapped to visual values (scales) and how the plot area is organised (coordinates)
  • Scatter Plots – The bivariate continuous default: geom_point with overplotting strategies and trend lines
  • Stacked and Dodged Bars – Two-factor bar charts: stacking for composition, dodging for side-by-side comparison
  • Survival Curves – Publication-quality Kaplan-Meier plots with risk tables and log-rank annotation
  • The Grammar of Graphics – Understanding ggplot2 as a layered grammar: data, mappings, geoms, stats, scales, coordinates, facets, and themes
  • Time Series Plots – Dated line plots with forecast bands and decomposition displays
  • Violin Plots – Symmetric kernel density plots for comparing distributions across groups
  • ggplot2 Themes – Visual styling of plots via complete themes and fine-grained theme() elements

Regression & Modelling

Multivariate Methods

Time-Series Analysis

  • ARIMA Models – Autoregressive integrated moving average models for forecasting stationary and trending time series
  • Augmented Dickey-Fuller Test – Testing for unit roots / non-stationarity against a stationary alternative
  • Bayesian Changepoint Detection – Posterior probability of a changepoint at each time, with online and offline variants
  • Changepoint Detection – Identifying abrupt shifts in mean, variance, or slope of a time series
  • Diebold-Mariano Test – Testing equality of forecast accuracy between two competing models
  • Differencing a Time Series – Removing trend via first differences; seasonal differencing for cyclic patterns
  • ETS Models – State-space formulation of exponential smoothing: Error, Trend, Seasonal
  • Exponential Smoothing – Recursive weighted-average smoothers: simple, Holt (trend), Holt-Winters (seasonal)
  • Forecast Accuracy Metrics – MAE, RMSE, MAPE, MASE: scaled and scale-dependent measures of forecast error
  • GARCH Models – Conditional heteroscedasticity models: volatility clustering in financial and other series
  • Granger Causality – Testing whether one time series improves prediction of another beyond its own history
  • Holt-Winters Method – Triple exponential smoothing with level, trend, and seasonal components
  • Interpreting ACF and PACF – Identifying AR and MA orders from autocorrelation and partial autocorrelation functions
  • KPSS Test – Testing for stationarity (null) against a unit-root alternative
  • Moving Averages – Smoothing a time series by averaging over a sliding window
  • Phillips-Perron Test – Non-parametric adjustment of the Dickey-Fuller test for autocorrelation and heteroscedasticity
  • Rolling-Origin Cross-Validation – Time-series-aware cross-validation that respects temporal ordering
  • Seasonal ARIMA (SARIMA) – ARIMA extended with seasonal autoregressive, moving-average, and differencing terms
  • Seasonal Decomposition (STL) – Loess-based seasonal-trend decomposition robust to outliers
  • Spectral Analysis – Decomposing a time series into frequency components via periodogram and spectral density
  • State-Space Models – Latent-state time-series framework unifying ARIMA, exponential smoothing, and many others
  • Stationarity Tests – Testing whether a time series has constant mean and variance over time
  • The Kalman Filter – Recursive optimal estimation for linear-Gaussian state-space models
  • The Ljung-Box Test – Portmanteau test for remaining autocorrelation in residuals across multiple lags
  • Time Series: Introduction – Trend, seasonality, cyclicality, and noise: the four components of a time series
  • VECM and Cointegration – Vector error correction models for cointegrated series; Johansen trace and eigenvalue tests
  • Vector Autoregression (VAR) – Multivariate time series where each variable depends on lags of all variables
  • Wavelet Analysis – Time-frequency decomposition capturing localised periodicities that change over time
  • White Noise Tests – Testing a residual series for independence and constant variance
  • X-13ARIMA-SEATS Decomposition – The seasonal adjustment tool used by the US Census Bureau and other statistical agencies

Bayesian Statistics

Survival Analysis

Bioinformatics

Machine Learning

Clinical Biostatistics

Meta-Analysis

Experimental Design