All Tutorials
Complete index of every tutorial on the site, grouped by topic area. Use the links below to jump to a topic landing page or directly to an individual tutorial.
Statistical Foundations
- Bias and Variance of Estimators – Decomposing an estimator’s mean squared error into systematic bias and sampling variance
- Big-Op and Little-op Notation – Stochastic order symbols for describing rates of convergence and stochastic magnitudes
- Characteristic Functions – Fourier transforms of distributions: always exist, uniquely identify the distribution, and drive convergence theorems
- Confidence Intervals: Introduction – Interval estimates with a coverage guarantee, and how they differ from credible intervals
- Convergence in Distribution – The weakest standard mode of convergence, underlying the central limit theorem
- Convergence in Probability – The mode of convergence that underpins consistency of estimators
- Estimators and Estimation – Point estimation and the plug-in principle: turning a sample into a guess about a population parameter
- Fisher Information – How much the data tell you about a parameter, measured by the curvature of the log-likelihood
- Jensen’s Inequality – The expectation of a convex function is at least the convex function of the expectation
- Likelihood Ratio Tests – Comparing nested models by the ratio of maximised likelihoods, with Wilks’ asymptotic chi-squared
- Markov and Chebyshev Inequalities – Tail bounds derived from moments, with the LLN as a simple consequence
- Maximum Likelihood Estimation – The likelihood function, score equations, and asymptotic properties of the MLE
- Moments and Moment Generating Functions – Moments summarise distributions; the MGF is a generating function that often characterises them uniquely
- Order Statistics – Distributions of the sorted sample and their role in quantile theory and robust estimation
- Pivotal Quantities – Functions of data and parameter whose distribution does not depend on the parameter, and the CIs they produce
- Population vs. Sample – Target population, sampling frame, and the distinction between parameters and statistics
- Sampling Distributions – The distribution of a statistic across repeated samples: the key object that makes inference possible
- Sampling Methods – Simple random, stratified, cluster, systematic, and convenience sampling – and when each is appropriate
- Scales of Measurement – Nominal, ordinal, interval, and ratio scales, and which statistics each legitimately supports
- Slutsky’s Theorem – Combining convergence in distribution and in probability for asymptotic arguments
- Sufficient Statistics – Summaries that capture every bit of information about a parameter, with the factorisation theorem and exponential family
- The Cauchy-Schwarz Inequality – The inner-product inequality that bounds correlations and drives many variance inequalities
- The Central Limit Theorem – Why the sampling distribution of the mean is approximately normal, and what that means for everyday inference
- The Cramér-Rao Lower Bound – The smallest variance an unbiased estimator can achieve, derived from Fisher information
- The Delta Method – Asymptotic distributions of smooth functions of estimators, via a first-order Taylor expansion
- The Empirical Distribution Function – The sample-based step-function estimator of the CDF, with the DKW inequality for its uniform error
- The Glivenko-Cantelli Theorem – The uniform LLN for CDFs: the ECDF converges almost surely to the true CDF at every point simultaneously
- The Law of Large Numbers – Why the sample mean converges to the population mean as the sample grows, in weak and strong forms
- The Method of Moments – Estimate parameters by equating theoretical moments to their sample counterparts
- The Standard Error – The standard deviation of a statistic’s sampling distribution, and how to compute it in R (see the sketch after this list)
- Unbiasedness, Consistency, Efficiency – Three core finite- and large-sample properties that let us compare estimators
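A quick taster for this group, tied to ‘The Standard Error’ above: a minimal base-R sketch of SE = SD / sqrt(n). The simulated sample is an illustrative assumption, not data from any tutorial.

```r
# Standard error of the mean: SE = SD / sqrt(n)
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)  # toy sample, illustrative only
sd(x) / sqrt(length(x))
```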
Descriptive Statistics
- Coefficient of Variation – Scale-free dispersion: the ratio of the standard deviation to the mean
- Contingency Tables – r × c tables of counts, their expected values under independence, and standardised residuals
- Cross-Tabulations – Two-way and higher-dimensional tables for joint distributions of categorical variables
- Descriptive vs Inferential Statistics – The distinction between summarising the sample and making claims about the population
- Five-Number Summary – Min, Q1, median, Q3, max: the compact distribution summary that powers the boxplot (see the sketch after this list)
- Frequency Tables – Counting categorical and discrete-numeric variables: absolute, relative, and cumulative frequencies
- Kurtosis – Quantifying tail heaviness of a distribution, and interpreting excess kurtosis
- Mean Absolute Deviation – Average absolute deviation from the mean or median: a robust alternative to the standard deviation
- Measures of Central Tendency – Mean, median, mode, and their robust cousins – when each is appropriate and how to compute them in R
- Measures of Dispersion – Quantifying how spread out the data are: variance, SD, MAD, IQR, and range
- Measures of Shape – Skewness and kurtosis: summarising the asymmetry and tail heaviness of a distribution beyond location and spread
- Outlier Detection Rules – Univariate rules: 1.5·IQR fences, z-score thresholds, Hampel identifier, and Grubbs’ test
- Percentile Ranks – Converting a raw value to its position in the sample distribution, expressed as a percentage
- Quantiles and Percentiles – Dividing a distribution by cumulative probability: the median, quartiles, percentiles, and R’s nine quantile definitions
- Robust Statistics: An Overview – Estimators designed to stay close to the truth under contamination, covering the key concepts of breakdown point, influence function, and efficiency
- Skewness – Measuring the asymmetry of a distribution: sample estimators, interpretation, and typical values in applied data
- Standardisation and z-Scores – Centring and scaling a variable, and the robust-z alternative
- Stem-and-Leaf Plots – A text-based compact display of small samples that preserves every data value
- Summary by Group – Split-apply-combine pattern for computing summaries within subgroups of the data
- The Geometric Mean – Log-scale averaging for multiplicative data: ratios, growth rates, titres
- The Harmonic Mean – Reciprocal-averaged summary for rates, speeds, and F1-style composite scores
- The Interquartile Range – Width of the central 50% of the distribution: robust, interpretable, and the basis of Tukey fences
- The Weighted Mean – Averaging with unequal weights: survey inclusion probabilities, meta-analysis pooling, and more
- Trimmed and Winsorised Means – Robust location estimators that compromise between the mean and the median
- Variance and Standard Deviation – Squared and unsquared dispersion, the n vs n-1 divisor, and why the sample SD is biased
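A taster for ‘Five-Number Summary’ and ‘Quantiles and Percentiles’ above, as a minimal base-R sketch; the toy data are an illustrative assumption.

```r
# Five-number summary and quartiles in base R
x <- c(2, 4, 4, 5, 7, 9, 12, 15, 21)               # toy data
fivenum(x)                                          # min, lower hinge, median, upper hinge, max
quantile(x, probs = c(0.25, 0.5, 0.75), type = 7)   # R's default quantile definition
```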
Probability Theory
- Bayes’ Theorem – Inverting conditional probabilities: from likelihood to posterior, with applications to diagnostic testing and Bayesian inference
- Conditional Distributions – The distribution of one random variable given the value of another, and conditional expectations
- Conditional Probability – Probability of one event given another, the product rule, and how conditioning reshapes the sample space
- Convolutions of Distributions – Distribution of a sum of independent random variables, computed via convolution integrals
- Copulas: An Introduction – Separating marginal distributions from the dependence structure via Sklar’s theorem
- Discrete vs Continuous Variables – Two fundamental kinds of random variable and the mixed case in between
- Expectation of a Random Variable – The probability-weighted average of a random variable’s values, with its algebraic properties
- Independence – Events and variables that do not inform each other, and the distinction between pairwise and mutual independence
- Joint Distributions – The distribution of a pair (or vector) of random variables, encoding their dependence structure
- Kolmogorov’s Axioms of Probability – The three axioms that define a probability measure on a sigma-algebra of events
- Law of Total Probability – Decomposing a marginal probability over a partition of the sample space
- Marginal Distributions – Recovering the distribution of a single variable from a joint distribution by integrating out the others
- Random Variables – Measurable functions from sample space to real numbers: the mathematical object behind every statistic
- Sample Space and Events – The set of possible outcomes, events as subsets, and the algebra of set operations
- Student’s t-Distribution – Heavier-tailed relative of the normal, used for small-sample inference on means
- The Bernoulli Distribution – The distribution of a single binary trial with success probability p
- The Beta Distribution – Distribution over [0, 1] for proportions and probabilities; conjugate prior for the binomial
- The Binomial Distribution – The count of successes in n independent Bernoulli trials with common probability p
- The Chi-Squared Distribution – Sum of squared independent standard normals; foundation of chi-squared tests and ANOVA
- The Correlation Coefficient – Standardised covariance bounded in [-1, 1], measuring the strength of linear association
- The Cumulative Distribution Function – F(x) = P(X ≤ x): the universal descriptor of a random variable’s distribution
- The Exponential Distribution – Waiting time between events in a Poisson process, with the memoryless property
- The F Distribution – Ratio of two independent scaled chi-squared random variables; foundation of ANOVA and variance tests
- The Gamma Distribution – Flexible right-skewed distribution for positive continuous data; sum of independent exponentials
- The Geometric Distribution – Number of Bernoulli trials until the first success, with the memoryless property
- The Hazard Function – Instantaneous event rate given survival so far; central concept of survival analysis
- The Hypergeometric Distribution – Sampling without replacement from a finite population
- The Log-Normal Distribution – Positive-valued distribution whose logarithm is normal; multiplicative processes and biomedical concentrations
- The Multinomial Distribution – Multi-category generalisation of the binomial with category probabilities summing to one
- The Multivariate Normal Distribution – Generalisation of the normal to random vectors; the keystone of multivariate analysis
- The Negative Binomial Distribution – Overdispersed count distribution, derivable as a gamma mixture of Poissons
- The Normal Distribution – Definition, properties, and practical use of the normal (Gaussian) distribution, with R examples for every common task (see the sketch after this list)
- The Poisson Distribution – Counts of rare events per unit time or space with rate λ
- The Probability Density Function – The function whose integral gives probability for a continuous random variable
- The Probability Mass Function – The function that assigns probability to each value of a discrete random variable
- The Survival Function – Probability of surviving past time t; the complement of the CDF for positive random variables
- The Uniform Distribution – Equal probability over an interval (continuous) or a set (discrete)
- The Weibull Distribution – Flexible positive-valued distribution with monotone hazard; reliability and survival analysis
- Transformations of Random Variables – Distribution of g(X) given the distribution of X, via the change-of-variables formula
- Variance and Covariance – Second moments of one and two random variables: dispersion and joint variation
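A taster for ‘The Normal Distribution’ above: a minimal sketch of R’s d/p/q/r naming convention, shown here for the standard normal.

```r
# R's d/p/q/r convention, illustrated with the standard normal
dnorm(1.96)   # density at 1.96
pnorm(1.96)   # P(X <= 1.96), about 0.975
qnorm(0.975)  # 97.5% quantile, about 1.96
rnorm(5)      # five random draws
```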
Inferential Statistics
- Anderson-Darling Test – A distribution goodness-of-fit test with emphasis on the tails
- Bartlett’s Test – Test of equal variances across groups, most powerful under normality but very sensitive to its violation
- Benjamini-Hochberg FDR – Controlling the expected proportion of false discoveries among rejected hypotheses
- Benjamini-Yekutieli FDR – Dependence-robust false discovery rate control at the cost of conservatism
- Bonferroni Correction – Controlling family-wise error rate by dividing alpha across m tests
- Bootstrap Confidence Intervals – Percentile, basic, BCa, and studentised bootstrap CIs
- Chi-Squared Goodness-of-Fit Test – Testing whether observed category frequencies match an expected distribution
- Chi-Squared Test of Independence – Testing association between two categorical variables via observed vs. expected counts under independence
- Cochran-Mantel-Haenszel Test – Stratified analysis of 2×2 tables across levels of a confounding variable
- Effect Sizes: Overview – Why effect sizes matter and which measures apply to which tests
- Equivalence Testing with TOST – Two one-sided tests for establishing practical equivalence within a margin
- Fisher’s Exact Test – Exact test for 2×2 (and r × c) contingency tables, based on the hypergeometric distribution
- Friedman Test – Non-parametric test for three or more repeated measures on the same units
- Holm’s Step-Down Correction – Sequential Bonferroni with uniformly greater power; controls FWER
- Kendall’s Tau – Rank correlation based on concordant vs discordant pairs; robust to ties and small samples
- Kolmogorov-Smirnov Test – CDF-based goodness-of-fit for one-sample and two-sample comparisons
- Kruskal-Wallis Test – Non-parametric one-way ANOVA based on ranks across three or more groups
- Levene’s Test of Variances – Robust test of variance homogeneity across groups
- Mann-Whitney U Test – Non-parametric two-sample test based on ranks; alternative to the independent t-test
- McNemar’s Test – Paired / matched binary data: comparing two measurements on the same units, using discordant pairs only
- Mixed ANOVA – Designs combining between-subjects and within-subjects factors, with the interaction as key
- Multiple Comparisons: Overview – Family-wise error vs false discovery rate and when each applies
- Null and Alternative Hypotheses – Formulating H0 and H1, choosing one- vs two-sided tests, and avoiding post-hoc reformulation
- One-Proportion Test – Testing whether an observed proportion differs from a pre-specified reference; exact and score-based methods
- One-Sample t-Test – Testing whether a sample mean equals a pre-specified reference value
- One-Sample z-Test – Comparing a sample mean to a reference when the population variance is known – rare in practice but pedagogically useful
- One-Way ANOVA – Comparing means across three or more independent groups via the F-test
- P-Values Explained – Definition, correct interpretation, and the most common misinterpretations of the p-value
- Paired t-Test – Comparing two dependent measurements by applying a one-sample t-test to their differences
- Pearson Correlation Test – Testing whether the Pearson correlation between two continuous variables is non-zero
- Permutation Tests – Exact or Monte Carlo p-values via resampling under exchangeability
- Post-Hoc Tests with Tukey HSD – Pairwise comparisons after ANOVA with family-wise error control
- Repeated-Measures ANOVA – Within-subjects analysis of variance with sphericity checks and corrections
- Shapiro-Wilk Normality Test – The most powerful commonly used test for normality in small to moderate samples
- Spearman Rank Correlation Test – Non-parametric correlation test for monotonic association between two variables
- The Bootstrap: Introduction – Resampling with replacement to estimate standard errors and sampling distributions
- The Jackknife – Leave-one-out resampling for bias and variance estimation
- The Runs Test – Testing randomness of a binary or dichotomised sequence via run lengths
- The Sign Test – Testing a median or paired difference using only the direction of each observation
- Two-Proportion Test – Comparing two independent proportions; z-test and chi-squared equivalence, with risk difference and relative risk
- Two-Sample t-Test – Comparing means between two independent groups with Student’s and Welch’s t-tests, including assumption checks and effect sizes
- Two-Way ANOVA – Two between-subjects factors: main effects and interaction
- Type I and Type II Errors – Rejecting a true null (alpha) and failing to reject a false null (beta); power and the trade-off
- Welch’s t-Test – Two-sample t-test that does not assume equal variances; R’s default (see the sketch after this list)
- Wilcoxon Signed-Rank Test – Non-parametric paired-sample test based on signed ranks
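A taster for ‘Welch’s t-Test’ above: t.test() uses the Welch form by default. The two simulated groups are illustrative assumptions.

```r
# Welch's two-sample t-test (R's default; set var.equal = TRUE for Student's)
set.seed(1)
g1 <- rnorm(20, mean = 5.0)  # toy group 1
g2 <- rnorm(20, mean = 5.8)  # toy group 2
t.test(g1, g2)
```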
Sample Size & Power
- Effect Size: Cohen’s d – Standardised mean difference: definition, variants, interpretation
- Effect Size: Cohen’s h – Standardised effect size for comparisons between proportions via arcsine transformation
- Effect Size: Eta-Squared – Proportion of variance explained in ANOVA and its partial, generalised, and omega variants
- Minimum Detectable Effect – Solving for the smallest effect that can be detected at a given sample size and power
- Post-Hoc Power: A Controversy – Why computing power from the observed effect is circular and uninformative
- Power Analysis: Introduction – The four quantities of power analysis and how they trade against each other in study design
- Power for Agreement (Kappa) – Sample size for Cohen’s kappa, planned either for estimation precision or for a hypothesis test
- Power for Bland-Altman Studies – Sample size for estimating limits of agreement with adequate precision
- Power for Chi-Squared Tests – Sample size for goodness-of-fit and contingency chi-squared using Cohen’s w
- Power for Cluster-RCT – Sample size for cluster-randomised trials: design effect from ICC and cluster size
- Power for Correlation Tests – Sample size for Pearson and Spearman correlations via Fisher’s z transformation
- Power for Cox Regression – Events-driven sample size: the number of events, not subjects, drives power
- Power for Crossover Trials – Sample size for 2×2 crossover designs, exploiting within-subject comparison
- Power for Diagnostic Accuracy – Sample size for estimating sensitivity and specificity with adequate precision
- Power for Equivalence (TOST) – Sample size for two-one-sided-tests equivalence testing within a pre-specified margin
- Power for ICC – Sample size for the intraclass correlation coefficient in reliability studies
- Power for Linear Regression – Sample size for overall regression F and for individual coefficients via Cohen’s f^2
- Power for Logistic Regression – Sample size for logistic regression: events per variable, odds-ratio detection, and simulation
- Power for McNemar’s Test – Sample size for paired binary comparisons driven by the rate of discordant pairs
- Power for Non-Inferiority Trials – One-sided equivalence: is the new treatment no worse than the reference by more than the margin?
- Power for One-Proportion Test – Sample size for testing a single proportion against a reference, with normal and exact methods
- Power for One-Sample t-Test – Computing sample size and power for a test of one mean against a reference
- Power for One-Way ANOVA – Sample size for detecting a difference among three or more group means
- Power for Paired t-Test – Sample size for paired designs, taking advantage of within-subject correlation
- Power for Repeated-Measures ANOVA – Sample size for within-subjects designs with within-subject correlation
- Power for Stepped-Wedge Trials – Variance of the treatment effect in stepped-wedge cluster designs via the Hussey-Hughes formula
- Power for Two-Proportion Test – Sample size for comparing proportions between two independent groups
- Power for the Log-Rank Test – Event-based sample size for Kaplan-Meier comparisons via the log-rank test
- Sample Size Sensitivity Analysis – Reporting sample size across a range of assumed effect sizes, SDs, and dropout rates
- Sample Size for a Two-Sample t-Test – Power analysis and sample size calculation for comparing two independent group means, with worked examples in R (see the sketch after this list)
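A taster for ‘Sample Size for a Two-Sample t-Test’ above, using base R’s power.t.test(); the inputs (a 0.5 SD difference, 80% power, alpha 0.05) are assumed for illustration.

```r
# Per-group n for a two-sample t-test under assumed inputs
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
```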
Data Visualisation
- Aesthetics and Geoms – The two ggplot2 building blocks: what data to map (aesthetics) and how to draw it (geoms)
- Annotations and Labels – Adding text, arrows, rectangles, and repelling labels to ggplot2 figures
- Bar Charts – Counts per category and summarised values per category
- Bland-Altman Plots – Mean-vs-difference plot for comparing two measurement methods
- Boxplots – Five-number summary display: Tukey boxplots with whiskers and outlier rules
- Bubble Plots – Scatter plots with a third variable encoded by point size
- Colour Palettes – Discrete, continuous, perceptually uniform, and diverging palettes for ggplot2
- Colour-Blind-Safe Plots – Designing plots that remain readable to viewers with colour-vision deficiencies
- Contour Plots – Isolines of a 2D scalar field or density
- Correlation Heatmaps – Visualising a correlation matrix with colour-encoded cells and significance markers
- Density Plots – Smooth distribution display via kernel density estimation
- Dot Plots – Cleveland dot plots and Wilkinson dot plots for compact distributional displays
- Facets and Panels – Creating small multiples via facet_wrap and facet_grid
- Forest Plots (Visualisation) – Point estimates and confidence intervals stacked across studies or subgroups
- Funnel Plots (Visualisation) – Study effect vs. precision plot used to detect publication bias in meta-analysis
- Heatmaps – Matrix displays with colour-encoded cell values, optionally with row/column ordering
- Hexbin Plots – Binning the 2D plane into hexagons and colouring by count, for large bivariate data
- Histograms – Univariate distribution display via binned frequencies or densities
- Interactive Plots with ggiraph – Interactive SVG-based ggplot extensions with per-element hover, click, and selection
- Interactive Plots with plotly – Converting ggplot objects to interactive HTML plots via ggplotly()
- Line Plots – Connecting ordered observations with lines for time-series and trajectory displays
- Pairs Plots – Scatterplot matrices for exploring pairwise relationships among several continuous variables
- Patchwork: Multi-Plot Composition – Composing multiple ggplot objects into a single figure with the patchwork package
- ROC Curves – Sensitivity vs 1-specificity plots for diagnostic tests and binary classifiers
- Raincloud Plots – Half-violin + jittered raw points + boxplot: a comprehensive distribution display
- Ridge Plots – Stacked density curves across groups, also called ‘joy plots’
- Saving and Exporting Figures – Exporting ggplot figures to PDF, PNG, SVG, and TIFF with the right dimensions and DPI
- Scales and Coordinates – Controlling how data values are mapped to visual values (scales) and how the plot area is organised (coordinates)
- Scatter Plots – The bivariate continuous default: geom_point with overplotting strategies and trend lines (see the sketch after this list)
- Stacked and Dodged Bars – Two-factor bar charts: stacking for composition, dodging for side-by-side comparison
- Survival Curves – Publication-quality Kaplan-Meier plots with risk tables and log-rank annotation
- The Grammar of Graphics – Understanding ggplot2 as a layered grammar: data, mappings, geoms, stats, scales, coordinates, facets, and themes
- Time Series Plots – Dated line plots with forecast bands and decomposition displays
- Violin Plots – Symmetric kernel density plots for comparing distributions across groups
- ggplot2 Themes – Visual styling of plots via complete themes and fine-grained theme() elements
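A taster for ‘Scatter Plots’ above: a minimal ggplot2 sketch using the built-in mtcars data (an illustrative choice).

```r
# Scatter plot with a linear trend line in ggplot2
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm")  # linear trend with a confidence band
```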
Regression & Modelling
- Added-Variable Plots – Partial regression plots showing each predictor’s unique contribution
- Best Subset Selection – Enumerate all candidate predictor subsets to find the best by a chosen criterion
- Beta Regression – Regression for continuous outcomes in (0, 1): proportions, percentages, fractions
- Centring and Scaling Predictors – Why and how to centre and scale continuous predictors before regression
- Contrasts in R – Built-in and custom contrast matrices for factor variables in regression and ANOVA
- Cook’s Distance – Single-number influence measure combining leverage and residual
- Cox Regression as Regression – The Cox proportional-hazards model in the GLM-family perspective; pointer to the Survival section
- Dirichlet Regression – Regression for compositional outcomes: vectors of proportions summing to 1
- Dummy Coding – Treatment-contrast coding for categorical predictors: reference category and indicator variables
- Effect (Deviation) Coding – Sum-to-zero contrasts: coefficients as deviations from the grand mean
- Elastic Net – Combined L1 and L2 penalty: variable selection plus grouping behaviour
- GAMMs: Additive Mixed Models – Combining smooth effects (GAM) with random effects (LMM) via mgcv::gam
- GAMs: Introduction – Generalised additive models with smoothing splines for non-linear predictor effects
- Generalised Estimating Equations – Marginal regression models for clustered data with working correlation structures
- Hurdle Models – Two-part count models: binary decision for zero vs. non-zero, plus truncated count for positives
- Interactions in Regression – Product terms in regression, centring, and the interpretation of conditional effects
- Lasso Regression – L1-penalised regression for simultaneous shrinkage and variable selection
- Leverage and Influence – Hat values, DFBETAs, and Cook’s distance for identifying influential observations
- Linear Regression Assumptions – The four classical assumptions of linear regression and how to check each
- Logistic Regression – Regression for binary outcomes: odds ratios, logit link, and MLE fitting
- Logistic Regression Diagnostics – Calibration plots, Hosmer-Lemeshow, deviance residuals, and influence for GLM
- Mixed-Effects Models: Introduction – Hierarchical / multilevel regression: fixed effects for population-level structure, random effects for cluster-level variability
- Model Selection with AIC and BIC – Information criteria for comparing models: Akaike’s and Bayesian
- Multicollinearity and VIF – Variance inflation factors, condition numbers, and remedies for collinear predictors
- Multinomial Logistic Regression – Regression for unordered categorical outcomes with three or more levels
- Multiple Linear Regression – Linear regression with two or more predictors: coefficient interpretation, partial effects, and collinearity
- Negative Binomial Regression – Count regression with explicit variance-dispersion parameter for overdispersed data
- Nested vs Crossed Random Effects – Distinguishing hierarchical from partially overlapping grouping structures in mixed models
- Offsets in Regression – Treating exposure time or person-years as a known component of the linear predictor
- Ordinal Logistic Regression – Regression for ordered categorical outcomes using the proportional-odds model
- Partial Correlation – Correlation between two variables after removing the linear influence of others
- Poisson Regression – Log-linear count regression with rates and offsets
- Polynomial Regression – Extending linear regression with powers of predictors: orthogonal polynomials and overfitting risk
- Probit Regression – Binary regression using the cumulative normal link
- Quantile Regression – Modelling conditional quantiles (median, upper/lower percentiles) rather than the mean
- Quasi-Poisson Regression – Poisson regression with a dispersion parameter for moderate overdispersion
- R-Squared and Adjusted R-Squared – Proportion of variance explained, and the penalty for model complexity
- ROC and AUC for Logistic Models – Discrimination quality of logistic regression via receiver operating characteristic curves
- Random-Intercept Models – Mixed model with cluster-specific intercepts but common slopes
- Random-Slope Models – Mixed model with cluster-specific slopes, allowing effects to vary across groups
- Residual Diagnostics – Four standard residual plots, what they show, and how to read them
- Ridge Regression – L2 penalty for coefficient shrinkage; stable fits under collinearity
- Robust Regression – M-estimators and MM-estimators for regression resistant to outliers and heavy-tailed errors
- Simple Linear Regression – Fitting, diagnosing, and interpreting a linear regression with a single continuous predictor, from first principles (see the sketch after this list)
- Spline Regression – Piecewise-polynomial regression with natural cubic splines for flexible non-linear fits
- Stepwise Regression: Pitfalls – Why automated predictor selection by p-value or AIC has serious statistical problems
- Tobit Regression – Linear regression for outcomes censored at a known threshold
- Truncated Regression – Regression when observations below or above a threshold are absent entirely, not merely censored
- Worked lme4::lmer Examples – Common formula patterns for lmer with interpretation of each output block
- Zero-Inflated Models – Mixture models for count data with more zeros than Poisson or negative binomial predicts
- glmer: Generalised Mixed Models – Extending mixed models to binary, count, and other GLM outcomes
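A taster for ‘Simple Linear Regression’ and ‘Residual Diagnostics’ above, again on the built-in mtcars data (illustrative).

```r
# Fit, summarise, and diagnose a simple linear regression
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)                     # coefficients, R-squared, residual SE
par(mfrow = c(2, 2)); plot(fit)  # the four standard residual plots
```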
Multivariate Methods
- Bray-Curtis Dissimilarity – The standard ecological dissimilarity measure for species-abundance community data
- Canonical Correlation Analysis – Identifying pairs of linear combinations that maximise correlation between two variable sets
- Cluster Validation: Silhouette – Measuring how well each observation fits its assigned cluster relative to neighbouring clusters
- Confirmatory Factor Analysis – Testing a pre-specified factor structure against data, with chi-squared and incremental fit indices
- Correspondence Analysis – Low-dimensional biplot representation of contingency tables via chi-squared distance
- DBSCAN Clustering – Density-based clustering with automatic noise detection and arbitrary cluster shapes
- Distance and Dissimilarity Measures – Euclidean, Manhattan, Minkowski, Canberra, and their use in clustering / MDS
- Exploratory Factor Analysis – Identifying latent factors from correlated manifest variables, with extraction, rotation, and number-of-factors decisions
- Factor Rotation – Orthogonal and oblique rotations to make factor loadings interpretable
- Gaussian Mixture Models – Model-based clustering via mixtures of multivariate normals, fitted by EM
- Hierarchical Clustering – Agglomerative (bottom-up) clustering with various linkage criteria and dendrogram display
- Independent Component Analysis – Signal separation: recovering independent latent sources from linear mixtures
- Jaccard and Dice Dissimilarity – Distance measures for binary / presence-absence data
- Kernel PCA – Non-linear dimensionality reduction via PCA in an implicit feature space
- Linear Discriminant Analysis – Classifier assuming multivariate normal within-class distributions with equal covariance
- MANOVA – Multivariate analysis of variance: testing group differences across multiple correlated outcomes jointly
- Mahalanobis Distance – Scale- and correlation-aware distance for multivariate data
- Metric Multidimensional Scaling – Representing a distance matrix in low-dimensional Euclidean coordinates
- Multiple Correspondence Analysis – Extension of correspondence analysis to more than two categorical variables
- Non-Metric Multidimensional Scaling – MDS using only the ranks of distances, minimising Kruskal stress
- PLS Discriminant Analysis – PLS extended to classification via dummy-coded class indicators
- Partial Least Squares Regression – Regression using latent components optimised for outcome prediction in high dimensions
- Partitioning Around Medoids – PAM: robust k-medoids clustering for non-Euclidean distances and outlier-contaminated data
- Principal Component Analysis – Dimensionality reduction by variance-maximising projection, with interpretation, scaling, and visualisation in R (see the sketch after this list)
- Quadratic Discriminant Analysis – LDA with class-specific covariance matrices; quadratic decision boundaries
- Structural Equation Models – Combining measurement (CFA) and structural (path) components for latent-variable regression
- The Elbow Method – Choosing the number of clusters by identifying a bend in the within-cluster sum of squares curve
- The Gap Statistic – Choosing the number of clusters via comparison with a null (uniform) distribution
- UMAP and t-SNE: Overview – Modern non-linear dimensionality reduction for visualisation: UMAP, t-SNE, and their caveats
- k-Means Clustering – Partitioning observations into k clusters by minimising within-cluster sum of squares
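A taster for ‘Principal Component Analysis’ above, on the built-in USArrests data (illustrative).

```r
# PCA on standardised variables
pca <- prcomp(USArrests, scale. = TRUE)  # scale. = TRUE standardises each column
summary(pca)                             # variance explained per component
biplot(pca)
```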
Time-Series Analysis
- ARIMA Models – Autoregressive integrated moving average models for forecasting stationary and trending time series
- Augmented Dickey-Fuller Test – Testing for unit roots / non-stationarity against a stationary alternative
- Bayesian Changepoint Detection – Posterior probability of a changepoint at each time, with online and offline variants
- Changepoint Detection – Identifying abrupt shifts in mean, variance, or slope of a time series
- Diebold-Mariano Test – Testing equality of forecast accuracy between two competing models
- Differencing a Time Series – Removing trend via first differences; seasonal differencing for cyclic patterns
- ETS Models – State-space formulation of exponential smoothing: Error, Trend, Seasonal
- Exponential Smoothing – Recursive weighted-average smoothers: simple, Holt (trend), Holt-Winters (seasonal)
- Forecast Accuracy Metrics – MAE, RMSE, MAPE, MASE: scaled and scale-dependent measures of forecast error
- GARCH Models – Conditional heteroscedasticity models: volatility clustering in financial and other series
- Granger Causality – Testing whether one time series improves prediction of another beyond its own history
- Holt-Winters Method – Triple exponential smoothing with level, trend, and seasonal components
- Interpreting ACF and PACF – Identifying AR and MA orders from autocorrelation and partial autocorrelation functions (see the sketch after this list)
- KPSS Test – Testing for stationarity (null) against a unit-root alternative
- Moving Averages – Smoothing a time series by averaging over a sliding window
- Phillips-Perron Test – Non-parametric adjustment of the Dickey-Fuller test for autocorrelation and heteroscedasticity
- Rolling-Origin Cross-Validation – Time-series-aware cross-validation that respects temporal ordering
- Seasonal ARIMA (SARIMA) – ARIMA extended with seasonal autoregressive, moving-average, and differencing terms
- Seasonal Decomposition (STL) – Loess-based seasonal-trend decomposition robust to outliers
- Spectral Analysis – Decomposing a time series into frequency components via periodogram and spectral density
- State-Space Models – Latent-state time-series framework unifying ARIMA, exponential smoothing, and many others
- Stationarity Tests – Testing whether a time series has constant mean and variance over time
- The Kalman Filter – Recursive optimal estimation for linear-Gaussian state-space models
- The Ljung-Box Test – Portmanteau test for remaining autocorrelation in residuals across multiple lags
- Time Series: Introduction – Trend, seasonality, cyclicality, and noise: the four components of a time series
- VECM and Cointegration – Vector error correction models for cointegrated series; Johansen trace and eigenvalue tests
- Vector Autoregression (VAR) – Multivariate time series where each variable depends on lags of all variables
- Wavelet Analysis – Time-frequency decomposition capturing localised periodicities that change over time
- White Noise Tests – Testing a residual series for independence and constant variance
- X-13ARIMA-SEATS Decomposition – The seasonal adjustment tool used by the US Census Bureau and other statistical agencies
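A taster for ‘Interpreting ACF and PACF’ above, using the built-in lynx series (an illustrative choice; the AR(2) order below is assumed, not identified here).

```r
# Inspect autocorrelation, then fit a simple ARIMA
acf(lynx); pacf(lynx)            # candidate AR/MA orders
arima(lynx, order = c(2, 0, 0))  # an assumed AR(2) fit for illustration
```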
Bayesian Statistics
- Bayes Factors – Ratio of marginal likelihoods under two models, the Bayesian analogue of a likelihood-ratio test
- Bayes’ Theorem for Parameters – Using Bayes’ theorem to update beliefs about parameters given data
- Bayesian ANOVA – Posterior inference on group differences with credible intervals and contrast comparisons
- Bayesian Hierarchical Models – Partial pooling through prior structure for grouped / multilevel data
- Bayesian Hypothesis Testing – Region of practical equivalence (ROPE) and Bayes factors for Bayesian decisions
- Bayesian Linear Regression – Linear regression with priors on coefficients, fit via MCMC in brms or rstanarm
- Bayesian Logistic Regression – Logistic regression with MCMC-based posterior for binary outcomes
- Bayesian Mediation Analysis – Testing indirect effects with posterior samples
- Bayesian Meta-Analysis – Hierarchical meta-analysis with explicit prior on between-study heterogeneity
- Bayesian Mixed Models – Multilevel regression with random effects, fit via Bayesian MCMC
- Conjugate Priors – When the prior and posterior share the same distributional family: the beta-binomial, normal-normal, and gamma-Poisson models
- Credible Intervals – Bayesian analogue of confidence intervals: equal-tailed and highest-posterior-density regions
- Divergences and Pairs Plots – Diagnosing HMC sampling problems via divergent transitions and pair-plot visualisations
- Gibbs Sampling – MCMC via sequential sampling from conditional distributions
- Hamiltonian Monte Carlo – Gradient-based MCMC using Hamiltonian dynamics; NUTS variant in Stan
- Highest Posterior Density Interval – Shortest interval containing a specified posterior probability mass
- Informative Priors – Priors derived from previous studies, expert elicitation, or mechanistic knowledge
- Jeffreys’ Prior – Invariant reference prior proportional to the square root of the Fisher information
- Leave-One-Out Cross-Validation – Efficient Bayesian LOO-CV via Pareto-smoothed importance sampling (PSIS-LOO)
- Metropolis-Hastings – The foundational MCMC algorithm with proposal distribution and accept/reject rule
- Posterior Mean and Variance – Point estimates and uncertainty summaries from a posterior distribution
- Posterior Predictive Checks – Comparing simulated data from the fitted model to observed data for model criticism
- Prior Specification – Choosing priors: informative, weakly informative, flat, and the risks of each
- R-hat and ESS Diagnostics – Convergence (R-hat) and effective sample size (ESS) for MCMC quality control
- Stan: Introduction – Writing probabilistic models in Stan’s data-parameters-model blocks
- The Beta-Binomial Model – Conjugate Bayesian inference for a binomial proportion with a Beta prior (see the sketch after this list)
- The Gamma-Poisson Model – Conjugate Bayesian inference for a Poisson rate with Gamma prior
- The Normal-Normal Model – Conjugate Bayesian inference for a normal mean with known variance
- The Posterior Predictive Distribution – Distribution of a new observation integrating over posterior uncertainty about parameters
- The Savage-Dickey Density Ratio – Bayes factor for a point null via posterior/prior density ratio
- WAIC – Widely applicable information criterion: Bayesian out-of-sample prediction score
- Weakly Informative Priors – Priors that regularise without imposing substantive constraints
- brms Basics – High-level R interface to Stan: formula syntax, families, prior specification
- rstanarm – Pre-compiled Stan models with a frequentist-like R interface
- tidybayes Workflow – Extract, manipulate, and visualise posterior draws in tidyverse style
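A taster for ‘The Beta-Binomial Model’ above: conjugate updating needs only base R. The flat prior and toy data are illustrative assumptions.

```r
# Beta(a, b) prior + k successes in n trials -> Beta(a + k, b + n - k) posterior
a <- 1; b <- 1                            # flat Beta(1, 1) prior (assumed)
k <- 7; n <- 10                           # toy data
qbeta(c(0.025, 0.975), a + k, b + n - k)  # 95% equal-tailed credible interval
```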
Survival Analysis
- Accelerated Failure Time Models – Parametric survival regression on the log-time scale with acceleration-factor interpretation
- Censoring Types – Right, left, interval, and informative censoring in time-to-event data
- Checking Cox Assumptions – Diagnostics for proportional hazards, functional form, and influential observations
- Competing Risks: Cumulative Incidence – Estimating the probability of each event type in the presence of competing risks
- Concordance and the C-Index – Discrimination ability of a survival model measured by concordance probability
- Conditional Survival – Updating prognosis for survivors: P(T > t + s | T > s)
- Cox Proportional Hazards Regression – Semi-parametric regression for survival data with arbitrary baseline hazard
- Fine-Gray Subdistribution Hazards – Regression directly targeting cumulative incidence in competing-risks data
- Frailty Models – Random effects in survival: unobserved heterogeneity and clustered survival data
- Hazard, Survival, and Cumulative Hazard – The three equivalent representations of a time-to-event distribution
- Interval-Censored Data – Events known to occur within a time interval, not at an exact time
- Joint Longitudinal-Survival Models – Simultaneously modelling a longitudinal biomarker and time-to-event outcome
- Kaplan-Meier Estimation – Non-parametric estimation of the survival function from right-censored data, with confidence intervals and group comparisons (see the sketch after this list)
- Landmark Analysis – Conditioning on survival to a landmark time to analyse time-varying covariates
- Left Truncation – Delayed entry: subjects only enter the study after surviving to a certain age or time
- Log-Rank Test: Details – Weighted and stratified log-rank test for comparing survival across groups
- Multi-State Models – Modelling transitions between discrete states over time
- Parametric Survival: Log-Normal – Log-normal AFT model for non-monotone hazards
- Parametric Survival: Weibull – Fitting Weibull regression in both PH and AFT parameterisations
- Recurrent Events – Modelling repeated events per subject: Andersen-Gill, PWP, and frailty approaches
- Restricted Mean Survival Time (RMST) – Alternative summary robust to non-proportional hazards and censoring
- Royston-Parmar Flexible Parametric Models – Splines on the log-cumulative-hazard for flexible baseline without leaving parametric modelling
- Schoenfeld Residuals – Residuals that test the proportional-hazards assumption in Cox regression
- Simulating Survival Data – Generating realistic censored time-to-event datasets for power analysis and method validation
- Stratified Cox Models – Stratum-specific baseline hazards when proportional hazards fails across groups
- The Brier Score for Survival – Time-dependent prediction loss combining calibration and discrimination
- The Nelson-Aalen Estimator – Non-parametric estimator of the cumulative hazard under right-censoring
- Time-Dependent ROC – ROC curves and AUC evaluated at specific follow-up times for survival predictions
- Time-Varying Covariates – Covariates whose value changes during follow-up: counting-process data layout
- Weighted Log-Rank Tests – Peto-Peto, Fleming-Harrington, and other weighted variants for non-proportional hazards
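A taster for ‘Kaplan-Meier Estimation’ above, using the survival package’s built-in lung data (illustrative).

```r
# Kaplan-Meier curves by group, plus a log-rank test
library(survival)
fit <- survfit(Surv(time, status) ~ sex, data = lung)
plot(fit)                                        # KM curves for the two groups
survdiff(Surv(time, status) ~ sex, data = lung)  # log-rank test
```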
Bioinformatics
- ATAC-Seq Analysis – Chromatin accessibility profiling via ATAC-seq peak calling and differential analysis
- Alignment with BWA and Bowtie – Fast read alignment to a reference genome via Burrows-Wheeler Transform indexing
- Batch Correction with ComBat – Removing known batch effects from expression data using empirical-Bayes methods
- Bulk RNA-seq Differential Expression with DESeq2 – A complete DESeq2 workflow: from count matrix through normalisation, dispersion estimation, Wald testing, and LFC shrinkage (see the sketch after this list)
- Cell-Type Annotation – Assigning cell types to clusters using reference-based and manual methods
- ChIP-Seq Analysis – TF binding and histone modification profiling from ChIP-seq data
- Copy Number Variation Analysis – Inferring genomic copy-number changes from sequencing or array data
- Counting Reads with featureCounts – Assigning reads to genes or genomic features from a BAM file
- DNA Methylation Analysis – Differentially methylated positions and regions from bisulfite or array data
- Differential Expression with edgeR – Negative-binomial GLM differential expression with exactTest and the quasi-likelihood F-test
- Differential Expression with limma-voom – Transforming counts for linear modelling with precision weights
- Drug-Target Interaction Mining – Integrating bioassay databases for drug-target identification
- FASTQ Quality Control – Inspecting sequencing read quality via Phred scores and FastQC-style reports
- Finding Marker Genes – Identifying cluster-specific genes for cell-type annotation
- GSEA Preranked Analysis – Enrichment of gene sets in a ranked list without a significance cutoff
- GSVA Single-Sample Enrichment – Per-sample pathway scores for downstream sample-level analysis
- Gene Annotation with biomaRt – Programmatic queries against Ensembl for IDs, coordinates, and annotations
- Gene Ontology Enrichment – Over-representation of GO terms in a differentially-expressed gene set
- Heatmaps for RNA-seq – Visualising expression patterns across genes and samples with clustering
- Integration with Harmony – Removing batch effects in scRNA-seq while preserving biological variation
- KEGG Pathway Enrichment – Over-representation and visualisation against KEGG metabolic/signalling pathways
- MA Plots – Plots of log fold change (M) against mean average expression (A) for differential expression diagnostics
- Metagenomic Profiling – Whole-metagenome taxonomic and functional profiling with MetaPhlAn and Kraken
- Microbiome Analysis with DADA2 – Amplicon sequence variant (ASV) inference from 16S/ITS amplicon data
- Microbiome Diversity Metrics – Alpha and beta diversity measures for community comparison
- Multi-Omics Integration – Joint analysis of transcriptomic, genomic, and epigenomic layers
- Multiple Sequence Alignment – Aligning multiple sequences via ClustalW, Muscle, or T-Coffee from R
- PCA of RNA-seq Samples – Using PCA on variance-stabilised counts to check sample structure and batches
- Phylogenetic Trees with ape – Building and interpreting phylogenetic trees using distance-based and maximum-likelihood methods
- Population Genetics Basics – Allele frequencies, Hardy-Weinberg equilibrium, and F-statistics
- Protein Structure Prediction – From sequence to 3-D structure with AlphaFold and RoseTTAFold
- Proteomics with MSstats – Differential protein abundance from label-free or labelled mass-spectrometry data
- Pseudoalignment with Salmon and Kallisto – Fast transcript quantification without full alignment
- RNA-seq Normalisation – TMM, RLE, upper-quartile, and CPM: making samples comparable
- Read Trimming and Adapter Removal – Removing low-quality bases and sequencing adapters before alignment
- STAR: Spliced Alignment – Fast splice-aware RNA-seq aligner capable of detecting novel junctions
- Sequence Alignment: Overview – Global vs local alignment, scoring matrices, and the core algorithms
- Single-Cell with Seurat – End-to-end workflow for single-cell RNA-seq analysis in Seurat
- Spatial Transcriptomics – Spatially resolved expression analysis with Visium and related platforms
- Structural Variant Detection – Detecting deletions, duplications, inversions, and translocations from WGS
- Surrogate Variable Analysis (SVA) – Detecting and adjusting for unknown sources of heterogeneity in expression data
- Trajectory Analysis – Inferring developmental and activation trajectories from scRNA-seq
- Transcript-to-Gene Summarisation – Aggregating transcript-level estimates to gene level for standard differential expression
- VCF Manipulation – Reading, filtering, and subsetting VCF files in R and on the command line
- Variant Annotation with VEP – Predicting functional consequences of variants using Ensembl VEP
- Variant Calling with GATK – Short-variant detection from aligned BAMs via the GATK Best Practices pipeline
- Volcano Plots – Visualising significance versus effect size across genes
- scRNA Clustering – Graph-based clustering of cells via Louvain or Leiden community detection
- scRNA Normalisation – Library-size normalisation and variance stabilisation for scRNA-seq
- scRNA QC and Filtering – Removing low-quality cells before downstream scRNA-seq analysis
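A taster for ‘Bulk RNA-seq Differential Expression with DESeq2’ above; `counts` (a gene-by-sample integer matrix) and `coldata` (a data frame with a `condition` column) are assumed to be already loaded.

```r
# Minimal DESeq2 workflow sketch
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = counts,   # assumed count matrix
                              colData   = coldata,  # assumed sample table
                              design    = ~ condition)
dds <- DESeq(dds)    # normalisation, dispersions, Wald tests
res <- results(dds)  # log2 fold changes and adjusted p-values
head(res[order(res$padj), ])
```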
Machine Learning
- A Complete mlr3 Workflow – End-to-end modelling with mlr3: tasks, learners, resamplings, and tuning
- A Complete tidymodels Workflow – End-to-end modelling in tidymodels: recipes, parsnip, workflows, and tuning
- Anomaly Detection – Unsupervised methods for flagging rare, unusual observations
- Bagging – Bootstrap aggregation: variance reduction through averaging
- Calibration Plots – Reliability diagrams: do predicted probabilities match observed frequencies?
- CatBoost – Gradient boosting with native categorical handling and ordered target encoding
- Class Weights for Imbalanced Data – Loss-based weighting as an alternative to resampling
- Convolutional Neural Networks: Introduction – Spatial filters and pooling for image and signal learning
- Cross-Validation – Estimating out-of-sample predictive performance honestly with k-fold, repeated, nested, and leave-one-out cross-validation
- DBSCAN as an ML Tool – Density-based clustering and outlier detection in one step
- Decision Trees (CART) – Recursive binary partitioning with Gini or variance splits
- Dropout and Early Stopping – Two of the most effective neural-network regularisation techniques
- Extremely Randomised Trees – Extra Trees: random thresholds for faster and more regularised ensembles
- Feature Engineering – Transformations, interactions, and encodings to expose signal to models
- Feature Selection – Filter, wrapper, and embedded strategies to reduce the feature space
- Feedforward Networks in torch – Building and training deep feedforward networks in R with the torch package
- Gradient Boosting – Stagewise additive modelling with gradient descent on loss
- Isolation Forest – Tree-based unsupervised anomaly detection via path length
- Isotonic Regression Calibration – Non-parametric monotone calibration via pool-adjacent-violators
- Kernel SVMs – Non-linear classification via the kernel trick with RBF and polynomial kernels
- LIME Explanations – Local Interpretable Model-Agnostic Explanations via surrogate linear models
- LightGBM – Histogram-based leaf-wise gradient boosting for large datasets
- Linear Discriminant as ML – LDA as a classifier with shared Gaussian class-conditional covariance
- Linear Support Vector Machines – Large-margin classifiers with soft-margin slack and the C parameter
- Logistic Regression as ML – Linear classifier with log-loss and L1/L2 regularisation
- Naive Bayes – Probabilistic classification under conditional independence
- Nested Cross-Validation – Honest model evaluation with separate loops for tuning and assessment
- Neural Networks: Introduction – From perceptrons to multi-layer networks: weights, layers, and activation functions
- Partial Dependence Plots – Marginal effect of a feature on model predictions
- Platt Scaling – Logistic recalibration: mapping raw scores to calibrated probabilities
- Preprocessing with recipes – Leak-free feature engineering in tidymodels with recipes
- RNNs and LSTMs: Introduction – Recurrent networks for sequence modelling with gated memory
- Random Forests – Bagging decision trees with random feature subsets for robust ensembles (see the sketch after this list)
- Regularisation in ML – L2 ridge, L1 lasso, and elastic net for controlling model complexity
- SHAP Values – Game-theoretic local feature attributions via Shapley values
- SMOTE for Imbalanced Classes – Synthetic Minority Oversampling Technique for class-imbalanced classification
- Stacking Ensembles – Combining heterogeneous base models via a meta-learner
- Supervised Learning: Overview – Framework, loss functions, generalisation, and the bias-variance decomposition
- The Bias-Variance Tradeoff – Decomposition of prediction error into structural and sampling components
- Train-Test Splits – Holdout evaluation, stratification, and honest generalisation estimates
- Transformers: Overview – Self-attention architectures underlying modern NLP and beyond
- Variable Importance – Impurity-based and permutation importance for tree ensembles and beyond
- XGBoost – Regularised gradient boosting with sparsity awareness and early stopping
- k-Means as an ML Tool – Using k-means for feature engineering, quantisation, and pre-segmentation
- k-Nearest Neighbours Classification – Instance-based learning with majority voting among nearest neighbours
- k-Nearest Neighbours Regression – Local averaging for non-parametric regression
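A taster for ‘Random Forests’ above, using the randomForest package on the built-in iris data (an illustrative choice of package and data).

```r
# Random forest with out-of-bag error and variable importance
library(randomForest)
fit <- randomForest(Species ~ ., data = iris, ntree = 500)
fit              # OOB error estimate and confusion matrix
importance(fit)  # impurity-based variable importance
```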
Clinical Biostatistics
- Adaptive Trial Designs – Pre-specified modifications to trial conduct based on interim data
- Alpha-Spending Functions – Flexible interim alpha allocation via Lan-DeMets spending functions
- Baseline Adjustment with ANCOVA – Adjusting for baseline covariates to improve precision and address regression to the mean
- Bland-Altman Limits of Agreement – Graphical comparison of two measurement methods on the same subjects (see the sketch after this list)
- Blinding Procedures – Masking participants, investigators, assessors, and analysts
- Block Randomisation – Random permutations within fixed blocks for balanced allocation
- Clinical Equivalence Trials – Two one-sided tests (TOST) for bioequivalence and clinical equivalence
- Cluster-Randomised Trials – Randomisation of clusters (clinics, schools) rather than individuals
- Cohen’s Kappa – Chance-corrected agreement between two raters on categorical data
- Conditional Power – Probability of eventual trial success given interim data
- Crossover RCT Design – Within-subject comparison across treatment periods with washout
- Cutpoint Selection – Youden’s J, cost-based, and closest-to-(0,1) criteria for diagnostic thresholds
- Diagnostic Test Accuracy – Sensitivity, specificity, predictive values, likelihood ratios, and ROC analysis for binary diagnostic tests
- Factorial Trials – Simultaneously evaluating multiple interventions via a factorial design
- ITT vs Per-Protocol Analysis – Primary analyses under the intent-to-treat and per-protocol principles
- Interim Analyses and Group Sequential Designs – Pre-planned interim looks with alpha-spending and early-stopping boundaries
- Intraclass Correlation Coefficient (ICC) – Absolute agreement and consistency for continuous inter-rater reliability
- Likelihood Ratios – LR+ and LR- as prevalence-independent summaries of diagnostic performance
- Minimisation Algorithm – Covariate-adaptive allocation that prospectively minimises imbalance
- Missing Data in RCTs – MCAR, MAR, MNAR and their implications for primary analysis
- Multiple Imputation – Imputing multiple plausible values and combining via Rubin’s rules
- Non-Inferiority Margin Selection – Defining the clinically acceptable maximum inferiority
- O’Brien-Fleming Boundary – Conservative early stopping boundary in group-sequential trials
- Parallel-Group RCT Design – The two-arm randomised controlled trial: structure, analysis, and reporting
- Pocock Boundary – Constant nominal alpha across interim analyses
- Predictive Values and Prevalence – How disease prevalence determines PPV and NPV via Bayes’ theorem
- ROC Analysis – Receiver Operating Characteristic curves and area under the curve
- Randomisation Methods – Simple, block, and stratified randomisation for RCT allocation
- Reliability and Cronbach’s Alpha – Internal consistency of multi-item scales
- Sample Size Re-Estimation – Updating trial sample size mid-study using blinded or unblinded nuisance parameter estimates
- Sensitivity Analyses in Clinical Trials – Exploring robustness of conclusions to assumptions about missing data and model choice
- Stepped-Wedge Trial – Sequential rollout cluster design with all clusters eventually treated
- Stratified Randomisation – Separate randomisation lists within levels of baseline covariates
- Subgroup Analyses – Pre-specified subgroup effects and treatment-by-subgroup interaction tests
- Subgroup Forest Plots – Visual summary of subgroup-specific effects and interaction tests
- Weighted Kappa – Ordinal inter-rater agreement with linear or quadratic weights
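A taster for ‘Bland-Altman Limits of Agreement’ above, computed by hand; the two paired measurement vectors are toy data (illustrative assumptions).

```r
# Bland-Altman limits of agreement: bias +/- 1.96 * SD of the differences
m1 <- c(10.1, 12.3, 9.8, 11.5, 10.9)   # toy method 1
m2 <- c(10.4, 12.0, 10.1, 11.9, 10.7)  # toy method 2
d <- m1 - m2
mean(d) + c(-1.96, 1.96) * sd(d)
```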
Meta-Analysis
- Bivariate SROC Curves – Hierarchical summary ROC modelling for diagnostic-accuracy meta-analysis
- Converting Between Effect Sizes – Moving between standardised mean differences, correlations, and odds ratios
- Cumulative Meta-Analysis – Sequentially pooling studies in the order they were published
- Egger’s Test for Funnel Asymmetry – Formal regression test for small-study effects and publication bias
- Fixed-Effect vs. Random-Effects Meta-Analysis – Two ways to pool study-level effect sizes, and the consequences of each for inference and interpretation (see the sketch after this list)
- Forest Plots in Meta-Analysis – Standard visualisation of per-study effects, weights, and the pooled estimate
- Funnel Plots – Visual inspection for small-study effects and publication bias
- Individual Participant Data Meta-Analysis – Two-stage vs one-stage analysis of raw data pooled across studies
- Leave-One-Out Meta-Analysis – Sensitivity of the pooled estimate to each individual study
- Meta-Analysis of Diagnostic Accuracy – Bivariate and HSROC models for pooling diagnostic test performance
- Meta-Regression – Explaining between-study heterogeneity with study-level covariates
- NMA Consistency and Inconsistency – Loop-specific and global tests for direct-indirect evidence agreement
- NMA League Tables – Pairwise treatment contrasts in a compact triangular grid
- Network Meta-Analysis: Introduction – Combining direct and indirect comparisons across multiple treatments
- Pooling Correlation Coefficients – Fisher z transformation for meta-analysis of correlations
- Pooling Log-Odds Ratios – Fixed and random-effects meta-analysis of binary outcomes on the log scale
- Pooling Risk Ratios – Meta-analysis of binary outcomes on the risk-ratio scale
- Prediction Intervals in Meta-Analysis – The plausible range of a future study’s true effect under random-effects
- Quantifying Heterogeneity: Q and I^2 – Cochran’s Q, I-squared, and their interpretation in meta-analysis
- SUCRA Rankings – Surface Under the Cumulative Ranking curve for treatment comparisons
- Selection Models for Publication Bias – Weighted-distribution models for adjusting meta-analysis for selection
- Standardised Mean Difference: Hedges’ g – Small-sample-corrected standardised effect size for continuous outcomes
- Subgroup Meta-Analysis – Pooling within strata and testing for between-subgroup heterogeneity
- Tau-Squared Estimation – Estimators of between-study variance: DerSimonian-Laird (DL), REML, Paule-Mandel (PM), Sidik-Jonkman (SJ), and empirical Bayes (EB)
- Trim-and-Fill – Adjusting the pooled estimate for apparent funnel asymmetry
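A taster for ‘Fixed-Effect vs. Random-Effects Meta-Analysis’ above, using the metafor package; the per-study effects and variances are toy values (illustrative assumptions).

```r
# Random-effects pooling with metafor
library(metafor)
yi <- c(0.30, 0.12, 0.45, 0.26)  # toy log odds ratios
vi <- c(0.04, 0.02, 0.06, 0.03)  # toy sampling variances
res <- rma(yi = yi, vi = vi, method = "REML")
forest(res)                      # forest plot of the pooled analysis
```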
Experimental Design
- 2^k Factorial Designs – Full factorial experiments with k two-level factors (see the sketch after this list)
- 3^k Factorial Designs – Full factorial experiments with k three-level factors
- Balanced Incomplete Block Designs – BIBDs when block size is smaller than treatment count
- Box-Behnken Designs – Three-level rotatable designs for RSM without factorial-corner runs
- Central Composite Designs – Response-surface designs combining factorial, axial, and centre points
- Completely Randomised Design – The simplest experimental design: random assignment of treatments to units
- Constrained Mixture Designs – Mixture experiments with lower and upper bounds on components
- Crossover Designs – Within-subject comparison of treatments with washout and carryover consideration
- Desirability Functions – Multi-response optimisation via geometric-mean desirability scores
- Fractional Factorial Designs – A fraction of a 2^k design: economical screening with acceptable confounding
- Graeco-Latin Square Designs – Blocking on three nuisance factors by superimposing two orthogonal Latin squares
- Latin Square Designs – Blocking on two nuisance factors via a Latin-square arrangement
- Method of Steepest Ascent – Moving along the gradient of a first-order response surface toward the optimum
- Optimal Designs: D, A, I – D-, A-, I-, and G-optimality criteria for computer-generated designs
- Orthogonal Arrays – Tabulated fractional-factorial arrays L4, L8, L9, L16, L27
- Plackett-Burman Designs – Highly efficient two-level screening designs for many factors
- Power Analysis for DOE – Calculating power for detecting main effects and interactions in designed experiments
- Principles of Blocking – Local control of nuisance variability for precision gains
- Randomisation in Design – Purpose, procedures, and role of randomisation in experimental design
- Randomised Complete Block Design – Blocking to control known nuisance variation, with construction, analysis, and interpretation in R
- Repeated-Measures Designs – Designs with a within-subject factor: multiple measurements per unit
- Resolution and Aliasing – Understanding the confounding structure of fractional factorial designs
- Response Surface Methodology – Optimising processes via second-order polynomial response surfaces
- Robust Parameter Design – Choosing controllable factor levels to minimise response variance across noise factors
- Signal-to-Noise Ratios – Taguchi’s SNR metrics combining mean and variance
- Simplex Centroid Designs – Mixture designs including centroids of all sub-simplices
- Simplex Lattice Mixture Designs – Experimental designs for component-proportion factors that sum to 1
- Split-Plot Designs – Designs with hard-to-change and easy-to-change factors
- Strip-Plot Designs – Designs with two hard-to-change factors applied to perpendicular strips
- Taguchi Methods – Orthogonal-array designs and signal-to-noise optimisation for quality engineering
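A taster for ‘2^k Factorial Designs’ above: a 2^3 full factorial built in base R, with a randomised run order.

```r
# 2^3 full factorial design in coded (-1, +1) units
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
design <- design[sample(nrow(design)), ]  # randomise the run order
design
# after the runs, estimate effects with lm(y ~ A * B * C, data = design)
```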