All Tutorials

Complete index of every tutorial, grouped by topic area. Use the links below to jump to a topic landing page or directly to an individual tutorial.

Statistical Foundations

Descriptive Statistics

  • Coefficient of Variation – Scale-free dispersion: the ratio of the standard deviation to the mean
  • Contingency Tables – r × c tables of counts, their expected values under independence, and standardised residuals
  • Cross-Tabulations – Two-way and higher-dimensional tables for joint distributions of categorical variables
  • Descriptive vs Inferential Statistics – The distinction between summarising the sample and making claims about the population
  • Five-Number Summary – Min, Q1, median, Q3, max: the compact distribution summary that powers the boxplot
  • Frequency Tables – Counting categorical and discrete-numeric variables: absolute, relative, and cumulative frequencies
  • Kurtosis – Quantifying tail heaviness of a distribution, and interpreting excess kurtosis
  • Mean Absolute Deviation – Average absolute deviation from the mean or median: a robust alternative to the standard deviation
  • Measures of Central Tendency – Mean, median, mode, and their robust cousins – when each is appropriate and how to compute them in R
  • Measures of Dispersion – Quantifying how spread out the data are: variance, SD, MAD, IQR, and range
  • Measures of Shape – Skewness and kurtosis: summarising the asymmetry and tail heaviness of a distribution beyond location and spread
  • Outlier Detection Rules – Univariate rules: 1.5·IQR fences, z-score thresholds, Hampel identifier, and Grubbs’ test
  • Percentile Ranks – Converting a raw value to its position in the sample distribution, expressed as a percentage
  • Quantiles and Percentiles – Dividing a distribution by cumulative probability: the median, quartiles, percentiles, and R’s nine quantile definitions
  • Robust Statistics: An Overview – Estimators designed to remain close to the truth under contamination, covering the key concepts of breakdown point, influence function, and efficiency
  • Skewness – Measuring the asymmetry of a distribution: sample estimators, interpretation, and typical values in applied data
  • Standardisation and z-Scores – Centring and scaling a variable, and the robust-z alternative
  • Stem-and-Leaf Plots – A text-based compact display of small samples that preserves every data value
  • Summary by Group – Split-apply-combine pattern for computing summaries within subgroups of the data
  • The Geometric Mean – Log-scale averaging for multiplicative data: ratios, growth rates, titres
  • The Harmonic Mean – Reciprocal-averaged summary for rates, speeds, and F1-style composite scores
  • The Interquartile Range – Width of the central 50% of the distribution: robust, interpretable, and the basis of Tukey fences
  • The Weighted Mean – Averaging with unequal weights: survey inclusion probabilities, meta-analysis pooling, and more
  • Trimmed and Winsorised Means – Robust location estimators that compromise between the mean and the median
  • Variance and Standard Deviation – Squared and unsquared dispersion, the n vs n-1 divisor, and why the sample SD is biased
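Several of the summaries listed above have one-line implementations in base R. A minimal sketch (the sample `x` is made up for illustration; it is not data from any tutorial):

```r
# Illustrative sample, chosen to include one high outlier
x <- c(2.1, 3.5, 3.9, 4.4, 5.0, 5.2, 6.8, 7.3, 9.0, 21.5)

fivenum(x)            # five-number summary: min, Q1, median, Q3, max
sd(x) / mean(x)       # coefficient of variation (SD / mean)
IQR(x)                # interquartile range, width of the central 50%
mad(x)                # median absolute deviation, scaled for normal consistency
quantile(x, type = 7) # R's default quantile definition, one of its nine types
```

Comparing `mean(x)` with `median(x)` on this sample shows why the robust summaries exist: the single outlier pulls the mean upward while leaving the median essentially untouched.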

Probability Theory

Inferential Statistics

  • Anderson-Darling Test – A distribution goodness-of-fit test with emphasis on the tails
  • Bartlett’s Test – Test of equal variances across groups, most powerful under normality but very sensitive to its violation
  • Benjamini-Hochberg FDR – Controlling the expected proportion of false discoveries among rejected hypotheses
  • Benjamini-Yekutieli FDR – Dependence-robust false discovery rate control at the cost of conservatism
  • Bonferroni Correction – Controlling family-wise error rate by dividing alpha across m tests
  • Bootstrap Confidence Intervals – Percentile, basic, BCa, and studentised bootstrap CIs
  • Chi-Squared Goodness-of-Fit Test – Testing whether observed category frequencies match an expected distribution
  • Chi-Squared Test of Independence – Testing association between two categorical variables via observed vs expected counts under independence
  • Cochran-Mantel-Haenszel Test – Stratified analysis of 2×2 tables across levels of a confounding variable
  • Effect Sizes: Overview – Why effect sizes matter and which measures apply to which tests
  • Equivalence Testing with TOST – Two one-sided tests for establishing practical equivalence within a margin
  • Fisher’s Exact Test – Exact test for 2×2 (and r × c) contingency tables, based on the hypergeometric distribution
  • Friedman Test – Non-parametric test for three or more paired / repeated measures
  • Holm’s Step-Down Correction – Sequential Bonferroni with uniformly greater power; controls FWER
  • Kendall’s Tau – Rank correlation based on concordant vs discordant pairs; robust to ties and small samples
  • Kolmogorov-Smirnov Test – CDF-based goodness-of-fit for one-sample and two-sample comparisons
  • Kruskal-Wallis Test – Non-parametric one-way ANOVA based on ranks across three or more groups
  • Levene’s Test of Variances – Robust test of variance homogeneity across groups
  • Mann-Whitney U Test – Non-parametric two-sample test based on ranks; alternative to the independent t-test
  • McNemar’s Test – Paired / matched binary data: comparing two measurements on the same units, using discordant pairs only
  • Mixed ANOVA – Designs combining between-subjects and within-subjects factors, with the interaction as key
  • Multiple Comparisons: Overview – Family-wise error vs false discovery rate and when each applies
  • Null and Alternative Hypotheses – Formulating H0 and H1, choosing one- vs two-sided tests, and avoiding post-hoc reformulation
  • One-Proportion Test – Testing whether an observed proportion differs from a pre-specified reference; exact and score-based methods
  • One-Sample t-Test – Testing whether a sample mean equals a pre-specified reference value
  • One-Sample z-Test – Comparing a sample mean to a reference when the population variance is known – rare in practice but pedagogically useful
  • One-Way ANOVA – Comparing means across three or more independent groups via the F-test
  • P-Values Explained – Definition, correct interpretation, and the most common misinterpretations of the p-value
  • Paired t-Test – Comparing two dependent measurements by applying a one-sample t-test to their differences
  • Pearson Correlation Test – Testing whether the Pearson correlation between two continuous variables is non-zero
  • Permutation Tests – Exact or Monte Carlo p-values via resampling under exchangeability
  • Post-Hoc Tests with Tukey HSD – Pairwise comparisons after ANOVA with family-wise error control
  • Repeated-Measures ANOVA – Within-subjects analysis of variance with sphericity checks and corrections
  • Shapiro-Wilk Normality Test – The most powerful commonly used test for normality in small to moderate samples
  • Spearman Rank Correlation Test – Non-parametric correlation test for monotonic association between two variables
  • The Bootstrap: Introduction – Resampling with replacement to estimate standard errors and sampling distributions
  • The Jackknife – Leave-one-out resampling for bias and variance estimation
  • The Runs Test – Testing randomness of a binary or dichotomised sequence via run lengths
  • The Sign Test – Testing a median or paired difference using only the direction of each observation
  • Two-Proportion Test – Comparing two independent proportions; z-test and chi-squared equivalence, with risk difference and relative risk
  • Two-Sample t-Test – Comparing means between two independent groups with Student’s and Welch’s t-tests, including assumption checks and effect sizes
  • Two-Way ANOVA – Two between-subjects factors: main effects and interaction
  • Type I and Type II Errors – Rejecting a true null (alpha) and failing to reject a false null (beta); power and the trade-off
  • Welch’s t-Test – Two-sample t-test that does not assume equal variances; R’s default
  • Wilcoxon Signed-Rank Test – Non-parametric paired-sample test based on signed ranks

Sample Size & Power

Data Visualisation

  • Aesthetics and Geoms – The two ggplot2 building blocks: what data to map (aesthetics) and how to draw it (geoms)
  • Annotations and Labels – Adding text, arrows, rectangles, and repelling labels to ggplot2 figures
  • Bar Charts – Counts per category and summarised values per category
  • Bland-Altman Plots – Mean-vs-difference plot for comparing two measurement methods
  • Boxplots – Five-number summary display: Tukey boxplots with whiskers and outlier rules
  • Bubble Plots – Scatter plots with a third variable encoded by point size
  • Colour Palettes – Discrete, continuous, perceptually uniform, and diverging palettes for ggplot2
  • Colour-Blind-Safe Plots – Designing plots that remain readable to viewers with colour-vision deficiencies
  • Contour Plots – Isolines of a 2D scalar field or density
  • Correlation Heatmaps – Visualising a correlation matrix with colour-encoded cells and significance markers
  • Density Plots – Smooth distribution display via kernel density estimation
  • Dot Plots – Cleveland dot plots for labelled point values and Wilkinson dot plots for compact distributional displays
  • Facets and Panels – Creating small multiples via facet_wrap and facet_grid
  • Forest Plots (Visualisation) – Point estimates and confidence intervals stacked across studies or subgroups
  • Funnel Plots (Visualisation) – Study effect vs precision plot used to detect publication bias in meta-analysis
  • Heatmaps – Matrix displays with colour-encoded cell values, optionally with row/column ordering
  • Hexbin Plots – Binning the 2D plane into hexagons and colouring by count, for large bivariate data
  • Histograms – Univariate distribution display via binned frequencies or densities
  • Interactive Plots with ggiraph – Interactive SVG-based ggplot extensions with per-element hover, click, and selection
  • Interactive Plots with plotly – Converting ggplot objects to interactive HTML plots via ggplotly()
  • Line Plots – Connecting ordered observations with lines for time-series and trajectory displays
  • Pairs Plots – Scatterplot matrices for exploring pairwise relationships among several continuous variables
  • Patchwork: Multi-Plot Composition – Composing multiple ggplot objects into a single figure with the patchwork package
  • ROC Curves – Sensitivity vs 1-specificity plots for diagnostic tests and binary classifiers
  • Raincloud Plots – Half-violin + jittered raw points + boxplot: a comprehensive distribution display
  • Ridge Plots – Stacked density curves across groups, also called ‘joy plots’
  • Saving and Exporting Figures – Exporting ggplot figures to PDF, PNG, SVG, and TIFF with the right dimensions and DPI
  • Scales and Coordinates – Controlling how data values are mapped to visual values (scales) and how the plot area is organised (coordinates)
  • Scatter Plots – The bivariate continuous default: geom_point with overplotting strategies and trend lines
  • Stacked and Dodged Bars – Two-factor bar charts: stacking for composition, dodging for side-by-side comparison
  • Survival Curves – Publication-quality Kaplan-Meier plots with risk tables and log-rank annotation
  • The Grammar of Graphics – Understanding ggplot2 as a layered grammar: data, mappings, geoms, stats, scales, coordinates, facets, and themes
  • Time Series Plots – Dated line plots with forecast bands and decomposition displays
  • Violin Plots – Symmetric kernel density plots for comparing distributions across groups
  • ggplot2 Themes – Visual styling of plots via complete themes and fine-grained theme() elements

Regression & Modelling

Multivariate Methods

Time-Series Analysis

  • ARIMA Models – Autoregressive integrated moving average models for forecasting stationary and trending time series
  • Augmented Dickey-Fuller Test – Testing for unit roots / non-stationarity against a stationary alternative
  • Bayesian Changepoint Detection – Posterior probability of a changepoint at each time, with online and offline variants
  • Changepoint Detection – Identifying abrupt shifts in mean, variance, or slope of a time series
  • Diebold-Mariano Test – Testing equality of forecast accuracy between two competing models
  • Differencing a Time Series – Removing trend via first differences; seasonal differencing for cyclic patterns
  • ETS Models – State-space formulation of exponential smoothing: Error, Trend, Seasonal
  • Exponential Smoothing – Recursive weighted-average smoothers: simple, Holt (trend), Holt-Winters (seasonal)
  • Forecast Accuracy Metrics – MAE, RMSE, MAPE, MASE: scaled and scale-dependent measures of forecast error
  • GARCH Models – Conditional heteroscedasticity models: volatility clustering in financial and other series
  • Granger Causality – Testing whether one time series improves prediction of another beyond its own history
  • Holt-Winters Method – Triple exponential smoothing with level, trend, and seasonal components
  • Interpreting ACF and PACF – Identifying AR and MA orders from autocorrelation and partial autocorrelation functions
  • KPSS Test – Testing for stationarity (null) against a unit-root alternative
  • Moving Averages – Smoothing a time series by averaging over a sliding window
  • Phillips-Perron Test – Non-parametric adjustment of the Dickey-Fuller test for autocorrelation and heteroscedasticity
  • Rolling-Origin Cross-Validation – Time-series-aware cross-validation that respects temporal ordering
  • Seasonal ARIMA (SARIMA) – ARIMA extended with seasonal autoregressive, moving-average, and differencing terms
  • Seasonal Decomposition (STL) – Loess-based seasonal-trend decomposition robust to outliers
  • Spectral Analysis – Decomposing a time series into frequency components via periodogram and spectral density
  • State-Space Models – Latent-state time-series framework unifying ARIMA, exponential smoothing, and many others
  • Stationarity Tests – Testing whether a time series has constant mean and variance over time
  • The Kalman Filter – Recursive optimal estimation for linear-Gaussian state-space models
  • The Ljung-Box Test – Portmanteau test for remaining autocorrelation in residuals across multiple lags
  • Time Series: Introduction – Trend, seasonality, cyclicality, and noise: the four components of a time series
  • VECM and Cointegration – Vector error correction models for cointegrated series; Johansen trace and eigenvalue tests
  • Vector Autoregression (VAR) – Multivariate time series where each variable depends on lags of all variables
  • Wavelet Analysis – Time-frequency decomposition capturing localised periodicities that change over time
  • White Noise Tests – Testing a residual series for independence and constant variance
  • X-13ARIMA-SEATS Decomposition – The seasonal adjustment tool used by the US Census Bureau and other statistical agencies

Bayesian Statistics

Survival Analysis

Bioinformatics

Machine Learning

Clinical Biostatistics

Meta-Analysis

Experimental Design