Automated Mechanism Design for Optimal Auctions

ai-ml-foundations-and-applications
mechanism-design
auctions
Training a data-driven auction mechanism using gradient-free optimisation in pure R — learning payment rules from bidder profiles, enforcing incentive compatibility and individual rationality, and benchmarking against Myerson’s optimal auction.
Author

Raban Heller

Published

May 8, 2026

Modified

May 8, 2026

Keywords

mechanism design, optimal auctions, Myerson auction, machine learning, incentive compatibility

Introduction & motivation

Mechanism design is often described as “reverse game theory.” While game theory takes the rules of a strategic interaction as given and predicts how rational agents will behave, mechanism design starts with a desired outcome and asks: what rules should we impose to make self-interested agents achieve that outcome? This question is central to the design of auctions, voting systems, matching markets, and virtually any institution where strategic behaviour matters. The foundational contributions of Hurwicz, Myerson, and Maskin — recognised with the 2007 Nobel Memorial Prize in Economics — established the theoretical framework that guides modern mechanism design.

In auction theory, the crown jewel of mechanism design is Roger Myerson’s 1981 characterisation of the revenue-maximising auction for a single item. Myerson showed that for a seller facing bidders whose values are drawn from known distributions, the optimal mechanism can be computed analytically: it involves “ironing” the virtual value function and allocating the item to the bidder with the highest non-negative virtual value. For the canonical case of bidders with values uniformly distributed on \([0, 1]\), the optimal auction is a second-price auction with a reserve price of \(1/2\). This elegant result relies on strong assumptions: the seller knows the distribution of bidder values, bidders are risk-neutral, and values are independently drawn.

In practice, these assumptions are often violated. The seller may not know the value distribution with certainty. Bidders may not be perfectly rational. The setting may involve multiple items, combinatorial preferences, or dynamic arrivals — complications that make analytical solutions intractable. This is where automated mechanism design enters the picture: rather than deriving optimal mechanisms by hand, we use computational methods to search over the space of possible mechanisms and find ones that perform well according to a specified objective.

The intersection of machine learning and mechanism design has become one of the most active areas of research in artificial intelligence and economics. Duetting, Feng, Narasimhan, Parkes, and others have developed deep learning approaches that parameterise auction mechanisms as neural networks and train them end-to-end to maximise revenue while approximately satisfying incentive compatibility (IC) and individual rationality (IR) constraints. These methods can handle complex settings — multiple items, combinatorial valuations, budget constraints — that are analytically intractable.

In this tutorial, we implement a simplified version of this approach entirely in pure R, without any deep learning packages. We parameterise a single-item auction mechanism using a small set of parameters that define allocation and payment rules. We generate training data by sampling bidder value profiles from a known distribution. We then use gradient-free optimisation (the Nelder-Mead method via R’s optim function) to learn the parameters that maximise expected revenue subject to soft constraints on IC and IR violations. Finally, we compare the learned mechanism with Myerson’s analytically optimal auction for the uniform-value case, demonstrating that data-driven approaches can closely approximate theoretical benchmarks.

This tutorial bridges two important themes in the #equilibria project: the classical theory of mechanism design and the modern practice of learning from data. Understanding both perspectives is essential for anyone working at the intersection of economics, computer science, and artificial intelligence. The pure-R implementation makes the core ideas accessible without requiring familiarity with deep learning frameworks, while the comparison with Myerson’s benchmark grounds the computational approach in rigorous theory.

Mathematical formulation

Single-item auction setting

Consider a single seller with one indivisible item and \(n\) bidders. Each bidder \(i\) has a private value \(v_i\) drawn independently from a distribution \(F_i\) on \([0, 1]\). A direct revelation mechanism \((x, p)\) consists of:

  • An allocation rule \(x_i(v_1, \ldots, v_n) \in [0, 1]\): the probability that bidder \(i\) receives the item.
  • A payment rule \(p_i(v_1, \ldots, v_n) \geq 0\): the expected payment of bidder \(i\).

Desirable properties

  • Incentive compatibility (IC): Truthful reporting is a dominant strategy: \[ v_i \cdot x_i(v_i, v_{-i}) - p_i(v_i, v_{-i}) \geq v_i \cdot x_i(v_i', v_{-i}) - p_i(v_i', v_{-i}) \quad \forall v_i, v_i', v_{-i} \]

  • Individual rationality (IR): No bidder has negative utility from participation: \[ v_i \cdot x_i(v_i, v_{-i}) - p_i(v_i, v_{-i}) \geq 0 \quad \forall v_i, v_{-i} \]

  • Feasibility: \(\sum_i x_i(v) \leq 1\) for all value profiles \(v\).

Myerson’s optimal auction (uniform case)

For \(n\) bidders with \(v_i \sim \text{Uniform}[0, 1]\), the virtual value is:

\[ \phi_i(v_i) = 2v_i - 1 \]

The optimal mechanism allocates to \(\arg\max_i \phi_i(v_i)\) when \(\max_i \phi_i(v_i) \geq 0\) (equivalently, when the highest value exceeds \(1/2\)). The expected revenue for \(n = 2\) bidders is:

\[ R^* = \int_0^1 \int_0^1 \max(0, \phi_1, \phi_2) \, dv_1 \, dv_2 = \frac{5}{12} \approx 0.4167 \]

Parameterised mechanism

We parameterise a mechanism using a reserve price \(r\) and a payment function that maps bids to allocations and payments. For the learning problem, we define:

\[ \text{Revenue}(\theta) = \frac{1}{N} \sum_{s=1}^{N} \sum_{i=1}^{n} p_i(b^{(s)}; \theta) \]

where \(\theta\) are learnable parameters and \(b^{(s)}\) are sampled bid profiles.

R implementation

set.seed(2026)

n_bidders <- 2
n_samples <- 5000

# --- Generate training data: value profiles ---
values <- matrix(runif(n_samples * n_bidders), ncol = n_bidders)
colnames(values) <- paste0("v", 1:n_bidders)

# --- Myerson optimal auction for uniform [0,1] ---
myerson_auction <- function(values) {
  n <- ncol(values)
  virtual_values <- 2 * values - 1
  results <- data.frame(
    winner = integer(nrow(values)),
    payment = numeric(nrow(values)),
    revenue = numeric(nrow(values))
  )
  for (s in seq_len(nrow(values))) {
    vv <- virtual_values[s, ]
    max_vv <- max(vv)
    if (max_vv < 0) {
      results$winner[s] <- 0
      results$payment[s] <- 0
    } else {
      winner <- which.max(vv)
      results$winner[s] <- winner
      # Payment: threshold value where winner would still win
      # For 2 bidders: max(0.5, other bidder's value)
      other_vals <- values[s, -winner]
      results$payment[s] <- max(0.5, max(other_vals))
    }
    results$revenue[s] <- results$payment[s]
  }
  results
}

myerson_results <- myerson_auction(values)
myerson_revenue <- mean(myerson_results$revenue)
cat("Myerson optimal revenue (empirical):", round(myerson_revenue, 4), "\n")
Myerson optimal revenue (empirical): 0.4142 
cat("Myerson optimal revenue (theoretical): 0.4167\n")
Myerson optimal revenue (theoretical): 0.4167
# --- Parameterised mechanism ---
# Parameters: reserve_price, bid_weight (how much winner pays beyond reserve)
parametric_auction <- function(params, values) {
  reserve <- plogis(params[1])  # sigmoid to keep in [0, 1]
  alpha <- plogis(params[2])    # payment interpolation parameter

  n_s <- nrow(values)
  revenue <- numeric(n_s)
  ic_violations <- numeric(n_s)
  ir_violations <- numeric(n_s)

  for (s in seq_len(n_s)) {
    v <- values[s, ]
    max_idx <- which.max(v)
    max_val <- v[max_idx]

    if (max_val < reserve) {
      revenue[s] <- 0
      next
    }

    # Winner pays: interpolation between reserve and second-highest bid
    if (ncol(values) > 1) {
      second_val <- max(v[-max_idx])
      payment <- alpha * max(reserve, second_val) + (1 - alpha) * max_val
    } else {
      payment <- alpha * reserve + (1 - alpha) * max_val
    }

    payment <- min(payment, max_val)  # ensure IR (hard constraint)
    revenue[s] <- payment

    # Check IR: utility >= 0
    utility <- max_val - payment
    if (utility < 0) ir_violations[s] <- abs(utility)

    # Check IC (approximate): would misreporting help?
    for (i in seq_along(v)) {
      # Try reporting slightly lower
      v_mis <- v
      v_mis[i] <- v[i] * 0.8
      max_idx_mis <- which.max(v_mis)
      if (max_idx_mis == i && v_mis[i] >= reserve) {
        second_mis <- if (ncol(values) > 1) max(v_mis[-i]) else 0
        pay_mis <- alpha * max(reserve, second_mis) + (1 - alpha) * v_mis[i]
        pay_mis <- min(pay_mis, v[i])
        util_mis <- v[i] - pay_mis
        util_true <- v[i] - payment
        if (i == max_idx && util_mis > util_true + 1e-8) {
          ic_violations[s] <- ic_violations[s] + (util_mis - util_true)
        }
      }
    }
  }

  list(revenue = revenue,
       mean_revenue = mean(revenue),
       ic_violation = mean(ic_violations),
       ir_violation = mean(ir_violations))
}

# --- Objective function for optimisation ---
objective <- function(params, values, lambda_ic = 50, lambda_ir = 50) {
  result <- parametric_auction(params, values)
  # Maximise revenue, penalise constraint violations
  -(result$mean_revenue - lambda_ic * result$ic_violation -
      lambda_ir * result$ir_violation)
}

# --- Optimise using Nelder-Mead ---
cat("\nOptimising mechanism parameters...\n")

Optimising mechanism parameters...
opt_result <- optim(
  par = c(0, 0),  # initial parameters (logit-scale)
  fn = objective,
  values = values,
  method = "Nelder-Mead",
  control = list(maxit = 1000)
)

learned_reserve <- plogis(opt_result$par[1])
learned_alpha <- plogis(opt_result$par[2])
cat("Learned reserve price:", round(learned_reserve, 4), "\n")
Learned reserve price: 0.4822 
cat("Learned payment weight (alpha):", round(learned_alpha, 4), "\n")
Learned payment weight (alpha): 1 
# --- Evaluate learned mechanism ---
learned_results <- parametric_auction(opt_result$par, values)
cat("\nLearned mechanism revenue:", round(learned_results$mean_revenue, 4), "\n")

Learned mechanism revenue: 0.4149 
cat("Myerson optimal revenue:", round(myerson_revenue, 4), "\n")
Myerson optimal revenue: 0.4142 
cat("Revenue ratio:", round(learned_results$mean_revenue / myerson_revenue, 4), "\n")
Revenue ratio: 1.0017 
cat("IC violation:", round(learned_results$ic_violation, 6), "\n")
IC violation: 0 
cat("IR violation:", round(learned_results$ir_violation, 6), "\n")
IR violation: 0 
# --- Compare with second-price auction (no reserve) ---
spa_params <- c(-10, 10)  # reserve ~0, alpha ~1 => second-price
spa_results <- parametric_auction(spa_params, values)
cat("\nSecond-price (no reserve) revenue:", round(spa_results$mean_revenue, 4), "\n")

Second-price (no reserve) revenue: 0.332 
cat("Theoretical SPA revenue (n=2):", round(1/3, 4), "\n")
Theoretical SPA revenue (n=2): 0.3333 
# --- Revenue comparison across reserve prices ---
reserve_grid <- seq(0, 0.9, by = 0.02)
revenue_curve <- sapply(reserve_grid, function(r) {
  params <- c(qlogis(max(r, 0.001)), 10)  # high alpha = SPA-like
  parametric_auction(params, values)$mean_revenue
})

# Myerson revenue at each reserve
myerson_curve <- sapply(reserve_grid, function(r) {
  rev <- numeric(nrow(values))
  for (s in seq_len(nrow(values))) {
    v <- values[s, ]
    max_idx <- which.max(v)
    if (v[max_idx] < r) next
    second <- max(v[-max_idx])
    rev[s] <- max(r, second)
  }
  mean(rev)
})

comparison_df <- data.frame(
  reserve = rep(reserve_grid, 2),
  revenue = c(revenue_curve, myerson_curve),
  mechanism = rep(c("Learned (SPA + reserve)", "Myerson"), each = length(reserve_grid))
)

cat("\nOptimal reserve (grid search):",
    reserve_grid[which.max(revenue_curve)], "\n")

Optimal reserve (grid search): 0.52 

Static publication-ready figure

ggplot(comparison_df, aes(x = reserve, y = revenue, colour = mechanism)) +
  geom_line(linewidth = 1) +
  geom_vline(xintercept = 0.5, linetype = "dashed", colour = "grey50") +
  geom_hline(yintercept = 5 / 12, linetype = "dotted", colour = "grey50") +
  annotate("text", x = 0.52, y = 0.44, label = "Myerson\nreserve = 0.5",
           hjust = 0, size = 3.2, colour = "grey40") +
  annotate("text", x = 0.05, y = 5 / 12 + 0.008,
           label = paste0("Theoretical optimum = ", round(5 / 12, 4)),
           hjust = 0, size = 3.2, colour = "grey40") +
  scale_colour_manual(values = okabe_ito[c(1, 3)]) +
  labs(title = "Revenue Comparison: Learned vs Myerson-Optimal Auction",
       subtitle = "2 bidders, values ~ Uniform[0,1], 5000 sample profiles",
       x = "Reserve Price",
       y = "Expected Revenue",
       colour = "Mechanism") +
  theme_publication() +
  coord_cartesian(ylim = c(0, 0.45))
Figure 1: Expected revenue as a function of reserve price for the learned mechanism (second-price auction with reserve) and the Myerson optimal auction. The vertical dashed line marks the Myerson-optimal reserve of 0.5. Both mechanisms converge near the theoretical optimum.

Interactive figure

comparison_df$label <- paste0(
  "Reserve: ", comparison_df$reserve,
  "\nRevenue: ", round(comparison_df$revenue, 4),
  "\nMechanism: ", comparison_df$mechanism
)

p_int <- ggplot(comparison_df, aes(x = reserve, y = revenue, colour = mechanism,
                                    text = label)) +
  geom_line(linewidth = 0.8) +
  geom_vline(xintercept = 0.5, linetype = "dashed", colour = "grey50") +
  scale_colour_manual(values = okabe_ito[c(1, 3)]) +
  labs(title = "Revenue vs Reserve Price",
       x = "Reserve Price", y = "Expected Revenue",
       colour = "Mechanism") +
  theme_publication()

ggplotly(p_int, tooltip = "text") |>
  config(displaylogo = FALSE,
         modeBarButtonsToRemove = c("select2d", "lasso2d"))
Figure 2: Interactive revenue comparison. Hover to see exact revenue values at each reserve price.

Interpretation

The results of our automated mechanism design exercise illuminate several important themes at the intersection of machine learning and economic theory. Most fundamentally, the learned mechanism closely approximates the Myerson-optimal auction, validating that gradient-free optimisation can discover near-optimal mechanisms even with a simple parameterisation. The learned reserve price converges to a value near 0.5 — precisely the Myerson-optimal reserve for uniform valuations — and the overall revenue achieved by the learned mechanism typically reaches 95-100% of the theoretical optimum.

This convergence is not trivial. The optimisation landscape for mechanism design is challenging: the objective (expected revenue) depends on the interplay between the mechanism’s parameters and the distribution of bidder values, while the constraints (IC and IR) define a complex feasible region. Our use of Lagrangian relaxation — incorporating constraint violations as penalties in the objective — is a standard technique from constrained optimisation, but the appropriate penalty weights (lambda values) require careful tuning. Too-low penalties allow the optimiser to violate IC and IR for short-term revenue gains; too-high penalties overly constrain the search and may prevent finding the revenue-maximising mechanism.

The comparison between the learned mechanism and the second-price auction without a reserve price is particularly instructive. The second-price auction (Vickrey auction) is incentive-compatible and individually rational by construction — it is a cornerstone of auction theory (Vickrey 1961). However, its expected revenue with two uniform bidders is only \(1/3 \approx 0.333\), substantially below the Myerson optimum of \(5/12 \approx 0.417\). The difference comes entirely from the reserve price: by refusing to sell when all values are below \(0.5\), the seller sacrifices some probability of sale but extracts higher payments when sales do occur. The revenue curve as a function of reserve price clearly shows this tradeoff, with revenue initially increasing as the reserve rises from zero, peaking near \(0.5\), and declining as the reserve becomes prohibitively high.

From a machine learning perspective, our approach is a minimal instance of the “learning to auction” paradigm that has gained considerable attention in the AI community. State-of-the-art methods like RegretNet (Duetting et al., 2019) parameterise mechanisms using deep neural networks and train them using stochastic gradient descent with augmented Lagrangian methods for handling constraints. Our pure-R implementation with Nelder-Mead optimisation captures the same conceptual framework — learn mechanism parameters from data to maximise an objective subject to game-theoretic constraints — while remaining transparent and accessible. The key insight is that the “learning” in automated mechanism design is not about fitting a model to observed data in the supervised-learning sense; rather, it is about searching over a parameterised family of mechanisms to find one that performs well in expectation over the distribution of agent types.

The IC and IR constraint violations reported by our implementation deserve careful attention. In theoretical mechanism design, these constraints must hold exactly — even infinitesimal violations can be exploited by rational agents. In practice, and especially in computational approaches, approximate satisfaction is often accepted. The degree of approximation is a design choice that trades off revenue against robustness. Our penalty-based approach drives violations toward zero but does not guarantee exact satisfaction. More sophisticated approaches use projection methods or architectural constraints (e.g., monotone allocation networks) to enforce constraints by construction rather than by penalisation.

Finally, our exercise highlights the value of benchmarking learned mechanisms against known theoretical optima. The uniform-value case serves as a “unit test” for automated mechanism design: any method that cannot recover Myerson’s optimal auction in this simple setting is unlikely to perform well in more complex environments. Having validated our approach on this benchmark, one could extend it to settings where analytical solutions are unknown — asymmetric bidders, correlated values, multiple items — and have some confidence that the learned mechanisms are close to optimal. This progression from theory to computation to practice is the central promise of automated mechanism design.

References

Clarke, Edward H. 1971. “Multipart Pricing of Public Goods.” Public Choice 11 (1): 17–33. https://doi.org/10.1007/BF01726210.
Gale, David, and Lloyd S. Shapley. 1962. “College Admissions and the Stability of Marriage.” The American Mathematical Monthly 69 (1): 9–15. https://doi.org/10.1080/00029890.1962.11989827.
Groves, Theodore. 1973. “Incentives in Teams.” Econometrica 41 (4): 617–31. https://doi.org/10.2307/1914085.
Roth, Alvin E., and Marilda A. Oliveira Sotomayor. 1990. Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis. Cambridge University Press. https://doi.org/10.1017/CCOL052139015X.
Vickrey, William. 1961. “Counterspeculation, Auctions, and Competitive Sealed Tenders.” The Journal of Finance 16 (1): 8–37. https://doi.org/10.1111/j.1540-6261.1961.tb02789.x.
Back to top

Reuse

Citation

BibTeX citation:
@online{heller2026,
  author = {Heller, Raban},
  title = {Automated {Mechanism} {Design} for {Optimal} {Auctions}},
  date = {2026-05-08},
  url = {https://r-heller.github.io/equilibria/tutorials/ai-ml-foundations-and-applications/mechanism-design-ml-auctions/},
  langid = {en}
}
For attribution, please cite this work as:
Heller, Raban. 2026. “Automated Mechanism Design for Optimal Auctions.” May 8. https://r-heller.github.io/equilibria/tutorials/ai-ml-foundations-and-applications/mechanism-design-ml-auctions/.