Skip to content

Search Roadmap & Evaluation

The engine currently has a complete 5-level primitive bot roster: RandomSearch, CheckmateAwareSearch, GreedySearch (baseline), GreedySearchV2 (Cautious Greedy), and AggressiveSearch. Level 6 is the non-primitive MonteCarloSearch bot.

This forms a solid baseline for move evaluation, but single-turn bots are limited to a 1-turn horizon. To develop a stronger, grandmaster-level engine, we need to transition from single-turn heuristics to a deep, probabilistic search tree.

This document defines:

  1. the revised implementation roadmap for deep search algorithms and optimizations
  2. the evaluation protocol that every new search algorithm must pass before it replaces the baseline

The search roadmap should improve move quality without losing the properties that already matter:

  • legal move selection under the maximum micro-moves rule
  • deterministic and reproducible testing
  • predictable runtime for browser and JVM usage
  • a clear migration path from simple heuristics to probabilistic tree search

Future changes should therefore be judged on two axes:

  • playing strength against the current baseline (GreedySearch or the latest optimized bot)
  • cost in runtime, implementation complexity, and memory footprint (minimizing garbage collection on the hot path)

The current baseline is GreedySearch (Level 3).

While primitive, it has several advantages:

  • it reasons over the full Dice Chess turn, not a single micro-move
  • it is deterministic
  • it is easy to explain and debug
  • it creates a stable reference point for head-to-head testing

The baseline remains available even after stronger algorithms are added. It serves as the control group for all future experiments.

The search roadmap is divided into progressive optimization and tree-search stages.

This stage introduces depth-first traversal of multiple plies (turns), reasoning about future turns.

Expected characteristics:

  • Chance Nodes: Representing the stochastic dice roll outcomes (216 ordered rolls / 56 unique multisets) of both sides.
  • Decision Nodes: Selecting the optimal full-turn path (1-3 micro-moves) for the active player.
  • Bounded Depth: Dynamically adjusted search depth based on remaining time controls.

Milestone fit:

  • primarily v0.6 - Expectimax Search Engine

Stage 2: Search Optimizations (Star1 & Star2 Pruning)

Section titled “Stage 2: Search Optimizations (Star1 & Star2 Pruning)”

Traversing a deep expectimax tree scales exponentially. We must implement advanced alpha-beta pruning extensions for games with chance nodes:

  • Star1/Star2 Pruning: Prunes chance nodes by calculating bounds on the mathematical expectation and skipping subtrees that cannot affect the optimal choice.
  • Zobrist Hashing & Transposition Tables (TT): Caches previously computed node values, bounds, and search depths. Zobrist keys must include the board position, active color, and remaining dice pool.

Milestone fit:

  • primarily v0.6 - Expectimax Search Engine and v0.5 (Zobrist/TT groundwork)

Stage 3: Parallel Search with Ox Concurrency

Section titled “Stage 3: Parallel Search with Ox Concurrency”

To utilize multi-core server processors (such as the 4-core ARM nodes on Oracle Cloud), we will parallelize the evaluation of chance-node subtrees:

  • Use Java Virtual Threads (via the Ox structured concurrency library) to spawn lightweight concurrent branch evaluations.
  • Ensure thread-safe read operations on transposition tables.
  • Implement structured cancellation to stop running threads immediately when a beta-cutoff is triggered or when search time expires.

Milestone fit:

  • primarily v0.6 - Expectimax Search Engine

Parallel to the exact tree search, a Rao-Blackwellized Monte-Carlo estimator gives an on-demand pre-roll win-probability estimate for positions too sparse in the games database to read empirically. It integrates the exact per-ply king-capture probability (KingCaptureProbability) along a random rollout weighted by survival, which cuts variance sharply versus vanilla 0/1 rollouts. It reuses TurnGenerator + the RandomSearch policy and exposes a configurable budget (rollouts / target CI width / ply horizon).

See Monte-Carlo Pre-Roll Equity for the algorithm, the variance rationale, and budgeting. It complements position canonicalization (which pools empirical statistics across symmetric positions) for genuinely off-book positions, and shares the KingCaptureProbability machinery with the expectimax chance-node evaluation.

The estimator also drives a bot: MonteCarloSearch (Level 6, registered in BotRegistry) scores every legal turn by the Monte-Carlo win probability of the resulting position and plays the best, preferring an immediate king capture. It is the first non-primitive bot (rollout-based lookahead rather than a one-ply heuristic). Per-move cost scales with the number of legal turns × the rollout budget, so a multi-game win-rate match is validated offline in the JVM Battle Arena — not in CI.

Time control. Because per-move cost grows with the branching factor, an unbounded heavy bot loses on the clock in complex positions. MonteCarloSearch therefore takes a wall-clock deadline: it takes any immediate king capture for free, otherwise ranks turns by a cheap material score, keeps the top K, and Monte-Carlo-evaluates them within an equal slice of the remaining time, always falling back to the best material turn so a legal move is returned even if the deadline elapses first. The caller allocates the game clock into per-move deadlines. The budget paths (rollouts / target-error) stay deterministic for tests; only the wall-clock path is non-deterministic. The acceptance gate is whether time-limited Monte-Carlo beats AggressiveSearch (L5) within a one-minute game budget — otherwise the heavy search is not worth it over the empirical-statistics path.

Milestone fit:

  • feeds the analytics equity guidance now; aligns with v0.6 - Expectimax Search Engine machinery.

Every new algorithm or optimization must be validated at three levels.

These tests prove that the algorithm respects core game semantics and obvious tactical priorities.

  • immediate king capture is always preferred over any non-terminal material sequence
  • the maximum micro-moves rule is never violated
  • castling still consumes the correct dice and is scored correctly
  • promotion branches are ranked consistently

This level protects correctness and prevents regressions that would otherwise be hidden inside large simulations.

This suite is a curated set of fixed positions designed to stress the evaluator and shallow search logic.

  • immediate king capture
  • unblock-and-capture
  • promotion race
  • king exposure after material gain
  • sacrificial trap
  • high-mobility versus low-mobility positions
  • symmetric positions used to verify deterministic tie-breaking

Each scenario defines the FEN, dice multiset, expected preferred line, and the rationale. This suite must remain small enough to run in every local mise run check.

This is the acceptance layer for comparing playing strength. Each challenger algorithm plays a large head-to-head match against the baseline:

  • baseline as White, challenger as Black
  • challenger as White, baseline as Black
  • fixed seed lists for reproducibility
  • repeated across multiple starting positions, tactical middlegames, and simplified endgames

The benchmark harness should follow a stable protocol so that results remain comparable over time.

  • deterministic PRNG seeding
  • explicit algorithm identity and Git commit hash in output reports
  • symmetric color assignment (equal games as White and Black)
  • turn/time budget settings recorded in the report
  • fixed resignation, repetition, and 50-move limit rules

At minimum, collect:

  • win, loss, and draw rates
  • average game length (turns)
  • average decision time per turn
  • average number of candidate paths or nodes evaluated
  • runtime distribution (percentiles), not just mean runtime

A challenger should replace the baseline only when all of the following are true:

  • it passes unit correctness and scenario-suite checks
  • it shows a statistically meaningful improvement in head-to-head results (Win Rate > 50%)
  • it does not introduce an unacceptable runtime regression for the target environment (e.g. browser/JS vs server/JVM)

Each experiment should produce a concise report:

================================================================================
🎲♟️ Dice Chess Bot Arena - JVM Match Runner
Baseline Bot: [Name] (ID)
Games per Color: [N] (Total [2N] games per match)
================================================================================
Opponent Bot | Total | Wins (W/B) | Losses (W/B) | Draws (W/B) | Win Rate | Time
----------------------------------------------------------------------------------------
[Bot A] | 1600 | ... | ... | ... | ...% | ...s

This makes search changes reviewable inside pull requests instead of relying on anecdotal observations.

The future search engine tasks are decomposed as follows:

  1. Expectimax search skeleton (Milestone v0.6)
  2. Star1 and Star2 pruning implementation
  3. Zobrist Hashing & Transposition Table integration
  4. Structured concurrency: parallelize chance-nodes using Ox
  5. Search evaluation reports and fixtures

The recommended milestone mapping is:

  • v0.5 - Evaluation & Heuristics
    • Zobrist Hashing implementation
    • Transposition Table structure
  • v0.6 - Expectimax Search Engine
    • Expectimax implementation and chance-node evaluations
    • Star1/Star2 pruning
    • Concurrency and parallel search improvements with Ox

This keeps the project aligned with the approved milestones published in [AGENTS.md].