Search Roadmap & Evaluation
The engine currently has a complete 5-level primitive bot roster: RandomSearch, CheckmateAwareSearch, GreedySearch (baseline), GreedySearchV2 (Cautious Greedy), and AggressiveSearch. Level 6 is the non-primitive MonteCarloSearch bot.
This forms a solid baseline for move evaluation, but single-turn bots are limited to a 1-turn horizon. To develop a stronger, grandmaster-level engine, we need to transition from single-turn heuristics to a deep, probabilistic search tree.
This document defines:
- the revised implementation roadmap for deep search algorithms and optimizations
- the evaluation protocol that every new search algorithm must pass before it replaces the baseline
The search roadmap should improve move quality without losing the properties that already matter:
- legal move selection under the maximum micro-moves rule
- deterministic and reproducible testing
- predictable runtime for browser and JVM usage
- a clear migration path from simple heuristics to probabilistic tree search
Future changes should therefore be judged on two axes:
- playing strength against the current baseline (
GreedySearchor the latest optimized bot) - cost in runtime, implementation complexity, and memory footprint (minimizing garbage collection on the hot path)
Baseline
Section titled “Baseline”The current baseline is GreedySearch (Level 3).
While primitive, it has several advantages:
- it reasons over the full Dice Chess turn, not a single micro-move
- it is deterministic
- it is easy to explain and debug
- it creates a stable reference point for head-to-head testing
The baseline remains available even after stronger algorithms are added. It serves as the control group for all future experiments.
Revised Roadmap
Section titled “Revised Roadmap”The search roadmap is divided into progressive optimization and tree-search stages.
Stage 1: Deep Expectimax Search
Section titled “Stage 1: Deep Expectimax Search”This stage introduces depth-first traversal of multiple plies (turns), reasoning about future turns.
Expected characteristics:
- Chance Nodes: Representing the stochastic dice roll outcomes (216 ordered rolls / 56 unique multisets) of both sides.
- Decision Nodes: Selecting the optimal full-turn path (1-3 micro-moves) for the active player.
- Bounded Depth: Dynamically adjusted search depth based on remaining time controls.
Milestone fit:
- primarily v0.6 - Expectimax Search Engine
Stage 2: Search Optimizations (Star1 & Star2 Pruning)
Section titled “Stage 2: Search Optimizations (Star1 & Star2 Pruning)”Traversing a deep expectimax tree scales exponentially. We must implement advanced alpha-beta pruning extensions for games with chance nodes:
- Star1/Star2 Pruning: Prunes chance nodes by calculating bounds on the mathematical expectation and skipping subtrees that cannot affect the optimal choice.
- Zobrist Hashing & Transposition Tables (TT): Caches previously computed node values, bounds, and search depths. Zobrist keys must include the board position, active color, and remaining dice pool.
Milestone fit:
- primarily v0.6 - Expectimax Search Engine and v0.5 (Zobrist/TT groundwork)
Stage 3: Parallel Search with Ox Concurrency
Section titled “Stage 3: Parallel Search with Ox Concurrency”To utilize multi-core server processors (such as the 4-core ARM nodes on Oracle Cloud), we will parallelize the evaluation of chance-node subtrees:
- Use Java Virtual Threads (via the
Oxstructured concurrency library) to spawn lightweight concurrent branch evaluations. - Ensure thread-safe read operations on transposition tables.
- Implement structured cancellation to stop running threads immediately when a beta-cutoff is triggered or when search time expires.
Milestone fit:
- primarily v0.6 - Expectimax Search Engine
Stage 4: Monte-Carlo Pre-Roll Equity
Section titled “Stage 4: Monte-Carlo Pre-Roll Equity”Parallel to the exact tree search, a Rao-Blackwellized Monte-Carlo estimator gives an on-demand
pre-roll win-probability estimate for positions too sparse in the games database to read empirically.
It integrates the exact per-ply king-capture probability (KingCaptureProbability) along a random
rollout weighted by survival, which cuts variance sharply versus vanilla 0/1 rollouts. It reuses
TurnGenerator + the RandomSearch policy and exposes a configurable budget (rollouts / target CI
width / ply horizon).
See Monte-Carlo Pre-Roll Equity for the algorithm,
the variance rationale, and budgeting. It complements position canonicalization (which pools
empirical statistics across symmetric positions) for genuinely off-book positions, and shares the
KingCaptureProbability machinery with the expectimax chance-node evaluation.
The estimator also drives a bot: MonteCarloSearch (Level 6, registered in BotRegistry) scores
every legal turn by the Monte-Carlo win probability of the resulting position and plays the best,
preferring an immediate king capture. It is the first non-primitive bot (rollout-based lookahead
rather than a one-ply heuristic). Per-move cost scales with the number of legal turns × the rollout
budget, so a multi-game win-rate match is validated offline in the JVM Battle Arena — not in CI.
Time control. Because per-move cost grows with the branching factor, an unbounded heavy bot loses
on the clock in complex positions. MonteCarloSearch therefore takes a
wall-clock deadline: it takes any immediate king capture for free, otherwise ranks turns by a
cheap material score, keeps the top K, and Monte-Carlo-evaluates them within an equal slice of the
remaining time, always falling back to the best material turn so a legal move is returned even if the
deadline elapses first. The caller allocates the game clock into per-move deadlines. The budget paths
(rollouts / target-error) stay deterministic for tests; only the wall-clock path is non-deterministic.
The acceptance gate is whether time-limited Monte-Carlo beats AggressiveSearch (L5) within a
one-minute game budget — otherwise the heavy search is not worth it over the empirical-statistics path.
Milestone fit:
- feeds the analytics equity guidance now; aligns with v0.6 - Expectimax Search Engine machinery.
Evaluation Pyramid
Section titled “Evaluation Pyramid”Every new algorithm or optimization must be validated at three levels.
Level 1: Unit Correctness
Section titled “Level 1: Unit Correctness”These tests prove that the algorithm respects core game semantics and obvious tactical priorities.
- immediate king capture is always preferred over any non-terminal material sequence
- the maximum micro-moves rule is never violated
- castling still consumes the correct dice and is scored correctly
- promotion branches are ranked consistently
This level protects correctness and prevents regressions that would otherwise be hidden inside large simulations.
Level 2: Scenario Suite
Section titled “Level 2: Scenario Suite”This suite is a curated set of fixed positions designed to stress the evaluator and shallow search logic.
- immediate king capture
- unblock-and-capture
- promotion race
- king exposure after material gain
- sacrificial trap
- high-mobility versus low-mobility positions
- symmetric positions used to verify deterministic tie-breaking
Each scenario defines the FEN, dice multiset, expected preferred line, and the rationale. This suite must remain small enough to run in every local mise run check.
Level 3: Mass Simulation (Bot Arena)
Section titled “Level 3: Mass Simulation (Bot Arena)”This is the acceptance layer for comparing playing strength. Each challenger algorithm plays a large head-to-head match against the baseline:
- baseline as White, challenger as Black
- challenger as White, baseline as Black
- fixed seed lists for reproducibility
- repeated across multiple starting positions, tactical middlegames, and simplified endgames
Match Protocol
Section titled “Match Protocol”The benchmark harness should follow a stable protocol so that results remain comparable over time.
Required controls
Section titled “Required controls”- deterministic PRNG seeding
- explicit algorithm identity and Git commit hash in output reports
- symmetric color assignment (equal games as White and Black)
- turn/time budget settings recorded in the report
- fixed resignation, repetition, and 50-move limit rules
Required metrics
Section titled “Required metrics”At minimum, collect:
- win, loss, and draw rates
- average game length (turns)
- average decision time per turn
- average number of candidate paths or nodes evaluated
- runtime distribution (percentiles), not just mean runtime
Acceptance criteria
Section titled “Acceptance criteria”A challenger should replace the baseline only when all of the following are true:
- it passes unit correctness and scenario-suite checks
- it shows a statistically meaningful improvement in head-to-head results (Win Rate > 50%)
- it does not introduce an unacceptable runtime regression for the target environment (e.g. browser/JS vs server/JVM)
Reporting Format
Section titled “Reporting Format”Each experiment should produce a concise report:
================================================================================🎲♟️ Dice Chess Bot Arena - JVM Match RunnerBaseline Bot: [Name] (ID)Games per Color: [N] (Total [2N] games per match)================================================================================Opponent Bot | Total | Wins (W/B) | Losses (W/B) | Draws (W/B) | Win Rate | Time----------------------------------------------------------------------------------------[Bot A] | 1600 | ... | ... | ... | ...% | ...sThis makes search changes reviewable inside pull requests instead of relying on anecdotal observations.
Suggested Issue Decomposition
Section titled “Suggested Issue Decomposition”The future search engine tasks are decomposed as follows:
Expectimax search skeleton(Milestone v0.6)Star1 and Star2 pruning implementationZobrist Hashing & Transposition Table integrationStructured concurrency: parallelize chance-nodes using OxSearch evaluation reports and fixtures
Milestone Mapping
Section titled “Milestone Mapping”The recommended milestone mapping is:
- v0.5 - Evaluation & Heuristics
- Zobrist Hashing implementation
- Transposition Table structure
- v0.6 - Expectimax Search Engine
- Expectimax implementation and chance-node evaluations
- Star1/Star2 pruning
- Concurrency and parallel search improvements with
Ox
This keeps the project aligned with the approved milestones published in [AGENTS.md].