Search Roadmap & Evaluation

The engine currently has a complete 5-level primitive bot roster: RandomSearch, CheckmateAwareSearch, GreedySearch (baseline), GreedySearchV2 (Cautious Greedy), and AggressiveSearch. Level 6 is the non-primitive MonteCarloSearch bot.

This forms a solid baseline for move evaluation, but single-turn bots are limited to a 1-turn horizon. To develop a stronger, grandmaster-level engine, we need to transition from single-turn heuristics to a deep, probabilistic search tree.

This document defines:

the revised implementation roadmap for deep search algorithms and optimizations
the evaluation protocol that every new search algorithm must pass before it replaces the baseline

Goals

The search roadmap should improve move quality without losing the properties that already matter:

legal move selection under the maximum micro-moves rule
deterministic and reproducible testing
predictable runtime for browser and JVM usage
a clear migration path from simple heuristics to probabilistic tree search

Future changes should therefore be judged on two axes:

playing strength against the current baseline (GreedySearch or the latest optimized bot)
cost in runtime, implementation complexity, and memory footprint (minimizing garbage collection on the hot path)

Baseline

The current baseline is GreedySearch (Level 3).

While primitive, it has several advantages:

it reasons over the full Dice Chess turn, not a single micro-move
it is deterministic
it is easy to explain and debug
it creates a stable reference point for head-to-head testing

The baseline remains available even after stronger algorithms are added. It serves as the control group for all future experiments.

Revised Roadmap

The search roadmap is divided into progressive optimization and tree-search stages.

Stage 1: Deep Expectimax Search

This stage introduces depth-first traversal of multiple plies (turns), reasoning about future turns.

Expected characteristics:

Chance Nodes: Representing the stochastic dice roll outcomes (216 ordered rolls / 56 unique multisets) of both sides.
Decision Nodes: Selecting the optimal full-turn path (1-3 micro-moves) for the active player.
Bounded Depth: Dynamically adjusted search depth based on remaining time controls.

Milestone fit:

primarily v0.6 - Expectimax Search Engine

Stage 2: Search Optimizations (Star1 & Star2 Pruning)

Traversing a deep expectimax tree scales exponentially. We must implement advanced alpha-beta pruning extensions for games with chance nodes:

Star1/Star2 Pruning: Prunes chance nodes by calculating bounds on the mathematical expectation and skipping subtrees that cannot affect the optimal choice.
Zobrist Hashing & Transposition Tables (TT): Caches previously computed node values, bounds, and search depths. Zobrist keys must include the board position, active color, and remaining dice pool.

Milestone fit:

primarily v0.6 - Expectimax Search Engine and v0.5 (Zobrist/TT groundwork)

Stage 3: Parallel Search with Ox Concurrency

To utilize multi-core server processors (such as the 4-core ARM nodes on Oracle Cloud), we will parallelize the evaluation of chance-node subtrees:

Use Java Virtual Threads (via the Ox structured concurrency library) to spawn lightweight concurrent branch evaluations.
Ensure thread-safe read operations on transposition tables.
Implement structured cancellation to stop running threads immediately when a beta-cutoff is triggered or when search time expires.

Milestone fit:

primarily v0.6 - Expectimax Search Engine

Stage 4: Monte-Carlo Pre-Roll Equity

Parallel to the exact tree search, a Rao-Blackwellized Monte-Carlo estimator gives an on-demand pre-roll win-probability estimate for positions too sparse in the games database to read empirically. It integrates the exact per-ply king-capture probability (KingCaptureProbability) along a random rollout weighted by survival, which cuts variance sharply versus vanilla 0/1 rollouts. It reuses TurnGenerator + the RandomSearch policy and exposes a configurable budget (rollouts / target CI width / ply horizon).

See Monte-Carlo Pre-Roll Equity for the algorithm, the variance rationale, and budgeting. It complements position canonicalization (which pools empirical statistics across symmetric positions) for genuinely off-book positions, and shares the KingCaptureProbability machinery with the expectimax chance-node evaluation.

The estimator also drives a bot: MonteCarloSearch (Level 6, registered in BotRegistry) scores every legal turn by the Monte-Carlo win probability of the resulting position and plays the best, preferring an immediate king capture. It is the first non-primitive bot (rollout-based lookahead rather than a one-ply heuristic). Per-move cost scales with the number of legal turns × the rollout budget, so a multi-game win-rate match is validated offline in the JVM Battle Arena — not in CI.

Time control. Because per-move cost grows with the branching factor, an unbounded heavy bot loses on the clock in complex positions. MonteCarloSearch therefore takes a wall-clock deadline: it takes any immediate king capture for free, otherwise ranks turns by a cheap material score, keeps the top K, and Monte-Carlo-evaluates them within an equal slice of the remaining time, always falling back to the best material turn so a legal move is returned even if the deadline elapses first. The caller allocates the game clock into per-move deadlines. The budget paths (rollouts / target-error) stay deterministic for tests; only the wall-clock path is non-deterministic. The acceptance gate is whether time-limited Monte-Carlo beats AggressiveSearch (L5) within a one-minute game budget — otherwise the heavy search is not worth it over the empirical-statistics path.

Milestone fit:

feeds the analytics equity guidance now; aligns with v0.6 - Expectimax Search Engine machinery.

Evaluation Pyramid

Every new algorithm or optimization must be validated at three levels.

Level 1: Unit Correctness

These tests prove that the algorithm respects core game semantics and obvious tactical priorities.

immediate king capture is always preferred over any non-terminal material sequence
the maximum micro-moves rule is never violated
castling still consumes the correct dice and is scored correctly
promotion branches are ranked consistently

This level protects correctness and prevents regressions that would otherwise be hidden inside large simulations.

Level 2: Scenario Suite

This suite is a curated set of fixed positions designed to stress the evaluator and shallow search logic.

immediate king capture
unblock-and-capture
promotion race
king exposure after material gain
sacrificial trap
high-mobility versus low-mobility positions
symmetric positions used to verify deterministic tie-breaking

Each scenario defines the FEN, dice multiset, expected preferred line, and the rationale. This suite must remain small enough to run in every local mise run check.

Level 3: Mass Simulation (Bot Arena)

This is the acceptance layer for comparing playing strength. Each challenger algorithm plays a large head-to-head match against the baseline:

baseline as White, challenger as Black
challenger as White, baseline as Black
fixed seed lists for reproducibility
repeated across multiple starting positions, tactical middlegames, and simplified endgames

Match Protocol

The benchmark harness should follow a stable protocol so that results remain comparable over time.

Required controls

deterministic PRNG seeding
explicit algorithm identity and Git commit hash in output reports
symmetric color assignment (equal games as White and Black)
turn/time budget settings recorded in the report
fixed resignation, repetition, and 50-move limit rules

Required metrics

At minimum, collect:

win, loss, and draw rates
average game length (turns)
average decision time per turn
average number of candidate paths or nodes evaluated
runtime distribution (percentiles), not just mean runtime

Acceptance criteria

A challenger should replace the baseline only when all of the following are true:

it passes unit correctness and scenario-suite checks
it shows a statistically meaningful improvement in head-to-head results (Win Rate > 50%)
it does not introduce an unacceptable runtime regression for the target environment (e.g. browser/JS vs server/JVM)

Reporting Format

Each experiment should produce a concise report:

================================================================================
🎲♟️  Dice Chess Bot Arena - JVM Match Runner
Baseline Bot: [Name] (ID)
Games per Color: [N] (Total [2N] games per match)
================================================================================
Opponent Bot    | Total | Wins (W/B)   | Losses (W/B) | Draws (W/B)  | Win Rate | Time
----------------------------------------------------------------------------------------
[Bot A]         | 1600  | ...          | ...          | ...          |    ...%  | ...s

This makes search changes reviewable inside pull requests instead of relying on anecdotal observations.

Suggested Issue Decomposition

The future search engine tasks are decomposed as follows:

Expectimax search skeleton (Milestone v0.6)
Star1 and Star2 pruning implementation
Zobrist Hashing & Transposition Table integration
Structured concurrency: parallelize chance-nodes using Ox
Search evaluation reports and fixtures

Milestone Mapping

The recommended milestone mapping is:

v0.5 - Evaluation & Heuristics
- Zobrist Hashing implementation
- Transposition Table structure
v0.6 - Expectimax Search Engine
- Expectimax implementation and chance-node evaluations
- Star1/Star2 pruning
- Concurrency and parallel search improvements with Ox

This keeps the project aligned with the approved milestones published in [AGENTS.md].