Skip to content

Architecture & Schema

This document details the database schema, data models, and the position deduplication strategy used in dicechess-analytics.


The system uses a highly normalized PostgreSQL schema designed to allow rapid analytics on moves and positions. Below is the Mermaid representation of the tables and their relations:

erDiagram
    players {
        uuid id PK
        string external_id UK
        string username
        string player_type
        jsonb metadata_json
        datetime created_at
        datetime updated_at
    }

    positions {
        bigint id PK
        string normalized_fen UK
        bigint fen_hash "IX"
        string piece_placement "IX"
        string active_color
        string castling
        string en_passant
    }

    games {
        uuid id PK
        string source
        uuid white_player_id FK
        uuid black_player_id FK
        integer white_rating
        integer black_rating
        integer time_initial_sec
        integer time_increment_sec
        integer initial_stake_amount
        integer final_stake_amount
        numeric white_money_delta
        numeric black_money_delta
        string stake_currency
        game_mode_enum mode
        smallint result
        game_termination_enum termination
        bigint initial_position_id FK
        bigint final_position_id FK
        smallint total_turns
        datetime started_at "IX"
        jsonb metadata_json
        datetime created_at
    }

    turns {
        bigint id PK
        uuid game_id FK
        smallint turn_number
        string active_color
        bigint position_id FK
        string dice_sorted
        array_string played_moves
        bigint position_after_id FK
        integer thinking_time_ms
    }

    game_events {
        bigint id PK
        uuid game_id FK
        smallint sequence_number
        smallint turn_number
        game_event_type_enum event_type
        string actor_color
        integer clock_white_ms
        integer clock_black_ms
        jsonb payload
    }

    players ||--o{ games : "plays as white/black"
    positions ||--o{ games : "starts/ends at"
    games ||--o{ turns : "contains"
    positions ||--o{ turns : "before/after state"
    games ||--o{ game_events : "emits"

A player profile — a human or an AI bot.

ColumnNotes
idUUID primary key (internal).
external_idUNIQUE. The player’s id at the originating source. This is the get-or-create key used on ingestion.
usernameDisplay name. Backed by a trigram GIN index (ix_players_username_trgm) for fast case-insensitive substring search.
player_typehuman or bot.
metadata_jsonRaw source-specific extras.
created_at / updated_atBookkeeping.

Ratings are not stored here — they are dynamic and per-game, so they live on games (white_rating / black_rating) as a snapshot at the time that game was played.

One row per distinct board state across all games — the heart of position analytics (it lets millions of turns collapse onto a minimal set of unique positions).

ColumnNotes
idBIGINT identity primary key.
normalized_fenUNIQUE. The 4-field normalized FEN — the deduplication key (see below).
fen_hashSigned BIGINT xxHash64 of normalized_fen, indexed (ix_positions_fen_hash) for fast equality lookups.
piece_placementThe placement field of the FEN (e.g. rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR), indexed to support board-pattern queries.
active_color, castling, en_passantThe remaining normalized-FEN fields, split out for querying.

One completed match.

ColumnNotes
idUUID primary key. Equal to the source’s game id — this is the idempotency key (re-ingesting the same game is a no-op).
sourceFree-form origin label of the writer.
white_player_id / black_player_idFK → players (nullable).
white_rating / black_ratingInteger rating snapshot at game time (nullable).
modegame_mode_enum — see Enumerated Types.
resultSMALLINT: 1 White win, -1 Black win, 0 draw, NULL unknown.
terminationgame_termination_enum, NOT NULL DEFAULT 'unknown'how the game ended.
initial_position_id / final_position_idFK → positions — the board at the start and at the end.
total_turnsNumber of turns.
time_initial_sec / time_increment_secTime control. Both are strictly in seconds — see Domain Conventions.
initial_stake_amount / final_stake_amountThe pot; the final amount can differ from the initial because of doubling.
white_money_delta / black_money_deltaNUMERIC actual profit/loss per player. Stored independently (not just ±x) because the site takes a rake, so the two deltas are not symmetric.
stake_currencyIn-game currency (GOLD), not an ISO code; an amount of 0 means a tournament game — see Domain Conventions.
started_atIndexed (ix_games_started_at).
metadata_jsonRaw source-specific extras.

One turn = a dice roll plus up to three micro-moves. UNIQUE (game_id, turn_number).

ColumnNotes
game_idFK → games (ON DELETE CASCADE).
turn_number1-based, ordered within the game.
active_colorw or b — who is to move.
position_idFK → positions: the board before the turn.
dice_sortedVARCHAR(3). The three rolled dice (piece types), stored as strictly 3 sorted piece letters cased by the side to move — white BPQ, black bpq. The ingest API accepts the dice as numeric codes (1=pawn … 6=king) and normalizes them to this letter form before storing; a CHECK constraint enforces letters only. See Domain Conventions.
played_movesVARCHAR(5)[] of UCI micro-moves, e.g. ["e2e4","g1f3"]. Up to three; empty when the player had no legal move.
position_after_idFK → positions: the board after the turn. Positions are recorded at turn boundaries, not per micro-move.
thinking_time_msTime spent on the turn.

A side-ledger of non-move events (doubling, draw offers). UNIQUE (game_id, sequence_number).

ColumnNotes
game_idFK → games (ON DELETE CASCADE).
sequence_numberOrder within the game.
turn_numberThe turn the event occurred on (nullable).
event_typegame_event_type_enum — see below.
actor_colorw / b — who triggered it.
clock_white_ms / clock_black_msClocks at the moment of the event.
payloadJSONB, e.g. {"bank": 800} for a doubling event.

Small, closed value sets are modelled as PostgreSQL enums for integrity. Writers map their source’s vocabulary onto these canonical values (keeping the schema source-agnostic).

ValueMeaning
classicStandard game.
x2Commercial game with doubling enabled.

How a game ended. Defaults to unknown; it is known precisely for engine-driven games and inferred for historical imports (e.g. a king-capture on the last move → king_captured).

ValueMeaning
king_capturedThe opponent’s king was captured — the Dice Chess win condition.
timeoutA player’s clock reached zero.
resignA player resigned.
draw_agreementA draw was agreed by both players.
double_declinedA doubling offer was declined, ending the game as a loss.
unknownNot determined (default; all legacy rows).
ValueMeaning
DOUBLE_OFFERA player proposes to double the stake.
DOUBLE_ACCEPTThe opponent accepts; payload carries the new bank (e.g. {"bank": 800}).
DOUBLE_DECLINEThe opponent declines — an immediate loss.
DRAW_OFFERA player offers a draw.
DRAW_ACCEPTThe opponent accepts the draw.

The schema is managed by Flyway (src/main/resources/db/migration/). V1 is the baseline that mirrors the original production schema (created by the previous app’s migrations); the production database is baselined at V1 (baselineOnMigrate), so on production only V2 onward run, while fresh databases (tests, new environments) get the full chain. V2 converted mode / termination to the enums above.


To prevent storing the same board layout multiple times (which would quickly bloat the database), dicechess-analytics normalizes and hashes FEN strings before inserting them.

A standard chess FEN string contains 6 fields: [piece placement] [active color] [castling rights] [en passant target] [halfmove clock] [fullmove number]

For deduplication, the halfmove clock and fullmove number are stripped, since they do not change the tactical properties of the position. The normalized FEN only contains the first 4 fields:

def normalizeFen(fen: String): String =
// filter: unlike Python's str.split(), Java's split keeps empty
// elements produced by leading or consecutive whitespace
val parts = fen.split("\\s+").filter(_.nonEmpty).take(4)
(parts ++ Array.fill(4 - parts.length)("-")).mkString(" ")

FEN parsing and validation themselves are the engine’s job: dicechess-engine-scala is the single source of truth for game rules, and the backend consumes it as a JVM library rather than re-implementing any chess logic.

We compute the hash of the normalized FEN with xxHash64 (seed 0) — extremely fast, very low collision rate. PostgreSQL’s bigint is a signed 64-bit integer — exactly the JVM’s Long — so the digest’s raw bits map onto the column directly, with no unsigned-to-signed conversion step:

import net.openhft.hashing.LongHashFunction
def fenHash(normalizedFen: String): Long =
LongHashFunction.xx().hashBytes(normalizedFen.getBytes(StandardCharsets.UTF_8))

This is bit-compatible with the digests already stored for the historical data (originally produced by Python’s xxhash.xxh64(...).intdigest()), verified by cross-checking sample rows — so old and new positions share one consistent fen_hash space.

When saving a turn or game, the backend follows a “get-or-create” loop (the historical import already populated the positions table this way; the POST /api/games endpoint applies the same flow to live games):

  1. Normalize FEN: Strip move counts.
  2. Lookup by Normalized FEN: Query the positions table using the unique normalized_fen field.
  3. Insert if Missing: If the position is not in the database, calculate its signed xxhash64 hash and insert the new Position record.
  4. Reference ID: Use the resolved position ID as the foreign key in turns and games.