Texas Hold LLM | Projects | Will Diamond

>Problem & Solution

Problem

Traditional AI benchmarks focus on optimal play and technical correctness, but fail to measure social intelligence, psychological strategy, and adaptive behavior - capabilities that are crucial for real-world AI applications. Existing poker LLM benchmarks test GTO (Game Theory Optimal) play, missing the human elements that make poker a test of social and strategic intelligence.

Solution

Texas Hold LLM creates an authentic poker tournament environment where 100+ AI models compete with full table talk capabilities. AI agents communicate strategically during gameplay - gathering information, applying psychological pressure, and engaging in misdirection. The platform combines real-time tournament viewing with sophisticated temporal navigation, allowing researchers to analyze every decision, communication, and strategic adaptation across complete tournament histories.

>Challenges

Unified Live + Historical Viewing

Building a single interface that handles both real-time game spectating and historical replay was complex. The challenge was synchronizing live WebSocket updates with historical event replay, preventing UI flicker when switching modes, and keeping temporal navigation controls (play/pause/scrubbing) smooth without hurting performance. This required a pointer-based architecture with dual pointers (viewPointer/livePointer) and a Map-based event storage system.

Table Talk Positioning

Displaying AI chat bubbles on a poker table without overlap or off-screen issues was a major UI challenge. Traditional absolute positioning caused messages to go off-screen for top players and created z-index nightmares. The breakthrough solution was integrating chat bubbles directly into the card display space - when players talk, their cards temporarily morph into speech bubbles, guaranteeing visibility and eliminating all positioning problems.

Event Sourcing Architecture

Designing a system where complete game state could be reconstructed from chronologically ordered events required careful planning. Every poker action needed to be captured as an immutable event with strict sequence ordering, enabling perfect replay capability. The challenge was balancing incremental state updates for smooth UX during live viewing with fast-forward state reconstruction for instant navigation to any moment in tournament history.
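A minimal sketch of the two update paths described above, assuming a simplified event schema. The event types, the TableState shape, and the function names are hypothetical and are not taken from the project. Live viewing applies one event at a time, while jumping to an arbitrary moment replays events from the start up to the target sequence number.

```typescript
// Hypothetical event and state shapes; the project's real schema is not shown in the source.
type GameEvent =
  | { sequence_number: number; type: "post_blind"; player: string; amount: number }
  | { sequence_number: number; type: "bet"; player: string; amount: number }
  | { sequence_number: number; type: "fold"; player: string };

interface TableState {
  pot: number;
  stacks: Record<string, number>;
  folded: string[];
}

// Incremental path: apply one event and return a new state without mutating the old one.
function applyEvent(state: TableState, event: GameEvent): TableState {
  switch (event.type) {
    case "post_blind":
    case "bet":
      return {
        ...state,
        pot: state.pot + event.amount,
        stacks: {
          ...state.stacks,
          [event.player]: (state.stacks[event.player] ?? 0) - event.amount,
        },
      };
    case "fold":
      return { ...state, folded: [...state.folded, event.player] };
  }
}

// Fast-forward path: rebuild the state at `target` by replaying events 1..target.
// Assumes sequence numbers start at 1 and are contiguous.
function reconstruct(
  events: Map<number, GameEvent>,
  initial: TableState,
  target: number,
): TableState {
  let state = initial;
  for (let seq = 1; seq <= target; seq++) {
    const event = events.get(seq);
    if (event === undefined) break; // stop at a gap rather than guess
    state = applyEvent(state, event);
  }
  return state;
}
```

Because both paths share applyEvent, a state reached incrementally during live viewing and a state reconstructed by replay cannot drift apart.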

Poker Engine All-In and Side Pot Logic

The most frustrating bugs weren't in the complex temporal navigation or real-time systems - they were in the 'simple' poker logic. All-in situations with multiple players and side pot calculations proved surprisingly difficult to get right. Edge cases like partial all-ins, multiple all-ins in the same hand, and proper side pot distribution required a complete poker engine rewrite to handle correctly.
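To illustrate why these cases are tricky, here is a sketch of one common way to split a hand into a main pot and side pots by commitment level. It is not the project's engine: the Contribution and Pot types and the buildPots function are hypothetical, and the sketch ignores showdown ranking and odd-chip rules.

```typescript
// Hypothetical types for illustration only.
interface Contribution {
  player: string;
  committed: number; // total chips put in during this hand
  folded: boolean;
}

interface Pot {
  amount: number;
  eligible: string[]; // players who can win this pot
}

function buildPots(contributions: Contribution[]): Pot[] {
  // Distinct commitment levels of players still in the hand, ascending.
  const levels = Array.from(
    new Set(
      contributions
        .filter((c) => !c.folded && c.committed > 0)
        .map((c) => c.committed),
    ),
  ).sort((a, b) => a - b);

  const pots: Pot[] = [];
  let previous = 0;
  for (const level of levels) {
    // Every player, folded or not, pays into this layer up to the level.
    let amount = 0;
    for (const c of contributions) {
      amount += Math.max(0, Math.min(c.committed, level) - previous);
    }
    // Only live players who reached this level can win the layer.
    const eligible = contributions
      .filter((c) => !c.folded && c.committed >= level)
      .map((c) => c.player);
    pots.push({ amount, eligible });
    previous = level;
  }

  // Chips a folded player committed above the highest live level are dead
  // money; this sketch adds them to the last pot.
  let deadMoney = 0;
  for (const c of contributions) deadMoney += Math.max(0, c.committed - previous);
  const last = pots[pots.length - 1];
  if (last !== undefined) last.amount += deadMoney;
  return pots;
}
```

For example, if A is all-in for 50, B is all-in for 120, C calls 120, and D folded after committing 30, this produces a main pot of 180 that A, B, and C can win and a side pot of 140 that only B and C can win. A layer with a single eligible player corresponds to an uncalled bet that goes back to that player.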

>Approach

Reducer-Based Temporal Engine

Migrated from a ref-based to a reducer architecture using the useReducer pattern to eliminate race conditions. Implemented Map-based event storage with sequence numbers as keys for O(1) deduplication and O(1) lookup by sequence number, so chronological navigation does not depend on arrival order. Created a dual-pointer system (viewPointer/livePointer) with universal play mode logic - 'if playing, advance' works identically for live, replay, and catch-up modes. This single-source-of-truth approach eliminated complex synchronization issues.
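The following is a sketch of what such a reducer could look like, written as a plain function so it can be tested without React. The action names, the state shape, and temporalReducer itself are assumptions for illustration, and the sketch assumes contiguous sequence numbers.

```typescript
// Hypothetical shapes; the project's actual reducer is not shown in the source.
interface GameEvent {
  sequence_number: number;
  type: string;
}

interface TemporalState {
  events: Map<number, GameEvent>;
  viewPointer: number; // position currently displayed
  livePointer: number; // highest sequence number received
  playing: boolean;
}

type TemporalAction =
  | { type: "EVENTS_RECEIVED"; events: GameEvent[] }
  | { type: "TICK" }
  | { type: "PLAY" }
  | { type: "PAUSE" }
  | { type: "SEEK"; to: number }
  | { type: "JUMP_TO_LIVE" };

function temporalReducer(state: TemporalState, action: TemporalAction): TemporalState {
  switch (action.type) {
    case "EVENTS_RECEIVED": {
      // Copy the Map so the previous state stays untouched; set() overwrites duplicates.
      const events = new Map(state.events);
      let livePointer = state.livePointer;
      for (const e of action.events) {
        events.set(e.sequence_number, e);
        livePointer = Math.max(livePointer, e.sequence_number);
      }
      return { ...state, events, livePointer };
    }
    case "TICK":
      // "If playing, advance": the same rule for live, replay, and catch-up.
      if (!state.playing || state.viewPointer >= state.livePointer) return state;
      return { ...state, viewPointer: state.viewPointer + 1 };
    case "PLAY":
      return { ...state, playing: true };
    case "PAUSE":
      return { ...state, playing: false };
    case "SEEK":
      return {
        ...state,
        viewPointer: Math.min(Math.max(action.to, 0), state.livePointer),
      };
    case "JUMP_TO_LIVE":
      return { ...state, viewPointer: state.livePointer };
  }
}
```

In a React component this function would be passed to useReducer along with an initial state, with a timer dispatching TICK. Copying the Map on every batch costs O(n), which is a trade-off this sketch accepts for immutability.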

Card-Space Chat Integration

Solved positioning problems by using the existing 152px × 108px card display space for chat bubbles. When AI agents communicate, their cards temporarily transform into clean white speech bubbles with tail pointers. Messages display for 3-8 seconds based on length with smooth spring-based animations, then seamlessly transition back to cards. This approach guarantees visibility, eliminates overlap, and provides better readability than traditional floating bubbles.
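The source gives the 3-8 second range but not the formula behind it. One plausible sketch clamps a per-character estimate into that range. The 60 ms per character rate and the chatDisplayDuration name are assumptions for illustration.

```typescript
const MIN_DISPLAY_MS = 3000; // lower bound from the 3-8 second range
const MAX_DISPLAY_MS = 8000; // upper bound from the 3-8 second range
const MS_PER_CHARACTER = 60; // assumed reading rate, not from the source

// Returns how long a chat bubble stays visible, in milliseconds.
function chatDisplayDuration(message: string): number {
  const estimate = message.length * MS_PER_CHARACTER;
  return Math.min(MAX_DISPLAY_MS, Math.max(MIN_DISPLAY_MS, estimate));
}
```

Short messages get the 3 second minimum, and anything longer than about 133 characters is capped at 8 seconds under this assumed rate.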

Three-Prompt LLM Architecture

Designed specialized prompts for different decision contexts: (1) Main poker decisions with optional table talk initiation, (2) Table talk responses to incoming messages, (3) End-of-hand reflection for commentary and note-taking. Used authentic PokerStars tournament formatting with expert player persona to encourage exploitative play and social strategy rather than GTO-optimal decisions. Implemented simplified JSON parsing without backticks for reliable LLM response handling across 100+ models via OpenRouter.
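The source does not show the response schema or the parser. As a sketch of what tolerant parsing across many models might involve, the function below pulls the first JSON object out of a reply and validates it. The PokerDecision shape, its field names, and parseDecision are hypothetical.

```typescript
// Hypothetical response shape; the project's real schema is not shown in the source.
type PokerAction = "fold" | "check" | "call" | "raise";

interface PokerDecision {
  action: PokerAction;
  amount?: number;
  table_talk?: string;
}

const ACTIONS: readonly string[] = ["fold", "check", "call", "raise"];

// Returns a validated decision, or null if the reply cannot be used.
function parseDecision(raw: string): PokerDecision | null {
  // Keep only the text between the first "{" and the last "}", which drops
  // any Markdown fence or commentary a model wraps around the JSON.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const candidate = parsed as Record<string, unknown>;
  const action = candidate["action"];
  if (typeof action !== "string" || !ACTIONS.includes(action)) return null;

  const decision: PokerDecision = { action: action as PokerAction };
  const amount = candidate["amount"];
  if (typeof amount === "number") decision.amount = amount;
  const tableTalk = candidate["table_talk"];
  if (typeof tableTalk === "string") decision.table_talk = tableTalk;
  return decision;
}
```

Returning null instead of throwing lets the caller decide on a fallback, such as retrying the prompt or folding the hand.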

Modular Poker Engine V2 Rewrite

After discovering edge case bugs in all-in and side pot logic, rebuilt the poker engine from scratch with a modular, testable architecture. The new design accepts game state as input and allows stepping through hands action-by-action, enabling comprehensive unit testing. This replaced the V1 monolithic background process that had to run complete games without interruption, making debugging and validation much more manageable.

>Technical Insights

Dual-Pointer Temporal Navigation

The breakthrough for unified live/replay viewing was a dual-pointer system: viewPointer tracks current viewing position while livePointer tracks the latest event. When viewPointer === livePointer, you're watching live. This simple architecture eliminated complex queue management and made 'if playing, advance' logic work identically everywhere.

interface TemporalState {
  viewPointer: number;      // Current viewing position
  livePointer: number;      // Latest event position
  isLive: boolean;          // True when viewPointer === livePointer
  controls: {
    play: () => void;
    jumpToLive: () => void;
    stepForward: () => void;
  };
}

Map-Based Event Deduplication

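Using Map<sequence_number, GameEvent> instead of arrays provided O(1) deduplication and O(1) lookup by sequence number. When events arrive from the WebSocket or the database, storing them under their sequence number key overwrites any duplicate. A JavaScript Map iterates in insertion order rather than key order, so chronological access comes from stepping the pointer through the keys, not from iterating the Map. This eliminated race conditions between historical and live event streams.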

// Map deduplicates by key; chronological order comes from pointer lookup
const events = new Map<number, GameEvent>();

// Adding events is idempotent
events.set(event.sequence_number, event);

// Navigation is simple pointer arithmetic
const currentEvent = events.get(viewPointer);
const nextEvent = events.get(viewPointer + 1);

Card-Space Chat Bubble Integration

The positioning breakthrough was realizing cards and chat bubbles never need to display simultaneously. By using the 152px × 108px card space for speech bubbles, we guaranteed visibility and eliminated all z-index/overflow issues. Cards morph into bubbles with spring animations, display for 3-8 seconds, then morph back.

// Cards and chat share the same display space
{showingChat ? (
  <motion.div
    className="speech-bubble"
    initial={{ scale: 0, rotate: -10 }}
    animate={{ scale: 1, rotate: 0 }}
    transition={{ type: "spring" }}
  >
    {message}
  </motion.div>
) : (
  <PlayingCard card={card} />
)}

Stateful vs Stateless Poker Engine Design

The V1 poker engine was a monolithic background process - once started, the entire game had to run to completion. This made debugging all-in edge cases nearly impossible. V2 takes a functional approach: feed in game state, get back new state. This enables step-by-step execution, comprehensive unit testing, and easy reproduction of bug scenarios.

# V1: Monolithic background process
await run_poker_game(players, config)
# Game runs to completion, hard to debug

# V2: Explicit state in, new state out
state = initialize_game(players, config)
state = play_hand(state)
state = handle_action(state, action)
# Can pause, inspect, test at any point

>Technologies

Next.js 15

React 19

TypeScript

Tailwind CSS

Framer Motion

FastAPI

Python

Supabase

PostgreSQL

OpenRouter

WebSockets

Event Sourcing

>Results

Production-ready poker tournament platform with live viewing and complete historical replay capabilities
Successfully completed 5 AI vs AI tournaments testing the platform's core functionality
Event sourcing system handling 958 events with sub-250ms loading times and perfect state reconstruction
Unified temporal interface that works identically for live games, replays, and catch-up modes
Support for 100+ AI models through OpenRouter integration with simplified JSON parsing
Card-space chat bubble system that solved the positioning problems of earlier floating-bubble approaches
Comprehensive technical documentation spanning frontend and backend architecture

>Key Metrics

Event Loading Speed: 245 ms

Events Processed: 958

Memory Usage: 0.3 MB

State Reconstruction: 25.6 ms

AI Models Supported: 100+

Performance vs Target: 8x better

>Key Learnings

Event sourcing with strict sequence ordering is incredibly powerful for building replay systems - every game state can be perfectly reconstructed from immutable events
Reducer architecture with useReducer eliminates race conditions that plague ref-based state management in complex real-time systems
Sometimes the best UI solution is unconventional - using card space for chat bubbles solved positioning problems that traditional approaches couldn't
Specialized prompts for different contexts (poker decisions vs. table talk vs. reflection) produce much better AI behavior than single monolithic prompts
Map-based storage with sequence numbers as keys provides O(1) deduplication and O(1) lookup by sequence number - simpler and faster than array-based approaches
Real-time systems need graceful degradation - automatic fallback to live state on errors maintains user experience during edge cases
PokerStars hand history format provides authentic tournament context that helps LLMs make more realistic poker decisions
Performance validation is critical - measuring event loading (245ms for 958 events) and state reconstruction (max 25.6ms) confirmed the architecture could scale
The 'simple' parts aren't always simple - poker all-in and side pot logic has more edge cases than the complex temporal navigation system
Different AI models have dramatically different social behaviors - GPT-4o is far more likely to engage in table talk than other models, possibly reflecting its emotional intelligence training
Refs are not best practice for complex React state - migrating to reducer pattern made the codebase more maintainable and React-friendly
Performance concerns don't always materialize - loading 958 events into memory used only 0.3MB, proving the unified live/replay approach was viable
Monolithic background processes are hard to debug - rebuilding the poker engine with stateful, step-by-step execution made testing and bug fixing dramatically easier
Testability should be a first-class concern - the ability to feed in state and step through execution is worth a complete rewrite
