A sophisticated poker tournament platform where AI models compete in No-Limit Texas Hold'em with strategic table talk, testing social intelligence and adaptive reasoning in ways traditional benchmarks can't measure.
Traditional AI benchmarks focus on optimal play and technical correctness but fail to measure social intelligence, psychological strategy, and adaptive behavior, which are capabilities that matter for real-world AI applications. Existing poker LLM benchmarks test GTO (Game Theory Optimal) play and miss the human elements that make poker a test of social and strategic intelligence.
Texas Hold LLM creates an authentic poker tournament environment where 100+ AI models compete with full table talk. AI agents communicate strategically during gameplay: gathering information, applying psychological pressure, and engaging in misdirection. The platform combines real-time tournament viewing with temporal navigation, allowing researchers to analyze every decision, communication, and strategic adaptation across complete tournament histories.
Building a single interface that handles both real-time spectating and historical replay was complex. The challenges were synchronizing live WebSocket updates with historical event replay, preventing UI flicker when switching modes, and keeping the temporal navigation controls (play/pause/scrubbing) smooth without sacrificing performance. The solution was a pointer-based architecture with two pointers (viewPointer and livePointer) over a Map-based event store.
Displaying AI chat bubbles on a poker table without overlap or off-screen issues was a major UI challenge. Absolute positioning pushed messages off-screen for players seated along the top of the table and created z-index nightmares. The solution was to render chat bubbles inside the card display space: when a player talks, their cards temporarily morph into a speech bubble, which guarantees visibility and removes the overlap and off-screen problems entirely.
Designing a system that could reconstruct complete game state from chronologically ordered events required careful planning. Every poker action is captured as an immutable event with a strict sequence number, which makes exact replay possible. The challenge was balancing incremental state updates, for smooth live viewing, with fast-forward reconstruction, for instant navigation to any moment in tournament history.
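A minimal sketch of this event-sourcing idea, assuming hypothetical, heavily simplified event types (`HandStarted`, `Bet`, `Fold`) and state fields that are not the project's actual schema. One pure `applyEvent` function can serve both incremental live updates (apply the newest event to the current state) and fast-forward reconstruction (`stateAt`, which folds every event up to a sequence number).

```typescript
// Hypothetical, heavily simplified event and state shapes for illustration only.
type GameEvent =
  | { seq: number; type: "HandStarted"; stacks: Record<string, number> }
  | { seq: number; type: "Bet"; player: string; amount: number }
  | { seq: number; type: "Fold"; player: string };

interface GameState {
  stacks: Record<string, number>;
  pot: number;
  folded: string[];
}

const emptyState: GameState = { stacks: {}, pot: 0, folded: [] };

// Pure transition: returns a new state and never mutates its input,
// so the same function serves live updates and replay.
function applyEvent(state: GameState, event: GameEvent): GameState {
  switch (event.type) {
    case "HandStarted":
      return { stacks: { ...event.stacks }, pot: 0, folded: [] };
    case "Bet":
      return {
        ...state,
        stacks: {
          ...state.stacks,
          [event.player]: (state.stacks[event.player] ?? 0) - event.amount,
        },
        pot: state.pot + event.amount,
      };
    case "Fold":
      return { ...state, folded: [...state.folded, event.player] };
  }
}

// Fast-forward: rebuild the state at any sequence number by folding the
// events from the start. Sorting by seq makes the result independent of
// the order in which events were stored.
function stateAt(events: GameEvent[], upToSeq: number): GameState {
  return [...events]
    .sort((a, b) => a.seq - b.seq)
    .filter((e) => e.seq <= upToSeq)
    .reduce(applyEvent, emptyState);
}
```

Because `applyEvent` never mutates its input, replaying the same events always yields the same state, which is what makes exact replay possible.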
The most frustrating bugs weren't in the complex temporal navigation or real-time systems; they were in the 'simple' poker logic. All-in situations involving multiple players, and the side pot calculations they trigger, proved surprisingly hard to get right. Edge cases such as partial all-ins, multiple all-ins in the same hand, and correct side pot distribution ultimately required a complete rewrite of the poker engine.
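Side pots can be built by slicing each player's total commitment at every distinct commitment level. The sketch below is a common approach, not the project's actual engine; the `Contribution` and `Pot` shapes are hypothetical, and it assumes any uncalled bet has already been returned to its owner. Folded players' chips go into the pots, but folded players are never eligible to win them.

```typescript
// Hypothetical input shape: total chips each player put in during this hand.
interface Contribution {
  player: string;
  amount: number;  // total chips committed this hand
  folded: boolean; // folded players fund pots but cannot win them
}

interface Pot {
  amount: number;
  eligible: string[]; // players who can win this pot
}

// Build main and side pots by slicing contributions at each distinct
// commitment level (each all-in amount creates a new boundary).
function buildPots(contributions: Contribution[]): Pot[] {
  const levels = [...new Set(contributions.map((c) => c.amount).filter((a) => a > 0))]
    .sort((a, b) => a - b);

  const pots: Pot[] = [];
  let previous = 0;
  for (const level of levels) {
    // Each player adds the slice of their commitment between the previous level and this one.
    const amount = contributions.reduce(
      (sum, c) => sum + Math.max(0, Math.min(c.amount, level) - previous),
      0,
    );
    const eligible = contributions
      .filter((c) => !c.folded && c.amount >= level)
      .map((c) => c.player);

    const last = pots[pots.length - 1];
    if (last && sameMembers(last.eligible, eligible)) {
      last.amount += amount; // same contenders: merge into the existing pot
    } else {
      pots.push({ amount, eligible });
    }
    previous = level;
  }
  return pots;
}

function sameMembers(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((p) => b.includes(p));
}
```

For example, if A is all-in for 50, B is all-in for 100, C calls 100, and D folds after putting in 30, this produces a 180-chip main pot contested by A, B, and C, plus a 100-chip side pot contested by B and C.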
Migrated from a ref-based design to a reducer architecture (React's useReducer) to eliminate race conditions. Events are stored in a Map keyed by sequence number, giving O(1) deduplication and lookup, so display order follows sequence numbers regardless of the order in which events arrive. A dual-pointer system (viewPointer/livePointer) enables universal play-mode logic: 'if playing, advance' works identically for live, replay, and catch-up modes. This single source of truth eliminated complex synchronization issues.
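A sketch of how the dual-pointer state and the 'if playing, advance' rule might fit into a pure reducer. The action names (`EVENT_RECEIVED`, `TICK`, `SEEK`, and so on) and the `ViewerState` shape are assumptions for illustration. A reducer like this could be passed to React's `useReducer`, but it has no React dependency, so it can be tested on its own.

```typescript
// Hypothetical state and action shapes; the real reducer's names may differ.
interface ViewerState {
  viewPointer: number; // sequence number being displayed
  livePointer: number; // highest sequence number received so far
  isPlaying: boolean;
}

type ViewerAction =
  | { type: "EVENT_RECEIVED"; sequence: number }
  | { type: "TICK" } // fired by a timer while the viewer is open
  | { type: "PLAY" }
  | { type: "PAUSE" }
  | { type: "SEEK"; sequence: number }
  | { type: "JUMP_TO_LIVE" };

function viewerReducer(state: ViewerState, action: ViewerAction): ViewerState {
  switch (action.type) {
    case "EVENT_RECEIVED":
      // Late or duplicate events never move the live edge backwards.
      return { ...state, livePointer: Math.max(state.livePointer, action.sequence) };
    case "TICK":
      // Universal rule: if playing, advance, but never past the live edge.
      if (!state.isPlaying || state.viewPointer >= state.livePointer) return state;
      return { ...state, viewPointer: state.viewPointer + 1 };
    case "PLAY":
      return { ...state, isPlaying: true };
    case "PAUSE":
      return { ...state, isPlaying: false };
    case "SEEK":
      return {
        ...state,
        viewPointer: Math.min(Math.max(action.sequence, 0), state.livePointer),
      };
    case "JUMP_TO_LIVE":
      return { ...state, viewPointer: state.livePointer };
  }
}

// Live mode is derived, not stored.
const isLive = (s: ViewerState): boolean => s.viewPointer === s.livePointer;
```

Live viewing, replay, and catch-up all go through the same `TICK` branch; they differ only in how far `viewPointer` trails `livePointer`.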
Solved the positioning problem by reusing the existing 152px × 108px card display space for chat bubbles. When an AI agent talks, its cards temporarily transform into a clean white speech bubble with a tail pointer. Messages stay up for 3-8 seconds depending on length, with spring-based animations, then transition back to cards. This guarantees visibility, eliminates overlap, and reads better than traditional floating bubbles.
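The source gives only the 3-8 second range, not the formula, so the constants below are assumptions: this sketch adds a hypothetical 50 ms per character to a 3 second base and clamps the result to 8 seconds.

```typescript
// Hypothetical tuning constants: only the 3-8 second range comes from the design.
const MIN_DISPLAY_MS = 3000;
const MAX_DISPLAY_MS = 8000;
const MS_PER_CHARACTER = 50; // assumed reading-speed factor

// Longer messages stay up longer, clamped to the 3-8 second window.
function chatDisplayMs(message: string): number {
  const scaled = MIN_DISPLAY_MS + message.length * MS_PER_CHARACTER;
  return Math.min(MAX_DISPLAY_MS, Math.max(MIN_DISPLAY_MS, scaled));
}
```

A component could then schedule the switch back to cards with something like `setTimeout(() => setShowingChat(false), chatDisplayMs(message))`, where `setShowingChat` is a hypothetical state setter.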
Designed specialized prompts for three decision contexts: (1) main poker decisions, with the option to start table talk; (2) responses to incoming table talk; and (3) end-of-hand reflection for commentary and note-taking. The prompts use authentic PokerStars tournament formatting and an expert-player persona to encourage exploitative play and social strategy rather than GTO-optimal decisions. Responses are plain JSON without backtick (markdown code fence) wrappers, which keeps parsing simple and reliable across 100+ models via OpenRouter.
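Because models are asked for bare JSON, a reply can go straight into `JSON.parse` and then be validated. A sketch, assuming a hypothetical response shape with `action`, `amount`, and `table_talk` fields (the real prompts' field names may differ); malformed or invalid replies return `null` so the caller can retry or fall back to a default action.

```typescript
// Hypothetical response shape; the real prompt's field names may differ.
type PokerAction = "fold" | "check" | "call" | "raise";

interface Decision {
  action: PokerAction;
  amount?: number;     // only required for raises
  table_talk?: string; // optional message to the table
}

const ACTIONS: readonly string[] = ["fold", "check", "call", "raise"];

// Bare JSON (no backtick fences) can be parsed directly; anything
// malformed or structurally invalid returns null.
function parseDecision(raw: string): Decision | null {
  let data: unknown;
  try {
    data = JSON.parse(raw.trim());
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;
  if (typeof obj.action !== "string" || !ACTIONS.includes(obj.action)) return null;
  if (obj.action === "raise" && typeof obj.amount !== "number") return null;
  return {
    action: obj.action as PokerAction,
    amount: typeof obj.amount === "number" ? obj.amount : undefined,
    table_talk: typeof obj.table_talk === "string" ? obj.table_talk : undefined,
  };
}
```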
After discovering edge-case bugs in the all-in and side pot logic, rebuilt the poker engine from scratch with a modular, testable architecture. The new design takes game state as input and can step through a hand one action at a time, which enables comprehensive unit testing. It replaced the V1 monolithic background process, which had to run complete games without interruption, and made debugging and validation far more manageable.
The key to unified live/replay viewing was a dual-pointer system: viewPointer tracks the current viewing position, while livePointer tracks the latest event. When viewPointer === livePointer, you're watching live. This simple design eliminated complex queue management and made the 'if playing, advance' logic work identically everywhere.
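interface TemporalState {
  viewPointer: number; // Sequence number of the event currently displayed
  livePointer: number; // Sequence number of the latest event received
}

// Live mode is derived rather than stored: the viewer is live when the pointers meet
const isLive = (t: TemporalState): boolean => t.viewPointer === t.livePointer;

interface TemporalControls {
  play: () => void;        // Start advancing viewPointer
  jumpToLive: () => void;  // Move viewPointer to livePointer
  stepForward: () => void; // Advance viewPointer by one event
}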
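Using Map<sequence_number, GameEvent> instead of an array provides O(1) deduplication and O(1) lookup by sequence number. Events from the WebSocket and from the database are stored under their sequence number, so a re-delivered event simply overwrites its earlier copy, and navigating by sequence number is unaffected by arrival order. (A JavaScript Map iterates in insertion order, not key order, so code that walks every event should sort by key first.) This eliminated race conditions between the historical and live event streams.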
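// Minimal event shape assumed for this snippet
interface GameEvent { sequence_number: number; type: string }
// Keyed by sequence number, so a re-delivered event overwrites instead of duplicating
const events = new Map<number, GameEvent>();
// Adding events is idempotent, whichever stream (WebSocket or database) delivers them
function addEvent(event: GameEvent): void {
  events.set(event.sequence_number, event);
}
// Navigation is simple pointer arithmetic over sequence numbers
function eventsAt(viewPointer: number) {
  const currentEvent = events.get(viewPointer);
  const nextEvent = events.get(viewPointer + 1); // undefined until it arrives
  return { currentEvent, nextEvent };
}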
The positioning breakthrough was realizing cards and chat bubbles never need to display simultaneously. By using the 152px × 108px card space for speech bubbles, we guaranteed visibility and eliminated all z-index/overflow issues. Cards morph into bubbles with spring animations, display for 3-8 seconds, then morph back.
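import { motion } from "framer-motion";

// Cards and chat share the same display space: a seat shows one or the other.
// Hypothetical wrapper component; PlayingCard and the Card type are the app's own (imports omitted).
function SeatDisplay({ showingChat, message, card }: { showingChat: boolean; message: string; card: Card }) {
  return showingChat ? (
    <motion.div
      className="speech-bubble"
      initial={{ scale: 0, rotate: -10 }}
      animate={{ scale: 1, rotate: 0 }}
      transition={{ type: "spring" }}
    >
      {message}
    </motion.div>
  ) : (
    <PlayingCard card={card} />
  );
}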
The V1 poker engine was a monolithic background process: once started, the entire game had to run to completion, which made debugging all-in edge cases nearly impossible. V2 takes a functional approach: feed in game state, get back new state. This enables step-by-step execution, comprehensive unit testing, and easy reproduction of bug scenarios.
// V1: Monolithic background process
await run_poker_game(players, config)
// Game runs to completion, hard to debug
// V2: Explicit state passed in and returned, testable design
state = initialize_game(players, config)
state = play_hand(state)
state = handle_action(state, action)
// Can pause, inspect, test at any point
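To illustrate why the state-in, state-out design is easy to test, here is a deliberately tiny TypeScript step function. Everything in it is hypothetical (the `HandState` shape, the action names, and the omission of blinds, streets, and showdown); it is not the V2 engine, only an instance of the same pattern, including a partial all-in where a call is capped by the caller's stack.

```typescript
// Hypothetical, stripped-down betting-round state: no blinds, streets, or showdown.
interface Player {
  id: string;
  stack: number;
  bet: number; // chips committed in the current betting round
  folded: boolean;
}

interface HandState {
  players: Player[];
  toAct: number; // index of the player whose turn it is
  currentBet: number;
  pot: number;
}

type Action = { type: "fold" } | { type: "call" } | { type: "raise"; to: number };

// Pure step function: copies the state, applies one action, returns the result.
// Nothing runs in the background, so a hand can be replayed and inspected step by step.
function handleAction(state: HandState, action: Action): HandState {
  const players = state.players.map((p) => ({ ...p }));
  const me = players[state.toAct];
  let { currentBet, pot } = state;

  if (action.type === "fold") {
    me.folded = true;
  } else {
    const target = action.type === "call" ? currentBet : action.to;
    // Partial all-in: a player can only commit what is left in their stack.
    const chips = Math.max(0, Math.min(target - me.bet, me.stack));
    me.stack -= chips;
    me.bet += chips;
    pot += chips;
    currentBet = Math.max(currentBet, me.bet);
  }

  // Move to the next seat that has not folded and still has chips behind.
  let next = state.toAct;
  do {
    next = (next + 1) % players.length;
  } while ((players[next].folded || players[next].stack === 0) && next !== state.toAct);

  return { players, toAct: next, currentBet, pot };
}
```

A unit test can then build a state, apply a short action sequence, and assert on the result, for example that a 40-chip player who calls a 50-chip raise ends up all-in with 40 committed while the earlier states stay untouched.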