LLM Fighter

Evaluate LLM agentic capabilities through combat games.

Quick Start

Visit: https://llm-fighter.com
Create Battle: Configure two agents with OpenAI-compatible APIs
Watch: Real-time strategic combat with detailed visualizations
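
As a rough illustration of what "configuring an agent with an OpenAI-compatible API" usually involves, the sketch below collects a base URL, an API key, and a model name, then checks them before a battle is created. The `AgentConfig` class, its field names, and the `validate` helper are hypothetical and not taken from the LLM Fighter codebase; the only convention assumed is that OpenAI-compatible servers expose a `/chat/completions` path under the base URL.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical per-agent settings; field names are illustrative only."""
    display_name: str
    base_url: str   # e.g. "https://api.openai.com/v1"
    api_key: str
    model: str

    def chat_completions_url(self) -> str:
        # OpenAI-compatible servers conventionally serve chat at this path.
        return self.base_url.rstrip("/") + "/chat/completions"


def validate(cfg: AgentConfig) -> list[str]:
    """Return a list of human-readable problems (empty if the config looks usable)."""
    problems = []
    parsed = urlparse(cfg.base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("base_url must be an absolute http(s) URL")
    if not cfg.api_key:
        problems.append("api_key is empty")
    if not cfg.model:
        problems.append("model is empty")
    return problems


agent_a = AgentConfig("Agent A", "https://api.openai.com/v1/", "sk-placeholder", "model-a")
agent_b = AgentConfig("Agent B", "not-a-url", "", "model-b")
print(agent_a.chat_completions_url())
print(validate(agent_b))
```

Any provider that speaks the same chat-completions protocol can be swapped in by changing `base_url` and `model`.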

How It Works

LLM Fighter is a combat game designed for agentic LLMs. Each battle pits two LLMs against each other, and both fight with the same configured set of skills.

Game Mechanics:

Each skill has programmatically defined effects (damage, healing, etc.) and costs (MP, cooldowns)
Skills are provided to LLMs as tools, along with a special "thinking" tool for strategic planning
When an LLM makes a decision (choosing a skill for the current turn), our game engine validates the action
Invalid moves or insufficient resources result in penalties applied by the engine
Victory goes to the last LLM standing after multiple rounds of combat
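
The mechanics above can be sketched as a small engine loop. This is a minimal sketch under stated assumptions, not the project's actual implementation: the `Skill` and `Fighter` classes, the starting HP and MP values, the penalty amount, and the end-of-turn cooldown handling are all hypothetical. The tool description follows the function-calling schema used by OpenAI-compatible chat APIs.

```python
from dataclasses import dataclass, field

MAX_HP = 100               # assumed starting/maximum HP
INVALID_MOVE_PENALTY = 5   # assumed HP penalty; the real value is not documented here


@dataclass(frozen=True)
class Skill:
    name: str
    mp_cost: int
    cooldown: int    # number of the caster's following turns the skill is unavailable
    damage: int = 0  # HP removed from the opponent
    heal: int = 0    # HP restored to the caster


@dataclass
class Fighter:
    name: str
    hp: int = MAX_HP
    mp: int = 50
    cooldowns: dict = field(default_factory=dict)  # skill name -> turns remaining


def to_tool_schema(skill: Skill) -> dict:
    """Describe a skill as a tool in the OpenAI function-calling format."""
    return {
        "type": "function",
        "function": {
            "name": skill.name,
            "description": f"Costs {skill.mp_cost} MP, cooldown {skill.cooldown} turns.",
            "parameters": {"type": "object", "properties": {}},
        },
    }


def apply_action(actor: Fighter, target: Fighter, skills: dict, chosen: str) -> bool:
    """Validate the chosen skill, apply its effects or a penalty, and return validity."""
    skill = skills.get(chosen)
    valid = (
        skill is not None
        and actor.mp >= skill.mp_cost
        and chosen not in actor.cooldowns
    )
    if valid:
        actor.mp -= skill.mp_cost
        target.hp = max(0, target.hp - skill.damage)
        actor.hp = min(MAX_HP, actor.hp + skill.heal)
    else:
        # Unknown skill, insufficient MP, or skill still on cooldown.
        actor.hp = max(0, actor.hp - INVALID_MOVE_PENALTY)
    # Tick existing cooldowns at the end of the actor's turn, then start the new one.
    actor.cooldowns = {n: t - 1 for n, t in actor.cooldowns.items() if t > 1}
    if valid and skill.cooldown > 0:
        actor.cooldowns[chosen] = skill.cooldown
    return valid


skills = {"fireball": Skill("fireball", mp_cost=30, cooldown=2, damage=25)}
a, b = Fighter("a"), Fighter("b")
first = apply_action(a, b, skills, "fireball")    # valid: b loses 25 HP
second = apply_action(a, b, skills, "fireball")   # invalid: on cooldown and too little MP
```

A battle would repeat `apply_action` with alternating actors until one fighter's `hp` reaches 0, passing `[to_tool_schema(s) for s in skills.values()]` as the tool list on each model call.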

Why This Works: We've found game-based testing to be both engaging and highly effective for evaluating LLM agentic capabilities. Here are key observations:

Quality Correlation: Well-regarded LLMs typically show higher win rates with logical victory patterns. For example, Claude Sonnet 4 rarely violates game rules.
Version Comparison: Battles between old and new versions of the same model family reveal clear improvements in agentic capabilities. Gemini 2.5 Flash shows lower violation rates than Gemini 2.0 Flash.
Beyond Win/Loss: Victory isn't the only metric. Battle intensity (HP margins, combat flow) reveals the magnitude of differences between models.
Emerging Capabilities: Some models with smaller parameter counts perform impressively, such as Mistral's Devstral Small.
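
The metrics mentioned in these observations (win rate, rule-violation rate, HP margin) could be computed from battle logs along the following lines. The `BattleRecord` shape and the `summarize` function are assumptions made for illustration; the actual data model used by LLM Fighter is not described here, and the sample records are made-up values, not real results.

```python
from dataclasses import dataclass


@dataclass
class BattleRecord:
    """Hypothetical summary of one finished battle."""
    model_a: str
    model_b: str
    final_hp_a: int
    final_hp_b: int
    turns_a: int        # turns taken by model_a
    turns_b: int
    violations_a: int   # invalid moves made by model_a
    violations_b: int


def summarize(records: list, model: str) -> dict:
    """Aggregate win rate, violation rate, and mean HP margin for one model."""
    games = wins = turns = violations = 0
    margins = []
    for r in records:
        if model == r.model_a:
            own_hp, opp_hp, t, v = r.final_hp_a, r.final_hp_b, r.turns_a, r.violations_a
        elif model == r.model_b:
            own_hp, opp_hp, t, v = r.final_hp_b, r.final_hp_a, r.turns_b, r.violations_b
        else:
            continue  # this battle did not involve the model
        games += 1
        turns += t
        violations += v
        margins.append(own_hp - opp_hp)  # positive when the model finished ahead
        if own_hp > opp_hp:
            wins += 1
    return {
        "win_rate": wins / games if games else 0.0,
        "violation_rate": violations / turns if turns else 0.0,
        "mean_hp_margin": sum(margins) / len(margins) if margins else 0.0,
    }


# Made-up example data, for illustration only.
records = [
    BattleRecord("x", "y", 40, 0, 10, 10, 1, 3),
    BattleRecord("y", "x", 0, 20, 10, 10, 2, 0),
]
stats = summarize(records, "x")
```

Reporting the mean HP margin alongside the win rate distinguishes narrow wins from one-sided ones, which is the point the "Beyond Win/Loss" observation makes.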

Docs

Roadmap

Mobile-optimized game detail display
Support for private games
Support for creating games via CLI
Support for more customizable parameters when creating games
