>_ AI CODING HARNESS LEADERBOARDS

cross-harness, cross-model agentic-coding scores

Same model, different harnesses, very different results. The harness owns context curation, tool design, retry policy, and verifier integration, so Claude Opus 4.7 running inside Claude Code can score 20+ points higher on Terminal-Bench than the same model driven by Aider with default settings. This page tracks that harness gap as a first-class number. Snapshots are aggregated weekly from public leaderboards. We do not re-run benchmarks; each row links to the upstream report.


WHAT IS A HARNESS?

The harness is the agent scaffold that wraps a base language model: the system prompt, the tools, the file-edit format, the retry and verifier loop, the context-management strategy, and the way errors are surfaced back to the model.
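The components listed above can be pictured as one configuration bundle. A minimal sketch, with every name purely illustrative (no real harness exposes this exact API):

```python
# Illustrative sketch of a harness as a bundle of components.
# All names here are hypothetical; real harnesses differ widely.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Harness:
    system_prompt: str                    # the scaffold's standing instructions
    tools: list[str]                      # e.g. shell, file-edit, search
    edit_format: str                      # e.g. "unified-diff" or "whole-file"
    max_retries: int = 3                  # retry policy for failed edits
    verifier: Callable[[str], bool] = lambda out: True  # e.g. run the test suite
    # context-management strategy and error surfacing would also live here
```

Two harnesses wrapping the same base model can differ in any of these fields, which is exactly where the harness gap comes from.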

FREE API
GET /api/harnesses
Returns the same dataset as JSON. Pass ?view=summary for the top combined leaderboard and the biggest harness gaps, ?view=gaps for the full harness-gap analysis, ?view=combined for a normalized cross-benchmark ranking, or no param for the raw benchmark graph. 12-hour cache. CORS enabled. No auth, no key, no signup.
Also available as MCP tool tf_harnesses via /api/mcp and as a function-calling tool definition at /api/llm-tools.