What's new
2m ago - Recomputed benchmark-weighted quality scores. Refreshed the model quality layer that feeds ranking and comparison pages.
2m ago - Synced Chatbot Arena benchmark track. Updated the frontier conversation signal used in leaderboard weighting.
2m ago - Updated speed measurements. Refreshed output speed and latency references for tracked models.
2m ago - Pulled latest OpenRouter price index. Updated comparison data for providers and routed model endpoints.
2m ago - Validated official pricing snapshots. Rechecked provider pricing pages against the comparison database.
5h ago - Jobs market snapshot refreshed. 1,052 open roles across 10 tracked companies.
17d ago - OpenGuardrails: An Open-Source Context-Aware AI Guardrails Platform (Papers With Code). Featured in the latest daily brief.
17d ago - Published the 2026-05-25 daily digest. 7 stories captured from tracked sources.
51m ago - Deezer’s new tool can identify AI music from Spotify, Apple Music, and others (TechCrunch)
2h ago - Pool’s new app turns your screenshots into something useful (TechCrunch)
Refreshed 2m ago

Frontier - composite leaderboard

12 of 97
| # | Model | ID | Provider | AIRH Composite | AIRH Real-World | AIRH Value | Delta 7d / Trend | Price in / out (USD per 1M tokens) | Speed | Context | Tags |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Llama 4 Maverick | llama-4-maverick | Meta | 57.6 | 76.0 | 1,559 | flat | $0.15 / $0.60 | 95 | 1M | open, multimodal |
| 2 | GPT-5.2 | gpt-5.2 | OpenAI | 55.1 | 90.0 | 82 | flat | $1.75 / $14.00 | 85 | 400K | multimodal, api |
| 3 | Claude Sonnet 4.6 | claude-sonnet-4.6 | Anthropic | 55.0 | 86.0 | 72 | flat | $3.00 / $15.00 | 90 | 1M | multimodal, api |
| 4 | Claude Opus 4.6 | claude-opus-4.6 | Anthropic | 54.6 | 89.0 | 15 | flat | $15.00 / $75.00 | 50 | 1M | multimodal, api |
| 5 | Llama 4 Scout | llama-4-scout | Meta | 50.1 | 79.0 | 3,160 | flat | $0.10 / $0.30 | 120 | 10M | open, multimodal, fast |
| 6 | GPT-5.2 Pro | gpt-5.2-pro | OpenAI | 44.1 | 93.0 | 7 | flat | $21.00 / $168.00 | pending | 400K | multimodal, api |
| 7 | GPT-5 Pro | gpt-5-pro | OpenAI | 43.0 | 90.0 | 10 | flat | $15.00 / $120.00 | pending | 400K | multimodal, api |
| 8 | Claude Opus 4.5 | claude-opus-4.5 | Anthropic | 41.2 | 86.0 | 43 | flat | $5.00 / $25.00 | pending | 200K | multimodal, api |
| 9 | DeepSeek V3.2 | deepseek-v3.2 | DeepSeek | 41.2 | 77.0 | 1,227 | flat | $0.20 / $0.77 | 49 | 131K | open |
| 10 | O3 | o3 | OpenAI | 39.8 | 88.0 | 135 | flat | $2.00 / $8.00 | 15 | 200K | reasoning, multimodal, api |
| 11 | Gemini 3.1 Pro | gemini-3.1-pro | Google | 38.2 | 96.0 | 101 | flat | $2.00 / $12.00 | 80 | 1M | multimodal, api |
| 12 | Gemini 2.5 Pro | gemini-2.5-pro | Google | 38.1 | 83.0 | 106 | flat | $1.25 / $10.00 | 90 | 1M | multimodal, api |
Showing 12 of 97, sorted by AIRH Composite (highest first).
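The table lists input and output prices separately. One common way to compare them as a single number is a blended price. The sketch below is illustrative only. It assumes prices are USD per 1M tokens and a 3:1 input-to-output token mix, neither of which the leaderboard states. It copies only a few rows from the table, and the names `Row`, `blended_price`, and `cheapest_open` are hypothetical. It is not how AIRH computes its Value column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    composite: float        # AIRH Composite, copied from the table
    price_in: float         # USD per 1M input tokens (assumed unit)
    price_out: float        # USD per 1M output tokens (assumed unit)
    tags: tuple[str, ...]


# A subset of rows copied from the leaderboard table.
ROWS = [
    Row("Llama 4 Maverick", 57.6, 0.15, 0.60, ("open", "multimodal")),
    Row("GPT-5.2", 55.1, 1.75, 14.00, ("multimodal", "api")),
    Row("Claude Sonnet 4.6", 55.0, 3.00, 15.00, ("multimodal", "api")),
    Row("Llama 4 Scout", 50.1, 0.10, 0.30, ("open", "multimodal", "fast")),
    Row("DeepSeek V3.2", 41.2, 0.20, 0.77, ("open",)),
]


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Weighted average of input and output price.

    The default input_share of 0.75 is a hypothetical 3:1 input:output
    token mix, chosen for illustration only.
    """
    return input_share * row.price_in + (1.0 - input_share) * row.price_out


def cheapest_open(rows: list[Row]) -> list[tuple[str, float]]:
    """Return (model, blended price) for rows tagged "open", cheapest first."""
    open_rows = [r for r in rows if "open" in r.tags]
    priced = [(r.model, round(blended_price(r), 4)) for r in open_rows]
    return sorted(priced, key=lambda pair: pair[1])


if __name__ == "__main__":
    for model, price in cheapest_open(ROWS):
        print(f"{model}: ${price:.4f} per 1M tokens (blended 3:1)")
```

With these rows, the open-weight models come out cheapest first as Llama 4 Scout, Llama 4 Maverick, then DeepSeek V3.2. Lowering `input_share` models output-heavy workloads. That shift penalizes rows such as GPT-5.2 Pro, whose output price is eight times its input price.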

Pulse - current table

97 ranked public LLM rows

37 open-weight rows

32 rows with speed data

Value leader: Mistral Nemo