Model rollouts, methodology updates, and evaluation milestones.
Public cost tracking
Every API dollar we spend is now published. See exactly what it costs to benchmark AI models independently.
28 models under continuous evaluation
The benchmark roster has expanded from 7 to 28 models across four providers, spanning both open-weight and frontier models.
Tier 1 model rollout
Daily evaluation is now running across seven Tier 1 models. Tier 2 and Tier 3 models are next.
447 evaluation instances
The instance pool has grown to 447 across 9 families, 16 domains, and 32 patterns.