BotBeat

Anthropic | RESEARCH | 2026-04-21

Anthropic's Haiku 4.5 with Skills Outperforms Opus 4.7 Without Skills in Comprehensive 880-Eval Benchmark

Key Takeaways

  • Haiku 4.5 with skills (84.3%) outperformed the Opus 4.7 baseline (80.5%), suggesting that a smaller model with the right augmentation can beat a frontier model running without it
  • All 88 tested configurations across 9 models showed positive performance lifts when skills were loaded, with gains ranging from +11.3 to +23.1 percentage points
  • Weaker models benefited most from skills: Haiku gained 23.1 points while Opus 4.7 gained 14 points, which suggests skills are a major lever for cost-effective AI deployment
Source: Hacker News, https://tessl.io/blog/anthropic-openai-or-cursor-model-for-your-agent-skills-7-learnings-from-running-880-evals-including-opus-47/

Summary

A comprehensive evaluation of nine AI models across 880 test cases indicates that agent skills have become a decisive factor in model performance, potentially outweighing raw model capability. Anthropic's Haiku 4.5 with skills loaded achieved an 84.3% success rate, surpassing Opus 4.7's baseline of 80.5%. The result shows that a smaller, more cost-effective model can compete with a frontier model when augmented with the right skills. The benchmark tested configurations from Anthropic (Opus 4.7, Opus 4.6, Sonnet 4.6, Haiku 4.5), OpenAI (three Codex variants), and Cursor's Composer-2, and every configuration scored higher with skills enabled than without. The research suggests that as agent skills spread across AI ecosystems in 2026, the strategic value of context development and skill optimization may eclipse the importance of selecting the largest or most expensive model.

  • The cost-performance math in AI systems is shifting from pure model selection to context optimization and skill development, a trend that will likely define competitive advantage in 2026
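The headline figures can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes each reported gain is the plain difference, in percentage points, between the same model's with-skills and baseline scores. The two derived values (Haiku's baseline and Opus 4.7's with-skills score) are implied by the numbers quoted above, not figures reported in the article, and the variable names are hypothetical.

```python
# Figures quoted in the article (success rate in %, gains in percentage points).
haiku_with_skills = 84.3
haiku_gain = 23.1
opus_baseline = 80.5
opus_gain = 14.0

# Assumption: gain = with_skills - baseline for the same model.
# These two values are implied by the quoted figures, not reported directly.
haiku_baseline = round(haiku_with_skills - haiku_gain, 1)
opus_with_skills = round(opus_baseline + opus_gain, 1)

# Margin by which Haiku with skills beats Opus 4.7 without skills.
haiku_vs_opus_baseline = round(haiku_with_skills - opus_baseline, 1)

print(f"Implied Haiku 4.5 baseline:        {haiku_baseline}%")
print(f"Implied Opus 4.7 with skills:      {opus_with_skills}%")
print(f"Haiku+skills lead over Opus base:  {haiku_vs_opus_baseline} points")
```

Under that assumption, Haiku's lead over the unaugmented Opus 4.7 is a few points, while Opus 4.7 with skills would still sit well above Haiku with skills. The comparison in the headline is therefore between an augmented small model and an unaugmented large one, not between the two models on equal footing.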

Editorial Opinion

This benchmark result reframes how engineering teams should approach AI model selection and deployment. Rather than defaulting to the most powerful or expensive model, organizations may get better results by pairing smaller models like Haiku with well-crafted skills, improving both performance and cost efficiency. The finding that every configuration improved with skills points to a shift in AI development from a model-centric paradigm to a context-centric one, where the surrounding knowledge and capabilities matter as much as the base model.

