BotBeat

RESEARCH | Anthropic | 2026-04-21

Anthropic's Haiku 4.5 with Skills Outperforms Opus 4.7 Without Skills in Comprehensive 880-Eval Benchmark

Key Takeaways

  • Haiku 4.5 with skills (84.3%) outperformed the Opus 4.7 baseline without skills (80.5%), suggesting that a smaller model with proper augmentation can beat a frontier model that lacks it
  • All 88 tested configurations across 9 models showed a positive performance lift when skills were loaded, with gains ranging from +11.3 to +23.1 percentage points
  • Weaker models benefited most from skills: Haiku gained 23.1 points while Opus 4.7 gained 14 points, suggesting skills may be a key lever for cost-effective AI deployment
Source: Hacker News, https://tessl.io/blog/anthropic-openai-or-cursor-model-for-your-agent-skills-7-learnings-from-running-880-evals-including-opus-47/

Summary

An evaluation of nine AI models across 880 test cases indicates that agent skills have become a decisive factor in model performance, potentially outweighing raw model capability. Anthropic's Haiku 4.5 with skills loaded achieved an 84.3% success rate, surpassing Opus 4.7's baseline of 80.5%. The result suggests that smaller, more cost-effective models can compete with frontier models when augmented with the right skills. The benchmark tested configurations from Anthropic (Opus 4.7, Opus 4.6, Sonnet 4.6, Haiku 4.5), OpenAI (three Codex variants), and Cursor's Composer-2, and every configuration showed a positive performance gain when skills were enabled. The research suggests that as agent skills spread across AI ecosystems in 2026, the strategic value of context development and skill optimization may eclipse the importance of selecting the largest or most expensive model.
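The article quotes only one side of each comparison, so a short calculation helps put the figures side by side. The sketch below is illustrative only: it assumes each reported lift is the plain difference, in percentage points, between a model's with-skills score and its own baseline, and that the Opus 4.7 gain of "14 points" is exactly 14.0. The helper name `fill_in` is hypothetical, and the derived scores are not figures reported by the source.

```python
def fill_in(baseline=None, with_skills=None, lift=0.0):
    """Return (baseline, with_skills), deriving the missing score from the lift.

    Assumes lift = with_skills - baseline, in percentage points.
    """
    if baseline is None:
        baseline = with_skills - lift
    if with_skills is None:
        with_skills = baseline + lift
    return round(baseline, 1), round(with_skills, 1)


# Figures quoted in the article.
haiku = fill_in(with_skills=84.3, lift=23.1)   # Haiku 4.5: with-skills score and gain
opus = fill_in(baseline=80.5, lift=14.0)       # Opus 4.7: baseline score and gain

# Gap between the two models, without and with skills (derived, not reported).
gap_without_skills = round(opus[0] - haiku[0], 1)
gap_with_skills = round(opus[1] - haiku[1], 1)

print(f"Haiku 4.5: baseline {haiku[0]}%, with skills {haiku[1]}%")
print(f"Opus 4.7:  baseline {opus[0]}%, with skills {opus[1]}%")
print(f"Opus lead without skills: {gap_without_skills} points")
print(f"Opus lead with skills:    {gap_with_skills} points")
```

Under those assumptions, Haiku 4.5's implied baseline is 61.2% and Opus 4.7's implied with-skills score is 94.5%. The headline comparison therefore pits an augmented small model against an unaugmented large one; with skills on both sides, the larger model would still lead, by a narrower margin.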

  • The cost-performance math in AI systems is shifting from pure model selection to context optimization and skill development, a trend that will likely define competitive advantage in 2026

Editorial Opinion

This benchmark result reframes how engineering teams should approach AI model selection and deployment. Rather than defaulting to the most powerful or expensive model, organizations may be able to match or beat an unaugmented frontier model by pairing smaller models like Haiku with well-crafted skills, improving both performance and cost efficiency. The finding that every configuration improved with skills suggests a shift in AI development from a model-centric paradigm to a context-centric one, where the surrounding knowledge and capabilities matter as much as the base model.

Large Language Models (LLMs) | AI Agents | Machine Learning | Market Trends
