RESEARCH | Anthropic | 2026-04-21

Anthropic's Haiku 4.5 with Skills Outperforms Opus 4.7 Without Skills in Comprehensive 880-Eval Benchmark

Key Takeaways

  • Haiku 4.5 with skills (84.3%) outperformed the Opus 4.7 baseline without skills (80.5%), indicating that a smaller model with proper augmentation can beat a frontier model without it
  • All 88 tested configurations across 9 models improved when skills were loaded, with gains ranging from +11.3 to +23.1 percentage points
  • Weaker models benefited most from skills: Haiku gained 23.1 points while Opus 4.7 gained 14 points, suggesting skills are a key lever for cost-effective AI deployment
Source: Hacker News, https://tessl.io/blog/anthropic-openai-or-cursor-model-for-your-agent-skills-7-learnings-from-running-880-evals-including-opus-47/

Summary

An evaluation of nine AI models across 880 test cases suggests that agent skills have become a decisive factor in model performance, potentially outweighing raw model capability. Anthropic's Haiku 4.5 with skills loaded achieved an 84.3% success rate, surpassing the 80.5% that Opus 4.7 scored at baseline (without skills). The result indicates that smaller, more cost-effective models can compete with frontier models when augmented with the right skills. The benchmark tested configurations from Anthropic (Opus 4.7, Opus 4.6, Sonnet 4.6, Haiku 4.5), OpenAI (three Codex variants), and Cursor's Composer-2, and every configuration performed better with skills enabled. The research suggests that as agent skills become widespread across AI ecosystems in 2026, the strategic value of context development and skill optimization may eclipse the importance of selecting the largest or most expensive model.

  • The cost-performance math in AI systems is shifting from pure model selection to context optimization and skill development, a trend that will likely define competitive advantage in 2026
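The headline comparison rests on simple arithmetic over the percentages quoted above. The sketch below works it through in Python. The derived values (Haiku's baseline and Opus 4.7's with-skills score) are implied by the article's numbers rather than reported in it, and the helper names are hypothetical. The article gives the Opus 4.7 lift only as "14 points", so treating it as exactly 14.0 is an assumption.

```python
# Figures quoted in the article (percent and percentage points).
HAIKU_WITH_SKILLS = 84.3
HAIKU_LIFT = 23.1
OPUS_BASELINE = 80.5
OPUS_LIFT = 14.0  # assumption: the article says "14 points", no decimal given


def implied_baseline(with_skills: float, lift: float) -> float:
    """Baseline score implied by a with-skills score and its lift."""
    return round(with_skills - lift, 1)


def implied_with_skills(baseline: float, lift: float) -> float:
    """With-skills score implied by a baseline score and its lift."""
    return round(baseline + lift, 1)


# Derived values: implied by the quoted numbers, not reported in the article.
haiku_baseline = implied_baseline(HAIKU_WITH_SKILLS, HAIKU_LIFT)
opus_with_skills = implied_with_skills(OPUS_BASELINE, OPUS_LIFT)

# Margin by which Haiku with skills beats Opus 4.7 without skills.
margin = round(HAIKU_WITH_SKILLS - OPUS_BASELINE, 1)

print(f"Haiku 4.5 implied baseline:     {haiku_baseline}%")
print(f"Opus 4.7 implied with skills:   {opus_with_skills}%")
print(f"Haiku+skills vs Opus baseline:  +{margin} points")
```

Under these assumptions, the comparison in the headline is between a smaller model with skills and a larger model without them. When both models have skills loaded, the implied figures still put Opus 4.7 ahead.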

Editorial Opinion

This benchmark result reframes how engineering teams should approach AI model selection and deployment. Rather than defaulting to the most powerful or expensive model, organizations may get better results by pairing smaller models like Haiku with well-crafted skills, improving both performance and cost efficiency. The finding that every configuration improved with skills points to a shift in AI development from a model-centric paradigm to a context-centric one, where the surrounding knowledge and capabilities matter as much as the base model.

Large Language Models (LLMs) | AI Agents | Machine Learning | Market Trends
