BotBeat

RESEARCH · Anthropic · 2026-04-17

AiScientist: New System Enables Autonomous Long-Horizon ML Research Engineering

Key Takeaways

  • AiScientist enables AI agents to autonomously conduct extended ML research tasks spanning hours or days while maintaining coherent progress
  • The File-as-Bus workspace architecture, in which agents work from durable artifacts (code, analyses, experimental evidence), is the key innovation and reduces reliance on conversational context handoffs
  • The system achieves significant benchmark improvements: a 10.54-point average gain on PaperBench and 81.82% on MLE-Bench Lite
Source: Hacker News (https://arxiv.org/abs/2604.13018)

Summary

Researchers have introduced AiScientist, a system designed to let AI agents independently carry out long-horizon machine learning research engineering tasks that span hours or days. The system addresses a critical challenge in autonomous research: maintaining coherent progress across complex workflows (task comprehension, environment setup, implementation, experimentation, and debugging) without losing context or state.

AiScientist combines hierarchical orchestration with a "File-as-Bus" workspace architecture. A top-level Orchestrator maintains stage-level control through structured summaries and workspace maps, while specialized agents re-ground on durable artifacts (analyses, plans, code, and experimental evidence) rather than relying on conversational handoffs. The approach prioritizes persistent state management over purely conversational reasoning, an arrangement the researchers call "thin control over thick state."
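The write-up includes no code, so the following is only a minimal Python sketch of what "thin control over thick state" could look like, assuming the workspace is simply a directory of files. The `Workspace` class, the `planner`/`implementer`/`experimenter` agents, the file names, and `orchestrate` are all hypothetical illustrations, not AiScientist's actual interfaces; the agents are stubs that write placeholder artifacts instead of calling a language model.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class Workspace:
    """Hypothetical File-as-Bus workspace: agents exchange state only through files."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, name: str, content: str) -> None:
        path = self.root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)

    def read(self, name: str) -> str:
        return (self.root / name).read_text()

    def workspace_map(self) -> dict[str, int]:
        # Compact index (relative path -> size in bytes) that the orchestrator
        # can hold in place of the artifacts' full contents.
        return {
            p.relative_to(self.root).as_posix(): p.stat().st_size
            for p in sorted(self.root.rglob("*"))
            if p.is_file()
        }


# Each hypothetical stub agent re-grounds on existing artifacts, writes new
# ones, and returns only a one-line summary to the orchestrator.
def planner(ws: Workspace) -> str:
    task = ws.read("task.md").strip()
    ws.write("plan.md", f"1. Implement a baseline for: {task}\n2. Run it and log results\n")
    return "wrote plan.md"


def implementer(ws: Workspace) -> str:
    plan = ws.read("plan.md")  # reads the plan file, not a chat transcript
    header = "# baseline derived from plan.md\n"
    ws.write("src/train.py", header + "".join(f"# {line}\n" for line in plan.splitlines()))
    return "wrote src/train.py"


def experimenter(ws: Workspace) -> str:
    script = ws.read("src/train.py")
    # Placeholder evidence record; a real agent would execute the script.
    record = {"script_lines": len(script.splitlines()), "status": "placeholder"}
    ws.write("results/run.json", json.dumps(record))
    return "wrote results/run.json"


Agent = Callable[[Workspace], str]


def orchestrate(ws: Workspace, stages: list[tuple[str, Agent]]) -> tuple[list[str], dict[str, int]]:
    # Thin control: the orchestrator keeps only stage summaries and a workspace
    # map, while the thick state (plans, code, evidence) lives on disk.
    summaries = [f"{name}: {agent(ws)}" for name, agent in stages]
    return summaries, ws.workspace_map()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ws = Workspace(Path(tmp) / "project")
        ws.write("task.md", "train an image classifier on a toy dataset")
        summaries, ws_map = orchestrate(
            ws, [("plan", planner), ("implement", implementer), ("experiment", experimenter)]
        )
        print(summaries)
        print(sorted(ws_map))
```

The point of this shape is that no agent receives another agent's conversation: each reads what it needs from the workspace, so a stage can be retried or a new agent added without replaying history, and the orchestrator's own state stays small however large the artifacts grow.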

The system shows significant gains on two complementary benchmarks: it improves PaperBench scores by an average of 10.54 points over baseline systems and reaches 81.82% on MLE-Bench Lite. Ablation studies show that the File-as-Bus protocol is crucial, since removing it causes substantial performance degradation. This suggests that long-horizon ML research engineering is fundamentally a systems coordination problem rather than a pure reasoning challenge.

  • Long-horizon ML research engineering is reframed as a systems problem of coordinating specialized work over persistent project state, not just a local reasoning challenge

Editorial Opinion

AiScientist represents a meaningful step forward in autonomous AI research capabilities, shifting focus from purely conversational AI reasoning to structured state management. The emphasis on durable artifacts and hierarchical orchestration suggests that scaling AI agents to tackle complex, extended research tasks requires rethinking architecture around persistent workspace design rather than just improving language model capabilities. This approach could have broad implications for AI systems operating in other domains requiring sustained, multi-phase project work.

Reinforcement Learning · AI Agents · Machine Learning · Science & Research
