AiScientist: New System Enables Autonomous Long-Horizon ML Research Engineering
Key Takeaways
- ▸AiScientist enables autonomous agents to conduct complex, multi-day ML research engineering tasks through hierarchical orchestration and structured state management
- ▸The File-as-Bus workspace architecture, which uses durable artifacts as a coordination mechanism, proved to be the key performance driver
- ▸Long-horizon autonomous research is reframed as a systems problem of coordinating specialized work over persistent project state rather than a local reasoning problem
Summary
Researchers have introduced AiScientist, a system that lets autonomous AI agents carry out long-horizon ML research engineering tasks spanning multiple days. It targets a central difficulty in autonomous research: keeping progress coherent across interdependent stages, including task comprehension, environment setup, implementation, experimentation, and debugging. AiScientist combines hierarchical orchestration with a "File-as-Bus" workspace architecture. A top-level Orchestrator keeps control through summaries and workspace maps, while specialized agents ground their work in durable artifacts such as analyses, plans, code, and experimental evidence instead of relying on conversational handoffs.
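To make the File-as-Bus idea concrete, here is a minimal Python sketch of agents coordinating through files rather than through messages. It is an illustration under stated assumptions, not the AiScientist implementation: the `Workspace` class, the `index.json` file, the artifact paths, and the `planner`, `implementer`, and `orchestrate` functions are all hypothetical names invented for this example.

```python
import json
import tempfile
from pathlib import Path


class Workspace:
    """Shared directory acting as the 'bus' between agents (hypothetical design)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _load_index(self) -> dict:
        index_path = self.root / "index.json"
        if index_path.exists():
            return json.loads(index_path.read_text(encoding="utf-8"))
        return {}

    def write_artifact(self, relpath: str, content: str, summary: str) -> None:
        """Persist an artifact and record a one-line summary of it."""
        path = self.root / relpath
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        index = self._load_index()
        index[relpath] = summary
        (self.root / "index.json").write_text(
            json.dumps(index, indent=2, sort_keys=True), encoding="utf-8"
        )

    def read_artifact(self, relpath: str) -> str:
        return (self.root / relpath).read_text(encoding="utf-8")

    def workspace_map(self) -> dict:
        """Compact view for the orchestrator: artifact path -> summary."""
        return self._load_index()


def planner(ws: Workspace) -> None:
    # Stand-in for a planning agent: it leaves a durable plan on disk.
    ws.write_artifact(
        "plan.md",
        "1. Reproduce baseline\n2. Run ablation\n",
        summary="two-step plan",
    )


def implementer(ws: Workspace) -> None:
    # Stand-in for an implementation agent: it reads the plan from disk,
    # not from the planner's conversation history.
    plan = ws.read_artifact("plan.md")
    steps = [line for line in plan.splitlines() if line.strip()]
    ws.write_artifact(
        "code/train.py",
        "print('training stub')\n",
        summary=f"stub covering {len(steps)} planned steps",
    )


def orchestrate(ws: Workspace) -> dict:
    """Run the agents in order; the orchestrator only inspects the map."""
    for agent in (planner, implementer):
        agent(ws)
    return ws.workspace_map()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ws = Workspace(Path(tmp) / "project")
        for path, summary in orchestrate(ws).items():
            print(f"{path}: {summary}")
```

In this sketch the orchestrator never sees full artifact contents, only the path-to-summary map, while each agent reads what it needs directly from disk. That mirrors the division of labor the article describes, in a deliberately simplified form.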
The approach was evaluated on two complementary benchmarks: AiScientist improved PaperBench scores by 10.54 points on average over baseline systems and achieved 81.82% on MLE-Bench Lite. Ablation studies showed that the File-as-Bus protocol is crucial to performance, since removing it substantially reduced scores. The results suggest that long-horizon ML research engineering is fundamentally a systems coordination problem centered on managing durable project state, rather than a pure reasoning challenge.
- ▸A 10.54-point average improvement on PaperBench and an 81.82% score on MLE-Bench Lite demonstrate the practical viability of the approach
Editorial Opinion
AiScientist represents a meaningful shift in how we approach autonomous ML research, moving from conversation-based handoffs to coordination centered on durable artifacts. The insight that long-horizon research engineering is fundamentally a systems problem rather than a reasoning problem could reshape how we design AI research assistants, and it suggests practical pathways toward truly autonomous scientific discovery systems.



