Harvard Physics Professor Guides Claude Through Frontier Research: AI Completes Year-Long Physics Calculation in Two Weeks
Key Takeaways
- Claude Opus 4.5 completed a full theoretical physics research cycle, producing publication-ready work in two weeks rather than the year such a project typically takes
- AI systems show particular promise for symbolic work (manipulating mathematical expressions) rather than purely data-driven tasks, positioning them as potential graduate-level research assistants
- Domain expertise remains critical for validating AI-generated scientific work, which suggests that a human-in-the-loop model currently works better than fully autonomous research
- The achievement suggests that LLMs may need to develop intermediate capabilities before attempting fully autonomous end-to-end science
Summary
Harvard physics professor Matthew Schwartz supervised Claude Opus 4.5 through a complete theoretical physics research calculation without manually touching any files himself, demonstrating that AI can contribute meaningfully to frontier science. The project produced a technically rigorous high-energy theoretical physics paper in two weeks, a timeline that typically requires a year. It took 110 separate drafts, 36 million tokens, and over 40 hours of local CPU compute. Claude proved fast, tireless, and highly capable at manipulating mathematical expressions and writing code, but Schwartz found that domain expertise remained essential for evaluating accuracy, a result that shows both the promise and the limitations of current AI systems in scientific research. The accomplishment suggests that large language models may be transitioning from theoretical curiosities to genuine research collaborators, though not yet at the fully autonomous, end-to-end level that recent AI scientist projects claim to achieve.
Editorial Opinion
This result is significant not because Claude achieved complete scientific autonomy (it did not), but because it shows that AI can collaborate meaningfully with human experts on frontier research. Schwartz's candid assessment that the AI proved 'sloppy' yet capable suggests the field is moving beyond hype toward realistic evaluation. Compressing a year of work into two weeks could fundamentally reshape how theoretical research is conducted, provided the field develops better validation frameworks.


