Harvard Physicist Supervises Claude Through Complete Theoretical Physics Research Project in Two Weeks
Key Takeaways
- ▸ Claude Opus completed a full theoretical physics research project in two weeks with expert guidance, compared to the typical one-year timeline
- ▸ The project required 110+ drafts and 36M tokens, showing that while Claude is capable of complex symbolic and mathematical reasoning, human domain expertise remains essential for validating accuracy
- ▸ This work demonstrates that LLMs can contribute meaningfully to frontier science but cannot yet conduct fully autonomous end-to-end research without expert oversight
Summary
Harvard physics professor Matthew Schwartz conducted an unprecedented experiment in which he guided Claude Opus through a complete theoretical physics research calculation without touching any files himself, resulting in a technically rigorous high-energy physics paper in just two weeks—a process that typically takes a year. The project consumed over 110 draft iterations, 36 million tokens, and 40+ hours of CPU compute, demonstrating Claude's speed and persistence while also revealing significant limitations in accuracy that required expert human oversight throughout the process.
Schwartz emphasizes that while AI has not yet achieved autonomous end-to-end science, this work marks a significant milestone: large language models can now tackle frontier theoretical physics problems when guided by domain experts. He argues that this finding cuts against current hype around AI science automation, suggesting that AI systems may need to "go to graduate school" before they can conduct independent Ph.D.-level research. Despite Claude's impressive technical capabilities, Schwartz found that the AI made enough errors that constant expert validation was essential—a result that highlights both the promise and the persistent limitations of current AI systems in cutting-edge scientific work.
- The achievement represents a significant capability leap over models from just three months earlier, suggesting rapid progress in AI's ability to handle symbolic, theoretical work beyond numerical pattern recognition
Editorial Opinion
This account offers a refreshingly honest assessment of AI's current scientific capabilities—neither dystopian hype nor dismissive skepticism, but pragmatic realism. Schwartz's finding that Claude requires constant expert validation actually strengthens the case for AI as a transformative research tool: it excels as a tireless collaborator for domain experts, not as an autonomous scientist. His suggestion that "LLMs need to go to graduate school" before attempting independent Ph.D.-level work is an apt metaphor that should temper the recent flood of claims about autonomous AI scientists achieving breakthrough results.