BotBeat

Anthropic · RESEARCH · 2026-03-25

Harvard Physicist Supervises Claude Through Complete Theoretical Physics Research Project in Two Weeks

Key Takeaways

  • Claude Opus completed a full theoretical physics research project in two weeks under expert guidance, compared with the roughly one-year timeline such a project typically requires
  • The project required more than 110 drafts and 36M tokens; while Claude proved capable of complex symbolic and mathematical reasoning, human domain expertise remained essential for validating accuracy
  • This work demonstrates that LLMs can contribute meaningfully to frontier science but are not yet capable of fully autonomous end-to-end research without expert oversight
Source: Hacker News (https://www.anthropic.com/research/vibe-physics)

Summary

Harvard physics professor Matthew Schwartz conducted an unprecedented experiment: he guided Claude Opus through a complete theoretical physics research calculation without touching any files himself. The result was a technically rigorous high-energy physics paper produced in just two weeks, a process that typically takes a year. The project took more than 110 draft iterations, 36 million tokens, and over 40 hours of CPU compute. It demonstrated Claude's speed and persistence, but it also revealed significant accuracy limitations that required expert human oversight throughout.

Schwartz emphasizes that AI has not yet achieved autonomous end-to-end science, but that this work is a significant milestone: large language models can now tackle frontier theoretical physics problems when guided by domain experts. He argues that the result runs counter to current hype about automating science with AI, suggesting that AI systems may need to "go to graduate school" before they can conduct independent Ph.D.-level research. Despite Claude's impressive technical capabilities, Schwartz found that the model made enough errors to make constant expert validation essential, which highlights both the promise and the persistent limitations of current AI systems in cutting-edge scientific work.

  • The achievement represents a significant capability leap over what was possible three months earlier, suggesting rapid progress in AI's ability to handle symbolic, theoretical work that goes beyond numerical pattern recognition

Editorial Opinion

This account offers a refreshingly honest assessment of AI's current scientific capabilities: neither dystopian hype nor dismissive skepticism, but pragmatic realism. Schwartz's finding that Claude requires constant expert validation actually strengthens the case for AI as a transformative research tool; it excels as a tireless collaborator for domain experts, not as an autonomous scientist. The metaphor that LLMs "need to go to graduate school" before attempting independent Ph.D. work is apt, and it should temper the recent flood of claims about autonomous AI scientists achieving breakthrough results.

Large Language Models (LLMs), AI Agents, Deep Learning, Science & Research

© 2026 BotBeat