BotBeat

Anthropic · RESEARCH · 2026-03-25

Harvard Physicist Supervises Claude Through Complete Theoretical Physics Research Project in Two Weeks

Key Takeaways

  • Claude Opus completed a full theoretical physics research project in two weeks with expert guidance, compared to the typical one-year timeline
  • The project required 110+ drafts and 36M tokens, revealing that while Claude is capable of complex symbolic and mathematical reasoning, human domain expertise remains essential for accuracy validation
  • This work demonstrates that LLMs can contribute meaningfully to frontier science but are not yet capable of fully autonomous end-to-end research without expert oversight
Source: Hacker News — https://www.anthropic.com/research/vibe-physics

Summary

Harvard physics professor Matthew Schwartz conducted an unprecedented experiment in which he guided Claude Opus through a complete theoretical physics research calculation without touching any files himself, resulting in a technically rigorous high-energy physics paper in just two weeks—a process that typically takes a year. The project consumed over 110 draft iterations, 36 million tokens, and 40+ hours of CPU compute, demonstrating Claude's speed and persistence while also revealing significant limitations in accuracy that required expert human oversight throughout the process.

Schwartz emphasizes that while AI has not yet achieved autonomous end-to-end science, this work represents a significant milestone showing that large language models can now tackle frontier theoretical physics problems when guided by domain experts. The professor argues that the finding contradicts current AI science automation hype, suggesting that AI systems may need to "go to graduate school" before they can conduct independent Ph.D.-level research. Despite Claude's impressive technical capabilities, Schwartz found that the AI made enough errors that constant expert validation was essential, a finding that highlights both the promise and persistent limitations of current AI systems in cutting-edge scientific work.

  • The achievement represents a significant capability leap from three months prior, suggesting rapid progress in AI's ability to handle symbolic, theoretical work beyond numerical pattern recognition

Editorial Opinion

This account offers a refreshingly honest assessment of AI's current scientific capabilities—neither dystopian hype nor dismissive skepticism, but pragmatic realism. Schwartz's finding that Claude requires constant expert validation actually strengthens the case for AI as a transformative research tool: it excels as a tireless collaborator for domain experts, not as an autonomous scientist. The implication that 'LLMs need to go to graduate school' before attempting independent Ph.D. work is an apt metaphor that should temper the recent flood of claims about autonomous AI scientists achieving breakthrough results.

Tags: Large Language Models (LLMs) · AI Agents · Deep Learning · Science & Research


© 2026 BotBeat