Law Professors Find AI Tutors Dramatically Outperform Peer Answers in Legal Education

Key Takeaways

▸LLM answers won 75.33% of blinded comparisons against peer answers in legal education evaluations
▸AI responses flagged as harmful far less often than peer answers (3.53% vs 12.06%)
▸LLM tutors performed comparably to the best human instructors in the study

Source:

Hacker News: https://law.stanford.edu/publications/law-professors-prefer-ai-over-peer-answers/

Summary

A study by 16 U.S. law professors from Stanford Law School and partner institutions found that large language models significantly outperform human peer tutoring in legal education. In a blinded evaluation covering contracts courses, the professors wrote 40 representative questions, provided model answers, and judged 2,918 anonymized comparisons between LLM responses and answers from their peers. The results decisively favored AI: LLM responses won 75.33% of comparisons against peer answers on average, and the models performed at a level comparable to the best human instructors in the study.
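To make the win-rate figure concrete, here is a minimal sketch of how a win rate can be computed from blinded pairwise judgments. The function name `win_rate`, the "llm"/"peer" labels, and the toy data are illustrative assumptions, not the study's code or data, and the study may have aggregated its comparisons differently (for example, per model or per question).

```python
from collections import Counter

def win_rate(judgments):
    """Return the fraction of pairwise comparisons won by the LLM answer.

    `judgments` is an iterable of labels, "llm" or "peer", one per blinded
    comparison. These labels are a hypothetical encoding for illustration.
    """
    counts = Counter(judgments)
    total = counts["llm"] + counts["peer"]
    if total == 0:
        raise ValueError("no judgments to score")
    return counts["llm"] / total

# Toy data only: eight made-up judgments, not results from the study.
toy_judgments = ["llm", "llm", "peer", "llm", "peer", "llm", "peer", "llm"]
print(f"LLM win rate: {win_rate(toy_judgments):.2%}")
```

A win rate of 75.33% therefore means the LLM answer was preferred in roughly three of every four comparisons, not that it scored 75.33% higher on some scale.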

Beyond raw performance, the research found that LLM responses were flagged as harmful or problematic in 3.53% of cases, compared with 12.06% for peer-provided answers, which suggests that AI tutors give more consistent, appropriate guidance. The professors' preferences were consistent across evaluators, indicating that the advantage reflected shared professional standards rather than individual bias.

Crucially, the researchers demonstrated that expert preferences could be scaled using AI-as-judge approaches, making it practical to evaluate new models without repeated expert review. This methodology could extend to other judgment-heavy domains beyond law, where a single ground truth doesn't exist but professional expertise can reliably assess quality.
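One way to picture the AI-as-judge idea is to check how often an automated judge picks the same answer as the experts on comparisons they have already graded. The sketch below is an assumption-laden illustration: `agreement_rate`, `longer_answer_judge`, and the toy questions are hypothetical, and the placeholder judge simply prefers the longer answer where a real setup would prompt a model.

```python
from typing import Callable, Sequence, Tuple

Comparison = Tuple[str, str, str]  # (question, answer_a, answer_b)

def agreement_rate(
    comparisons: Sequence[Comparison],
    expert_choices: Sequence[str],
    judge: Callable[[str, str, str], str],
) -> float:
    """Share of comparisons where the judge picks the same answer
    ("a" or "b") as the expert did."""
    if len(comparisons) != len(expert_choices):
        raise ValueError("each comparison needs exactly one expert choice")
    if not comparisons:
        raise ValueError("no comparisons to score")
    matches = sum(
        judge(question, answer_a, answer_b) == choice
        for (question, answer_a, answer_b), choice in zip(comparisons, expert_choices)
    )
    return matches / len(comparisons)

def longer_answer_judge(question: str, answer_a: str, answer_b: str) -> str:
    # Placeholder judge (assumption): prefers the longer answer. A real judge
    # would send the question and both answers to a model and parse its pick.
    return "a" if len(answer_a) >= len(answer_b) else "b"

# Toy data only: invented questions, answers, and expert picks.
toy_comparisons = [
    ("What is consideration?", "A bargained-for exchange of value.", "A gift."),
    ("Is a silent offeree bound?", "Yes.", "Generally no; silence is rarely acceptance."),
    ("What is the mailbox rule?", "Acceptance is effective on dispatch.",
     "It says every contract must be delivered by post to be valid."),
]
toy_expert_choices = ["a", "b", "a"]

rate = agreement_rate(toy_comparisons, toy_expert_choices, longer_answer_judge)
print(f"judge agrees with experts on {rate:.0%} of comparisons")
```

The third toy comparison shows why agreement has to be measured: a naive judge can prefer a longer but wrong answer. Only a judge that tracks expert choices closely could reasonably stand in for repeated expert review.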


Editorial Opinion

This research challenges the assumption that AI excels only in narrow, fact-based domains. The finding that LLMs outperform human peers at legal reasoning (a domain requiring nuance, judgment, and argumentation) suggests AI tutoring could address real gaps in access to, and consistency of, professional education. However, the study raises equally important questions: Should human expertise be supplemented or supplanted by AI? And what happens to the educational value of peer-to-peer learning if AI becomes the default tutor?

RESEARCH · Anthropic · 2026-06-06
