BotBeat

Google / Alphabet
RESEARCH
2026-04-23

Study Finds Half of AI Health Answers Are Wrong Despite Sounding Authoritative

Key Takeaways

  • Half of health-related answers from ChatGPT, Gemini, Grok, Meta AI, and DeepSeek are problematic, yet presented with confident, authoritative formatting that misleads readers
  • Reference lists provided by AI chatbots are unreliable, with fabricated citations and broken links appearing across all models, creating false credibility for harmful information
  • Performance varies significantly by topic and question type; open-ended health questions (the most common type) trigger highly problematic responses 32% of the time compared to 7% for closed questions
Source: Hacker News, https://theconversation.com/half-of-ai-health-answers-are-wrong-even-though-they-sound-convincing-new-study-280512

Summary

A new study published in BMJ Open reports that five major AI chatbots (ChatGPT, Gemini, Grok, Meta AI, and DeepSeek) provide problematic health information roughly half the time, despite presenting answers in a convincing, doctor-like format. Researchers systematically tested the chatbots with 50 health questions spanning cancer, vaccines, stem cells, nutrition, and athletic performance. About 50% of answers were problematic overall: roughly 30% were somewhat problematic and nearly 20% were highly problematic.

The study exposed critical reliability issues, particularly with references: no chatbot managed a single fully accurate reference list across 25 attempts, with errors ranging from wrong authors and fabricated papers to broken links. Performance varied significantly by topic. Chatbots handled vaccines and cancer reasonably well (still producing problematic answers 25% of the time) but struggled most with nutrition and athletic performance, domains characterized by conflicting information and thinner evidence bases. Open-ended questions proved most problematic, with 32% of responses rated as highly problematic compared to just 7% for closed questions. The distinction matters because most real-world health queries are open-ended.

  • Language models predict statistically likely text rather than reasoning about medical evidence, making them inherently unreliable for health information despite being trained on peer-reviewed research

Editorial Opinion

This study underscores a critical gap between AI capability and safety in high-stakes domains. While the researchers note that stress-testing conditions may overstate real-world error rates, the fact that chatbots routinely fabricate citations and confidently dispense misleading health advice is deeply concerning. The distinction between open-ended and closed questions is particularly alarming because patients naturally ask open-ended questions—exactly the scenario where AI chatbots fail most catastrophically. Until these models can reliably ground medical claims in evidence and acknowledge uncertainty, deploying them as health information sources risks real patient harm.

Large Language Models (LLMs), Natural Language Processing (NLP), Healthcare, Ethics & Bias, AI Safety & Alignment
