Ontario Audit Finds Government-Approved AI Medical Scribes Plagued by Hallucinations and Inaccuracies
Key Takeaways
- All 20 government-approved AI scribe vendors failed accuracy tests: 9 hallucinated medical information, 12 recorded incorrect details, and 17 missed mental health issues
- Accuracy averaged only 12/20 yet accounted for just 4% of the vendor evaluation score, while domestic presence was weighted at 30%, enabling approval despite critical safety failures
- AI-generated hallucinations included fabricated referrals for blood tests and therapy, incorrect medication names, and omitted details of mental health discussions, all with the potential to harm patient care
Summary
Ontario's auditor general released a damning report on AI medical scribes approved by the provincial government, finding that all 20 pre-qualified vendors demonstrated serious accuracy and completeness problems in testing. The audit included transcription tests of two simulated patient-doctor conversations, revealing that 9 vendors hallucinated patient information (such as non-existent blood test referrals), 12 recorded information incorrectly including medication names, and 17 missed key mental health details—flaws that could directly compromise patient care.
The audit identified a critical structural flaw in the vendor evaluation process: while AI scribes averaged only 12 out of 20 on accuracy metrics, this crucial safety measurement accounted for just 4% of a vendor's overall approval score. By contrast, "domestic presence in Ontario" was weighted at 30%, making it possible for vendors to earn approval even with zero accuracy scores. The auditor general concluded that these systems "were not evaluated adequately" and emphasized the importance of rigorous testing to minimize inaccuracies that could "potentially result in inadequate or harmful treatment plans that may potentially impact patient health outcomes."
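To see why this weighting made approval possible even with a zero accuracy score, here is a minimal sketch of a weighted scoring scheme. Only the 4% accuracy and 30% domestic-presence weights come from the audit; the remaining criteria are lumped together, and all vendor numbers are hypothetical:

```python
# Hypothetical weighted vendor scoring. The 4% accuracy and 30%
# domestic-presence weights are from the audit report; the remaining
# 66% is an illustrative lump of "other criteria".
WEIGHTS = {"accuracy": 0.04, "domestic_presence": 0.30, "other": 0.66}

def overall_score(scores: dict) -> float:
    """Weighted sum of per-criterion scores (each on a 0-100 scale)."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# A vendor with ZERO accuracy but strong business criteria:
vendor = {"accuracy": 0, "domestic_presence": 90, "other": 80}
print(overall_score(vendor))  # ~79.8: failing accuracy costs at most 4 of 100 points
```

Under any weighting like this, the worst possible accuracy performance can only subtract 4 points from a vendor's total, which is the structural flaw the auditor general identified.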
The report recommends that healthcare IT departments require physicians to review and confirm AI-generated notes before entering them into patient records. While Ontario's public sector health services are not required to use these approved vendors, the findings should raise serious concerns among any healthcare providers—public or private—currently relying on similar AI scribe systems.
- Ontario's auditor general found vendors were inadequately evaluated; recommends physician review and confirmation of AI notes before patient record entry
- Audit exposes systemic flaw in how government evaluates AI systems for healthcare: safety and accuracy deprioritized in favor of business considerations
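One way a healthcare IT department could enforce the recommended review step is a confirmation gate in the records pipeline. The sketch below is purely illustrative; the `Note` type, field names, and `commit_note` function are hypothetical and not taken from the report:

```python
from dataclasses import dataclass

@dataclass
class Note:
    # Hypothetical record type; field names are illustrative only.
    patient_id: str
    text: str
    ai_generated: bool = False
    physician_confirmed: bool = False

def commit_note(note: Note, record: list) -> None:
    """Refuse to file an AI-generated note until a physician confirms it."""
    if note.ai_generated and not note.physician_confirmed:
        raise PermissionError("AI-generated note requires physician review")
    record.append(note)

record = []
draft = Note("p-001", "Patient reports headache.", ai_generated=True)
try:
    commit_note(draft, record)        # blocked: not yet confirmed
except PermissionError:
    draft.physician_confirmed = True  # physician reviews and signs off
    commit_note(draft, record)        # now accepted into the record
```

The point of the design is that confirmation is enforced at the point of entry rather than left to convention, matching the audit's recommendation that review happen before notes reach the patient record.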
Editorial Opinion
This audit reveals a fundamental crisis in AI safety evaluation for high-stakes healthcare applications. The fact that 20 vendors—every single one pre-qualified by government—all failed basic accuracy tests should alarm policymakers and healthcare leaders everywhere. Even more troubling is the structural misalignment of incentives: safety metrics were marginalized in scoring while domestic business presence was heavily weighted. For AI systems that directly impact patient health, this is backwards. Until government and healthcare systems rigorously validate AI medical tools against patient safety standards—not vendor convenience—these systems pose an unacceptable risk to care quality.