Ontario Auditors Find AI Note-Taking Systems Routinely Fail Basic Accuracy Tests
Key Takeaways
- 60% of evaluated AI Scribe systems fabricated or mishandled critical medical information such as drug names and diagnoses
- 85% of systems missed key mental health details discussed in patient-doctor conversations
- Evaluation scoring was heavily weighted toward vendor location in Ontario (30%) over accuracy (4%) and combined bias, threat, and privacy safeguards (8%)
Summary
A provincial audit by Ontario's Office of the Auditor General has revealed critical failures in AI Scribe systems approved for healthcare use across the province. Of the 20 vendor note-taking systems evaluated, 60% mixed up prescribed drugs in patient notes, 45% fabricated information not discussed in the patient recordings, and 85% missed key mental health details. The audit tested systems using simulated doctor-patient recordings, with medical professionals reviewing the AI-generated notes for accuracy.
The findings expose major concerns about AI reliability in clinical settings. Nine systems invented details, such as diagnoses of anxiety or the absence of masses, that were never mentioned, while 12 systems inserted incorrect drug information. Compounding these failures, the evaluation process itself was flawed: accuracy accounted for only 4% of vendors' evaluation scores, while a domestic presence in Ontario counted for 30%. Bias controls, threat assessment, and privacy safeguards together accounted for just 8% of the scoring.
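To make the imbalance concrete, here is a minimal sketch in Python using only the three weights reported in the audit; the vendor scores and the 58% "other" bucket are invented for illustration, since the report's full rubric is not public here:

```python
# Reported criterion weights: 4% accuracy, 30% Ontario presence,
# 8% combined bias/threat/privacy. The remaining 58% is lumped
# into "other" as an assumption for this illustration.
WEIGHTS = {
    "accuracy": 0.04,
    "ontario_presence": 0.30,
    "bias_threat_privacy": 0.08,
    "other": 0.58,
}

def weighted_score(criteria: dict[str, float]) -> float:
    """Combine per-criterion scores (0-100) using the reported weights."""
    return sum(WEIGHTS[name] * score for name, score in criteria.items())

# Two invented vendors: one accurate but out of province,
# one local but error-prone. "Other" criteria are held equal.
accurate_vendor = {"accuracy": 95, "ontario_presence": 0,
                   "bias_threat_privacy": 90, "other": 70}
local_vendor = {"accuracy": 40, "ontario_presence": 100,
                "bias_threat_privacy": 50, "other": 70}

print(weighted_score(accurate_vendor))  # 51.6
print(weighted_score(local_vendor))     # 76.2 -- the less accurate vendor wins
```

Under these weights, a 55-point accuracy gap moves a vendor's total by barely two points, while location alone is worth thirty.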
The report suggests that misweighted evaluation criteria led to the approval of AI tools that may produce inaccurate or biased medical records. OntarioMD has recommended manual review of AI-generated notes, but no approved AI Scribe system includes mandatory attestation or verification features to catch errors before they enter patient records, leaving the burden of fact-checking on already busy healthcare providers.
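By way of contrast, a mandatory verification feature need not be elaborate. The sketch below is hypothetical Python; names like ScribeNote and file_note are invented, and the report describes no such API in any approved system. It shows only the basic gate: a note cannot enter the chart until a clinician attests to it.

```python
from dataclasses import dataclass

@dataclass
class ScribeNote:
    text: str
    attested_by: str | None = None  # clinician who verified the note against the visit

def file_note(note: ScribeNote, chart: list[ScribeNote]) -> None:
    """Hypothetical gate: refuse to file an AI-generated note without attestation."""
    if note.attested_by is None:
        raise ValueError("AI-generated note requires clinician attestation before filing")
    chart.append(note)

chart: list[ScribeNote] = []
note = ScribeNote(text="Pt reports improved sleep; no new medications.")
# file_note(note, chart)  # would raise: note has not been attested yet
note.attested_by = "Dr. A. Example"
file_note(note, chart)    # accepted once a clinician has signed off
```

The design point is that attestation becomes a hard precondition of filing rather than an optional step the clinician can skip.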
Editorial Opinion
This audit exposes a troubling gap between AI hype in healthcare and clinical reality. When AI systems that hallucinate diagnoses and drug names are approved for use by healthcare providers with minimal accuracy requirements in the vendor selection process, it suggests regulatory frameworks have not kept pace with deployment. The responsibility for catching AI errors should not rest entirely on overworked clinicians; robust accuracy verification must be built into these systems before approval, not left to manual review afterward.