4TB of Voice and Identity Data Stolen From 40,000 Mercor AI Contractors in Lapsus$ Breach
Key Takeaways
- Lapsus$ published 4TB of Mercor contractor data on April 4, 2026, containing voice samples and government IDs from 40,000+ AI workers
- Voice samples average 2-5 minutes of studio-quality audio, roughly 8 to 20 times the ~15 seconds required for high-fidelity synthetic voice cloning
- The pairing of audio biometrics with verified identity documents enables impersonation attacks against banks and employers, as well as video call fraud, insurance claim fraud, and elder scams
Summary
On April 4, 2026, extortion group Lapsus$ published a 4-terabyte data dump from Mercor, an AI contractor platform for data labeling and voice sample collection, affecting more than 40,000 contributors. The breach combines two previously separate attack vectors: high-quality voice recordings (averaging 2-5 minutes of studio-clean audio per person) and government-issued identity documents, including passports and driver's licenses. The scope of risk from this pairing is unprecedented: a single dataset now gives attackers both the raw material for synthetic voice cloning and verified identity credentials.
The voice samples exceed the ~15-second threshold required for high-quality voice cloning using commercially available tools, as reported by the Wall Street Journal in February 2026. Within ten days of the breach, five contractor lawsuits were filed against Mercor, alleging the company collected voice prints under a "training data" framing without disclosing that these recordings would function as permanent biometric identifiers. The plaintiffs argue this constituted inadequate informed consent for such sensitive personal data.
Security researchers have documented multiple weaponization pathways already in active use: voice biometric bypass of bank verification systems, vishing attacks impersonating employees to HR or finance departments, deepfake video calls (modeled on the 2024 Arup incident, in which a finance employee was deceived by a deepfake of the company's CFO into wiring about $25 million), insurance claim fraud (which saw a 475% year-over-year increase in synthetic voice attacks during 2025), and elder-targeted romance and emergency impersonation scams. The FBI Internet Crime Complaint Center recorded $2.3 billion in losses from synthetic emergency impersonation calls targeting people over age 60 in 2026 alone.
- Five lawsuits filed within ten days allege Mercor misrepresented voice collection as generic training data rather than permanent biometric enrollment
- Victims cannot rotate or invalidate stolen voice data; remediation requires proactive monitoring and account security changes across all services that use voice or identity verification



