ICML Rejects 497 Papers After Watermark Detection Catches Illicit AI Use in Peer Reviews
Key Takeaways
- ICML deployed watermark-based detection to identify LLM-generated peer reviews, catching 497 violations, roughly 2% of submissions
- The conference's reciprocal review policy, which requires authors to review other submissions, gave organizers leverage to enforce AI-use restrictions by rejecting violators' own papers
- More than half of researchers now reportedly use AI tools for peer review despite explicit prohibitions, indicating a significant gap between policy and practice in academic publishing
Summary
The International Conference on Machine Learning (ICML), scheduled for July 2026 in Seoul, has rejected 497 papers, roughly 2% of submissions, from authors who violated the conference's policy on large language model (LLM) use during peer review. The conference implemented a novel detection method: hidden watermarks embedded in the research papers distributed for review. When a reviewer fed a watermarked paper to an LLM to generate a peer review, the watermark triggered the model to include specific telltale phrases that revealed the AI-assisted text. Conference organizers emphasized the decision in a statement, saying they hope to "remind the community that as our field changes rapidly the thing we must protect most actively is our trust in each other." The rejections highlight a growing challenge in academic research: more than half of researchers now reportedly use AI for peer review, often in violation of conference policies and ethical guidelines. The incident underscores the research community's need for clearer guidance on responsible and ethical AI use in academic workflows.
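The article does not describe ICML's exact implementation, but the mechanism it outlines, planting a hidden per-paper instruction that only an LLM will reproduce and then scanning submitted reviews for the planted phrase, can be sketched in a few lines. Below is a minimal illustration, assuming a lookup table of planted phrases and simple case-insensitive matching; the identifiers (`WATERMARK_PHRASES`, `detect_llm_review`) and the phrases themselves are invented for illustration and do not reflect ICML's actual system.

```python
import re

# Hypothetical per-paper watermark phrases. In a scheme like the one the
# article describes, each paper sent out for review would carry a hidden
# instruction (invisible to a human reader) telling any LLM that processes
# it to work the paper's phrase into its output.
WATERMARK_PHRASES = {
    "paper-1234": "the manuscript's treatment of spectral regularization",
    "paper-5678": "a commendably rigorous ablation narrative",
}

def detect_llm_review(paper_id: str, review_text: str) -> bool:
    """Return True if the review contains the telltale phrase planted
    in the watermarked copy of the paper it reviews."""
    phrase = WATERMARK_PHRASES.get(paper_id)
    if phrase is None:
        return False
    # A human reviewer reading only the visible text would never see,
    # and so never reproduce, the hidden phrase.
    return re.search(re.escape(phrase), review_text, re.IGNORECASE) is not None

if __name__ == "__main__":
    review = ("The authors present solid results, and the manuscript's "
              "treatment of spectral regularization is thorough.")
    print(detect_llm_review("paper-1234", review))  # True -> flag for investigation
```

In practice the planted instruction would be concealed in the paper itself (for example, as white-on-white text or in document metadata), and a match would presumably trigger human investigation rather than automatic rejection.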
Editorial Opinion
ICML's aggressive enforcement action using watermark detection represents an important boundary-setting moment for the research community, but it also reveals a troubling pattern: the gap between stated policies and researcher behavior suggests that clarity and consensus on AI use in academic processes remain lacking. While the technical innovation of watermark-based detection is clever, the real lesson is that the field must move beyond punitive measures toward establishing shared norms and transparent guidelines on where and how AI tools can responsibly augment (rather than replace) human scholarly judgment.