Independent Researcher Builds 85% Accurate AI Text Detector Using Classical Machine Learning
Key Takeaways
- Classical machine learning (SVM) can achieve ~85% accuracy detecting LLM-generated text without requiring large language models or complex architectures
- LLM-generated content exhibits strong statistical patterns in word choice and sentence structure that traditional classifiers can identify
- The detector is open-source and deployable as a lightweight web application, contrasting with expensive commercial AI detection services
Summary
An independent researcher has developed an open-source AI text detector that achieves approximately 85% accuracy in identifying LLM-generated content using classical machine learning techniques rather than complex neural networks. The detector, built with scikit-learn's Support Vector Machine (SVM), leverages the observation that as of early 2026, mainstream LLM-generated texts exhibit strong statistical patterns distinguishable from human writing.
The researcher, motivated by encountering AI-generated fan fiction on the Chinese platform Lofter, rejected complex perplexity-based approaches in favor of simpler classical ML methods. The model was trained on web novel data from 2010-2022, which predates ChatGPT and was therefore treated as confirmed human-written content, paired with LLM-generated samples. The approach contrasts with commercial "AI plagiarism checkers", which may use similar statistical methods but remain proprietary black boxes.
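As a rough illustration of the setup described above, the sketch below trains a scikit-learn SVM on a handful of labeled samples. The character n-gram TF-IDF features, the `build_detector` helper, and the toy sentences are assumptions made for this example; the summary does not say which features or hyperparameters the project actually uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy samples; a real detector would need thousands of passages.
human_texts = [
    "he grabbed the bag and ran, no idea where, just away from that house",
    "rain again. she hated this town and its one stupid bus",
    "nobody told me the door sticks so i stood there like an idiot",
    "we argued about noodles for an hour and then forgot to eat",
]
llm_texts = [
    "The moonlight cast a gentle glow, weaving a tapestry of quiet hope.",
    "In that moment, she realized that courage was a journey, not a destination.",
    "Their eyes met, and a profound understanding blossomed between them.",
    "It was a testament to the enduring power of love and resilience.",
]


def build_detector():
    """Assumed pipeline: character n-gram TF-IDF features fed to a linear SVM."""
    return make_pipeline(
        # Character n-grams avoid word segmentation, which matters for
        # Chinese text; this feature choice is an assumption.
        TfidfVectorizer(analyzer="char", ngram_range=(1, 3), sublinear_tf=True),
        SVC(kernel="linear", C=1.0),
    )


texts = human_texts + llm_texts
labels = [0] * len(human_texts) + [1] * len(llm_texts)  # 0 = human, 1 = LLM

detector = build_detector()
detector.fit(texts, labels)

# Classify an unseen passage; the result is 0 (human) or 1 (LLM).
prediction = detector.predict(["A symphony of emotions danced within her heart."])
print(prediction[0])
```

With so few samples the prediction carries no weight; the point is only the shape of the pipeline, in which a classical classifier learns from statistical regularities in the text rather than from a language model.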
The project, available as both a web demo and open-source code on GitHub (lyc8503/AITextDetector), includes experiments on evasion techniques and acknowledges limitations, chiefly domain specificity: the model targets web novels rather than general-purpose text. The researcher notes that translation loops and careful prompt engineering can sometimes fool the detector, highlighting the ongoing cat-and-mouse game between AI generation and detection technologies.
- Detection can be evaded through translation loops or careful prompt engineering, indicating an ongoing arms race between generation and detection
- The approach suggests that commercial AI plagiarism checkers may rely on similar statistical methods rather than sophisticated deep learning, though their proprietary nature makes this hard to confirm
Editorial Opinion
This project exemplifies how accessible AI detection can be when approached pragmatically rather than through hype-driven complexity. The researcher's willingness to reject fashionable perplexity-based methods in favor of simple SVMs demonstrates that classical ML still has tremendous value in the age of transformers. However, the 85% accuracy and documented evasion techniques underscore a critical reality: AI detection remains a probabilistic game, not a definitive solution, with significant implications for academic integrity enforcement and content moderation policies.
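To make the "probabilistic game" point concrete, the sketch below applies Bayes' rule to estimate what share of flagged texts would actually be LLM-generated. It assumes, purely for illustration, that the ~85% figure holds equally for human and LLM text (85% sensitivity and 85% specificity); the source reports only overall accuracy, and the prevalence values and the `flagged_precision` helper are hypothetical.

```python
def flagged_precision(sensitivity, specificity, prevalence):
    """Share of flagged texts that are truly LLM-generated (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Assumed: 85% sensitivity and specificity; prevalence is the share of
# submitted texts that are LLM-generated (hypothetical values).
for prevalence in (0.5, 0.1, 0.01):
    precision = flagged_precision(0.85, 0.85, prevalence)
    print(f"prevalence={prevalence:.0%}  precision={precision:.1%}")
```

Under these assumptions, a flag is right 85% of the time when half of all texts are LLM-generated, but only about 39% of the time when one in ten is, which is why a flag alone is weak evidence in settings where most writing is human.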



