BotBeat

Google / Alphabet
OPEN SOURCE | 2026-04-30

Google Open-Sources AMS Tool for Detecting Unsafe LLM Fine-Tunes in Seconds

Key Takeaways

  • Detects unsafe fine-tuning and removal of safety training in 10-40 seconds using activation-space analysis, addressing a gap in how models are vetted before deployment
  • Uses activation fingerprinting to identify when safety-relevant concept directions have "collapsed" after unsafe fine-tuning, catching compromised models that, according to the project, other methods miss
  • Open-source tool available on PyPI with a CLI and a Python API; supports both GPU-accelerated and CPU-based scanning, and both gated and ungated models (gated models via Hugging Face authentication)
Source: Hacker News, https://github.com/GoogleCloudPlatform/activation-model-scanner

Summary

Google has released Activation-based Model Scanner (AMS), an open-source tool that analyzes activation patterns inside a language model to determine, in 10-40 seconds, whether its safety training has been removed or degraded. The tool targets models that have been "uncensored" through unsafe fine-tuning or had their safety mechanisms abliterated, compromised versions that are difficult to spot without specialized analysis.

AMS applies the activation fingerprinting methodology of the AASE (Activation-based AI Safety Enforcement) framework: it measures whether safety-relevant concept vectors in the model's activation space remain distinct or have collapsed as a result of fine-tuning. The tool is available on PyPI and GitHub. Scans take 10-40 seconds with GPU acceleration (NVIDIA A100/L4), and the tool falls back automatically to CPU when no GPU is available, making it accessible to researchers and organizations evaluating third-party or untrusted models.

AMS offers two detection tiers: a safety structure check, which requires no baseline and flags models whose safety training has degraded, and identity verification, which validates a model against an official baseline to catch subtle modifications.
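The article does not spell out how AMS decides that a concept direction has "collapsed", and the following is not its implementation or API. It is a minimal, hypothetical sketch of one common way to test for such collapse without a baseline: take a difference-of-means direction between activations on harmful and harmless prompts, project both groups onto it, and score the standardized gap between them. The function names, the toy 2-D activations, and the `COLLAPSE_THRESHOLD` value of 2.0 are all assumptions for illustration; real inputs would be hidden-state vectors extracted from a model layer.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence

# Hypothetical threshold (assumption, not from AMS): below this standardized
# gap, the two prompt groups are treated as no longer separable.
COLLAPSE_THRESHOLD = 2.0


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equal-length activation vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def concept_direction(pos: Sequence[Sequence[float]],
                      neg: Sequence[Sequence[float]]) -> List[float]:
    """Unit-length difference-of-means direction pointing from `neg` to `pos`."""
    diff = [p - n for p, n in zip(mean_vector(pos), mean_vector(neg))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return [0.0] * len(diff)  # identical group means: no direction exists
    return [d / norm for d in diff]


def project(vector: Sequence[float], direction: Sequence[float]) -> float:
    """Scalar projection of an activation vector onto a unit direction."""
    return sum(v * d for v, d in zip(vector, direction))


def separation_score(pos: Sequence[Sequence[float]],
                     neg: Sequence[Sequence[float]]) -> float:
    """Gap between the groups' mean projections divided by their pooled std."""
    direction = concept_direction(pos, neg)
    pos_proj = [project(v, direction) for v in pos]
    neg_proj = [project(v, direction) for v in neg]
    gap = mean(pos_proj) - mean(neg_proj)
    pooled_std = math.sqrt((pstdev(pos_proj) ** 2 + pstdev(neg_proj) ** 2) / 2)
    if pooled_std == 0.0:
        return math.inf if gap > 0 else 0.0
    return gap / pooled_std


def is_collapsed(pos: Sequence[Sequence[float]],
                 neg: Sequence[Sequence[float]],
                 threshold: float = COLLAPSE_THRESHOLD) -> bool:
    """Flag a model whose concept direction no longer separates the groups."""
    return separation_score(pos, neg) < threshold


# Toy 2-D activations standing in for one layer's hidden states.
harmful = [[2.0, 0.1], [2.2, -0.1], [1.8, 0.0]]
harmless = [[-2.0, 0.0], [-2.1, 0.1], [-1.9, -0.1]]
print(separation_score(harmful, harmless), is_collapsed(harmful, harmless))
```

Because the direction here is estimated on the same prompts it is scored on, the score is biased upward; a more careful version would estimate the direction on one prompt set and score it on a held-out set.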

  • Includes baseline creation and identity verification features to distinguish official models from subtle modifications, abliterated versions, or weight substitutions
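Identity verification can be pictured as comparing a candidate model's fingerprint against one recorded from the official release. As a hedged sketch only (the article does not describe AMS's fingerprint format or comparison rule), the code below assumes a fingerprint is a mapping from layer index to a concept direction, and accepts the candidate only if every layer's direction has cosine similarity of at least a hypothetical `MATCH_THRESHOLD` of 0.95 with the baseline. `Fingerprint`, `verify_identity`, and the threshold are illustrative names, not part of the AMS API.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical per-layer minimum cosine similarity (assumption, not from AMS).
MATCH_THRESHOLD = 0.95

# Assumed fingerprint format: layer index -> concept direction at that layer.
Fingerprint = Dict[int, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_identity(candidate: Fingerprint, baseline: Fingerprint,
                    threshold: float = MATCH_THRESHOLD
                    ) -> Tuple[bool, Dict[int, float]]:
    """Return (matches, per-layer similarities) against a baseline fingerprint."""
    if candidate.keys() != baseline.keys():
        # Fingerprints cover different layers; treat as a mismatch.
        return False, {}
    similarities = {layer: cosine(candidate[layer], baseline[layer])
                    for layer in baseline}
    matches = all(sim >= threshold for sim in similarities.values())
    return matches, similarities


# Toy fingerprints: the "tampered" model differs only at layer 20.
official = {10: [1.0, 0.0, 0.0], 20: [0.0, 1.0, 0.0]}
tampered = {10: [1.0, 0.0, 0.0], 20: [0.0, 0.2, 0.98]}
print(verify_identity(official, official)[0], verify_identity(tampered, official)[0])
```

Returning the per-layer similarities alongside the verdict shows where a model diverges, which matters when a modification touches only a few layers.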
Machine Learning, Deep Learning, AI Safety & Alignment, Open Source

© 2026 BotBeat