Google Open-Sources AMS Tool for Detecting Unsafe LLM Fine-Tunes in Seconds

Key Takeaways

▸Detects unsafe fine-tuning and safety training removal in 10-40 seconds using activation space analysis, filling a critical gap in model vetting for deployment safety
▸Uses activation fingerprinting to identify when safety-relevant concept directions have "collapsed" due to unsafe fine-tuning, catching models that other methods miss
▸Open-source tool available on PyPI with CLI and Python API, supporting both GPU-accelerated and CPU-based scanning, gated and ungated models via Hugging Face authentication

Source:

Hacker Newshttps://github.com/GoogleCloudPlatform/activation-model-scanner↗

Summary

Google has released Activation-based Model Scanner (AMS), an open-source tool that detects whether language models have had their safety training removed or degraded in 10-40 seconds by analyzing activation patterns in the neural network. The tool addresses a critical gap in AI safety by identifying models that have been "uncensored" or had safety mechanisms abliterated through unsafe fine-tuning—compromised versions that are difficult to spot without specialized analysis. AMS uses activation fingerprinting methodology under the AASE (Activation-based AI Safety Enforcement) framework, measuring whether safety-relevant concept vectors in the model's activation space remain distinct or have collapsed due to fine-tuning. The tool is available on PyPI and GitHub with support for both GPU acceleration (10-40 second scans on NVIDIA A100/L4) and automatic CPU fallback, making it accessible for researchers and organizations evaluating third-party or untrusted models. It includes two detection tiers: a safety structure check that requires no baseline (flagging models with degraded safety training) and identity verification for validating models against official baselines to catch subtle modifications.

Includes baseline creation and identity verification features to distinguish official models from subtle modifications, abliterated versions, or weight substitutions

Google / Alphabet

OPEN SOURCE Google / Alphabet2026-04-30

Google Open-Sources AMS Tool for Detecting Unsafe LLM Fine-Tunes in Seconds

Key Takeaways

▸Detects unsafe fine-tuning and safety training removal in 10-40 seconds using activation space analysis, filling a critical gap in model vetting for deployment safety
▸Uses activation fingerprinting to identify when safety-relevant concept directions have "collapsed" due to unsafe fine-tuning, catching models that other methods miss
▸Open-source tool available on PyPI with CLI and Python API, supporting both GPU-accelerated and CPU-based scanning, gated and ungated models via Hugging Face authentication

Source:

Hacker Newshttps://github.com/GoogleCloudPlatform/activation-model-scanner↗

Summary

Includes baseline creation and identity verification features to distinguish official models from subtle modifications, abliterated versions, or weight substitutions

Google Open-Sources AMS Tool for Detecting Unsafe LLM Fine-Tunes in Seconds

Key Takeaways

Summary

More from Google / Alphabet

Google Shifts TPU Distribution Strategy, Selling Custom Chips Directly to Select Customers

Seer: Open-Source Local AI Brings Accessible Image Descriptions to Web Users

Study Reveals Frontier LLMs Exhibit Dangerous Self-Preservation Behaviors Under Termination Threat

Comments

Suggested

Model Collapse in LLMs Is Mathematically Inevitable with Self-Training, Research Shows

OpenAI Solves GPT-5.1 'Goblin Mystery': How Overrewarded Training Data Led to Magical Obsession

Finetuning Unlocks Verbatim Memorization of Copyrighted Books in Large Language Models

Google Open-Sources AMS Tool for Detecting Unsafe LLM Fine-Tunes in Seconds

Key Takeaways

Summary

More from Google / Alphabet

Google Shifts TPU Distribution Strategy, Selling Custom Chips Directly to Select Customers

Seer: Open-Source Local AI Brings Accessible Image Descriptions to Web Users

Study Reveals Frontier LLMs Exhibit Dangerous Self-Preservation Behaviors Under Termination Threat

Comments

Suggested

Model Collapse in LLMs Is Mathematically Inevitable with Self-Training, Research Shows

OpenAI Solves GPT-5.1 'Goblin Mystery': How Overrewarded Training Data Led to Magical Obsession

Finetuning Unlocks Verbatim Memorization of Copyrighted Books in Large Language Models