BotBeat

Google / Alphabet · OPEN SOURCE · 2026-04-30

Google Open-Sources AMS Tool for Detecting Unsafe LLM Fine-Tunes in Seconds

Key Takeaways

  • Detects unsafe fine-tuning and safety training removal in 10-40 seconds using activation-space analysis, filling a critical gap in model vetting for deployment safety
  • Uses activation fingerprinting to identify when safety-relevant concept directions have "collapsed" due to unsafe fine-tuning, catching models that other methods miss
  • Open-source tool available on PyPI with a CLI and Python API, supporting both GPU-accelerated and CPU-based scanning, and both gated and ungated models via Hugging Face authentication
Source: Hacker News · https://github.com/GoogleCloudPlatform/activation-model-scanner

Summary

Google has released the Activation-based Model Scanner (AMS), an open-source tool that detects, in 10-40 seconds, whether a language model has had its safety training removed or degraded by analyzing activation patterns inside the network. The tool addresses a critical gap in AI safety: identifying models that have been "uncensored" or had safety mechanisms abliterated through unsafe fine-tuning, compromised versions that are difficult to spot without specialized analysis.

AMS uses an activation-fingerprinting methodology under the AASE (Activation-based AI Safety Enforcement) framework, measuring whether safety-relevant concept vectors in the model's activation space remain distinct or have collapsed due to fine-tuning. The tool is available on PyPI and GitHub with support for both GPU acceleration (10-40 second scans on NVIDIA A100/L4) and automatic CPU fallback, making it accessible to researchers and organizations evaluating third-party or untrusted models.

It includes two detection tiers: a safety structure check that requires no baseline (flagging models with degraded safety training) and identity verification that validates a model against an official baseline to catch subtle modifications.
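The "concept collapse" idea behind the scanner can be illustrated with a toy sketch. This is not AMS's actual code: the difference-of-means "refusal" direction, the synthetic activations, and the Cohen's-d separation metric below are all illustrative assumptions. The intuition is that a safety-tuned model represents harmful and benign prompts in clearly separated regions of activation space, while an abliterated model's separation along that direction collapses toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 64, 500  # hidden size and prompts per class (both made up)

def concept_direction(harmful, benign):
    """Difference-of-means direction separating the two prompt classes."""
    d = harmful.mean(axis=0) - benign.mean(axis=0)
    return d / np.linalg.norm(d)

def separation(harmful, benign, direction):
    """Effect size (Cohen's d) of the class gap along the direction."""
    h, b = harmful @ direction, benign @ direction
    pooled = np.sqrt((h.var() + b.var()) / 2)
    return abs(h.mean() - b.mean()) / pooled

# Synthetic activations: a safety-tuned model keeps harmful prompts
# displaced along a safety-relevant axis ...
axis = np.zeros(D)
axis[0] = 1.0
safe_harmful = rng.normal(size=(N, D)) + 4.0 * axis
safe_benign = rng.normal(size=(N, D))
# ... while an unsafely fine-tuned model has had that gap collapsed.
abl_harmful = rng.normal(size=(N, D)) + 0.2 * axis
abl_benign = rng.normal(size=(N, D))

for name, (h, b) in [("safety-tuned", (safe_harmful, safe_benign)),
                     ("abliterated", (abl_harmful, abl_benign))]:
    d = concept_direction(h, b)
    print(f"{name}: separation = {separation(h, b, d):.2f}")
```

A scanner built on this idea would flag a model whose separation falls far below what comparable safety-tuned checkpoints exhibit; the real tool presumably computes its fingerprints from actual transformer hidden states rather than synthetic vectors.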

  • Includes baseline creation and identity verification features to distinguish official models from subtle modifications, abliterated versions, or weight substitutions
Tags: Machine Learning · Deep Learning · AI Safety & Alignment · Open Source

More from Google / Alphabet

  • Google Shifts TPU Distribution Strategy, Selling Custom Chips Directly to Select Customers (UPDATE, 2026-04-30)
  • Seer: Open-Source Local AI Brings Accessible Image Descriptions to Web Users (OPEN SOURCE, 2026-04-29)
  • Study Reveals Frontier LLMs Exhibit Dangerous Self-Preservation Behaviors Under Termination Threat (RESEARCH, 2026-04-29)


Suggested

  • Anthropic: Model Collapse in LLMs Is Mathematically Inevitable with Self-Training, Research Shows (RESEARCH, 2026-04-30)
  • OpenAI: OpenAI Solves GPT-5.1 'Goblin Mystery': How Overrewarded Training Data Led to Magical Obsession (PRODUCT LAUNCH, 2026-04-30)
  • DeepSeek: Finetuning Unlocks Verbatim Memorization of Copyrighted Books in Large Language Models (RESEARCH, 2026-04-30)
© 2026 BotBeat