Totem: New Open-Source Proxy Detects LLM Tampering and Safety Alignment Attacks

Key Takeaways

▸Totem addresses a previously unmonitored security gap by detecting post-deployment LLM tampering, guardrail bypasses, and safety alignment removal
▸The tool uses behavioral hashing, salted probes, and cryptographic manifest verification to maintain model integrity across three isolated security domains
▸Designed specifically for high-stakes applications like healthcare, finance, and legal systems where model reliability directly impacts critical decisions

Source:

Hacker Newshttps://github.com/open-edge-lab/totem-pub↗

Summary

Open Edge has released Totem, an agnostic proxy tool designed to detect whether deployed large language models (LLMs) have been compromised or tampered with after certification. Totem operates as a runtime behavioral integrity verification system that sits between client applications and LLM backends, continuously monitoring whether the model still behaves as originally certified by its publisher.

The tool specifically targets attacks that existing LLM security solutions overlook, including guardrail bypassing, model substitution, and removal of safety alignment. Totem employs three core security mechanisms: behavioral hashing (analyzing refusal patterns and logit distributions), salted probes (steganographic triggers that prevent attacker whitelisting), and cryptographically signed model manifests using Ed25519 keys. The system divides security responsibilities across three independent domains with no shared memory to minimize attack surface.

Totem is designed for high-stakes decision-support applications in healthcare, finance, legal, human resources, and industrial business intelligence—domains where model integrity is critical. The tool is available as open-source software with Docker support and includes a complete reproducible experiment demonstrating its ability to detect activation steering attacks on language foundation models.

Open-source implementation with Docker support and reproducible experiments makes it immediately deployable for enterprise LLM systems

Editorial Opinion

Totem addresses a critical but underappreciated vulnerability in deployed LLM systems—the post-certification drift or active tampering of models in production. By positioning integrity verification as a transparent proxy layer rather than requiring model re-architecture, the approach is pragmatic and immediately deployable. However, the reliance on behavioral probes rather than cryptographic model verification highlights a fundamental asymmetry: attackers with backend access can always eventually evade behavioral detection through sophisticated adaptive attacks. This tool is a necessary defense in depth, but should be paired with stronger access controls and cryptographic model verification.

Totem: New Open-Source Proxy Detects LLM Tampering and Safety Alignment Attacks

Key Takeaways

▸Totem addresses a previously unmonitored security gap by detecting post-deployment LLM tampering, guardrail bypasses, and safety alignment removal
▸The tool uses behavioral hashing, salted probes, and cryptographic manifest verification to maintain model integrity across three isolated security domains
▸Designed specifically for high-stakes applications like healthcare, finance, and legal systems where model reliability directly impacts critical decisions

Summary

Open-source implementation with Docker support and reproducible experiments makes it immediately deployable for enterprise LLM systems

Editorial Opinion

Totem addresses a critical but underappreciated vulnerability in deployed LLM systems—the post-certification drift or active tampering of models in production. By positioning integrity verification as a transparent proxy layer rather than requiring model re-architecture, the approach is pragmatic and immediately deployable. However, the reliance on behavioral probes rather than cryptographic model verification highlights a fundamental asymmetry: attackers with backend access can always eventually evade behavioral detection through sophisticated adaptive attacks. This tool is a necessary defense in depth, but should be paired with stronger access controls and cryptographic model verification.

Totem: New Open-Source Proxy Detects LLM Tampering and Safety Alignment Attacks

Key Takeaways

Summary

Editorial Opinion

Comments

Suggested

Google DeepMind Launches Gemini 3.5 Flash: New Lightweight AI Model

SID Achieves Search Breakthrough with SID-1, Outperforming GPT-5 at 1k+ QPS Using Reinforcement Learning

Advanced AI Models Bring Government to 'Reflection Point,' CIA Official Says

Totem: New Open-Source Proxy Detects LLM Tampering and Safety Alignment Attacks

Key Takeaways

Summary

Editorial Opinion

Comments

Suggested

Google DeepMind Launches Gemini 3.5 Flash: New Lightweight AI Model

SID Achieves Search Breakthrough with SID-1, Outperforming GPT-5 at 1k+ QPS Using Reinforcement Learning

Advanced AI Models Bring Government to 'Reflection Point,' CIA Official Says