BotBeat
...
← Back

> ▌

Open EdgeOpen Edge
OPEN SOURCEOpen Edge2026-03-31

Totem: New Open-Source Proxy Detects LLM Tampering and Safety Alignment Attacks

Key Takeaways

  • ▸Totem addresses a previously unmonitored security gap by detecting post-deployment LLM tampering, guardrail bypasses, and safety alignment removal
  • ▸The tool uses behavioral hashing, salted probes, and cryptographic manifest verification to maintain model integrity across three isolated security domains
  • ▸Designed specifically for high-stakes applications like healthcare, finance, and legal systems where model reliability directly impacts critical decisions
Source:
Hacker Newshttps://github.com/open-edge-lab/totem-pub↗

Summary

Open Edge has released Totem, an agnostic proxy tool designed to detect whether deployed large language models (LLMs) have been compromised or tampered with after certification. Totem operates as a runtime behavioral integrity verification system that sits between client applications and LLM backends, continuously monitoring whether the model still behaves as originally certified by its publisher.

The tool specifically targets attacks that existing LLM security solutions overlook, including guardrail bypassing, model substitution, and removal of safety alignment. Totem employs three core security mechanisms: behavioral hashing (analyzing refusal patterns and logit distributions), salted probes (steganographic triggers that prevent attacker whitelisting), and cryptographically signed model manifests using Ed25519 keys. The system divides security responsibilities across three independent domains with no shared memory to minimize attack surface.

Totem is designed for high-stakes decision-support applications in healthcare, finance, legal, human resources, and industrial business intelligence—domains where model integrity is critical. The tool is available as open-source software with Docker support and includes a complete reproducible experiment demonstrating its ability to detect activation steering attacks on language foundation models.

  • Open-source implementation with Docker support and reproducible experiments makes it immediately deployable for enterprise LLM systems

Editorial Opinion

Totem addresses a critical but underappreciated vulnerability in deployed LLM systems—the post-certification drift or active tampering of models in production. By positioning integrity verification as a transparent proxy layer rather than requiring model re-architecture, the approach is pragmatic and immediately deployable. However, the reliance on behavioral probes rather than cryptographic model verification highlights a fundamental asymmetry: attackers with backend access can always eventually evade behavioral detection through sophisticated adaptive attacks. This tool is a necessary defense in depth, but should be paired with stronger access controls and cryptographic model verification.

Large Language Models (LLMs)MLOps & InfrastructureCybersecurityAI Safety & Alignment

Comments

Suggested

OracleOracle
POLICY & REGULATION

AI Agents Promise to 'Run the Business'—But Who's Liable When Things Go Wrong?

2026-04-05
Google / AlphabetGoogle / Alphabet
RESEARCH

Deep Dive: Optimizing Sharded Matrix Multiplication on TPU with Pallas

2026-04-05
AnthropicAnthropic
POLICY & REGULATION

Anthropic Explores AI's Role in Autonomous Weapons Policy with Pentagon Discussion

2026-04-05
← Back to news
© 2026 BotBeat
AboutPrivacy PolicyTerms of ServiceContact Us