PrivAiTe: Open-Source Self-Hosted LLM Proxy Redacts PII Before Reaching Model Providers
Key Takeaways
- ▸Self-hosted proxy redacts PII from all LLM call layers: message text, tool-call arguments, and multimodal content
- ▸Dual-engine detection (Presidio + OpenAI Privacy Filter) balances recall and false-positive rates; configurable presets include 'light' (Presidio-only, ~10s ms latency) and 'onnx' (full suite, default)
- ▸Zero telemetry; all detection runs locally; compatible with OpenAI-compatible clients; users remain data controllers
Summary
PrivAiTe, a new open-source tool, provides a self-hosted proxy that reversibly redacts personally identifiable information (PII) from LLM API calls before they reach model providers, including sensitive data within tool-call arguments and multimodal content. The proxy performs all PII detection locally with zero telemetry and routes requests through any OpenAI-compatible client endpoint, allowing users to retain control as their own data controller.
The tool employs a dual-engine detection strategy to maximize PII coverage while minimizing false positives. The default configuration runs both Presidio (Microsoft's regex-based and spaCy NER engine) and OpenAI's open-source Privacy Filter (a 1.5B-parameter ONNX model). Presidio excels at detecting structured PII through pattern validation but may miss contextual names; the Privacy Filter handles context-dependent identifiers and secrets but can occasionally over-flag technical terms. Together, the engines cover each other's blind spots. Users can opt for faster configurations: the 'light' preset runs Presidio only, while the experimental 'max' preset adds GLiNER for improved out-of-distribution recall.
All detection runs on the user's machine with no external calls, and the project explicitly defines its threat model to clarify what protection is and is not offered. The proxy handles PII reversal transparently across all response layers, returning original values to the end user while keeping model providers blind to sensitive information.
- Transparent PII reversal exposes original data only to the end user, not the LLM provider



