PrivAiTe: Open-Source Self-Hosted LLM Proxy Redacts PII Before Reaching Model Providers

Key Takeaways

▸Self-hosted proxy redacts PII from all LLM call layers: message text, tool-call arguments, and multimodal content
▸Dual-engine detection (Presidio + OpenAI Privacy Filter) balances recall against false positives; configurable presets include 'light' (Presidio only, latency in the tens of milliseconds) and 'onnx' (the default, running both engines)
▸Zero telemetry; all detection runs locally; works with any OpenAI-compatible client; users remain their own data controllers

Source: Hacker News (https://github.com/crp4222/PrivAiTe)

Summary

PrivAiTe, a new open-source tool, provides a self-hosted proxy that reversibly redacts personally identifiable information (PII) from LLM API calls before they reach model providers, including sensitive data in tool-call arguments and multimodal content. The proxy performs all PII detection locally, sends no telemetry, and works with any OpenAI-compatible client, which routes its requests through the proxy. Users therefore remain the data controller for their own information.
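
The reversible-redaction idea can be sketched as a placeholder vault. This is a minimal illustration, not PrivAiTe's code: it assumes an OpenAI-style chat request, uses a toy email regex in place of the real detection engines, and invents the `Vault` class, the `redact_request` helper, and the `<EMAIL_n>` placeholder format. Multimodal handling here covers only the text parts of a content array.

```python
import copy
import json
import re

# Stand-in detector: a simple email pattern. PrivAiTe itself uses Presidio and the
# Privacy Filter model; this regex only illustrates the redact/restore flow.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class Vault:
    """Hypothetical placeholder store that makes redaction reversible."""

    def __init__(self) -> None:
        self.forward: dict[str, str] = {}  # original value -> placeholder
        self.reverse: dict[str, str] = {}  # placeholder -> original value

    def placeholder_for(self, value: str, label: str) -> str:
        # Repeated values reuse one placeholder, so the model sees a consistent token.
        if value not in self.forward:
            token = f"<{label}_{len(self.forward) + 1}>"
            self.forward[value] = token
            self.reverse[token] = value
        return self.forward[value]

    def redact(self, text: str) -> str:
        return EMAIL_RE.sub(lambda m: self.placeholder_for(m.group(0), "EMAIL"), text)

    def redact_json(self, value):
        # Walk decoded JSON so redaction never cuts through escape sequences.
        if isinstance(value, str):
            return self.redact(value)
        if isinstance(value, list):
            return [self.redact_json(v) for v in value]
        if isinstance(value, dict):
            return {k: self.redact_json(v) for k, v in value.items()}
        return value

    def restore(self, text: str) -> str:
        for token, original in self.reverse.items():
            text = text.replace(token, original)
        return text


def redact_request(body: dict, vault: Vault) -> dict:
    """Return a redacted copy of an OpenAI-style chat request (input is not mutated)."""
    body = copy.deepcopy(body)
    for msg in body.get("messages", []):
        content = msg.get("content")
        if isinstance(content, str):
            msg["content"] = vault.redact(content)
        elif isinstance(content, list):
            # Multimodal content: only text parts are redacted; image parts pass through.
            for part in content:
                if part.get("type") == "text":
                    part["text"] = vault.redact(part["text"])
        for call in msg.get("tool_calls") or []:
            fn = call["function"]
            # Tool-call arguments are a JSON-encoded string in the OpenAI format.
            try:
                fn["arguments"] = json.dumps(vault.redact_json(json.loads(fn["arguments"])))
            except json.JSONDecodeError:
                fn["arguments"] = vault.redact(fn["arguments"])
    return body
```

The same `Vault` instance must be kept for the lifetime of the request so that `vault.restore` can map the provider's reply back to the original values.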

The tool employs a dual-engine detection strategy to maximize PII coverage while minimizing false positives. The default configuration runs both Presidio (Microsoft's engine, which combines regex pattern recognizers with spaCy named-entity recognition) and OpenAI's open-source Privacy Filter (a 1.5B-parameter ONNX model). Presidio excels at detecting structured PII through pattern validation but may miss names that only context reveals; the Privacy Filter handles context-dependent identifiers and secrets but can occasionally over-flag technical terms. Together, the engines cover each other's blind spots. Other presets trade speed against coverage: the 'light' preset runs Presidio only for lower latency, while the experimental 'max' preset adds GLiNER for improved recall on out-of-distribution text.
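
One plausible way to combine engines is to take the union of their detected spans and merge overlaps, so that any character flagged by either engine is redacted. The sketch below assumes each engine returns character-offset spans; the `Span` type, the `PRESETS` table and its engine keys, the `redact_with_preset` helper, and the example detector outputs are all hypothetical. Only the preset names and which engines each one enables come from the description above.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str
    engine: str  # which detector produced the span (kept for provenance only)


# Hypothetical preset table mirroring the article's description; the engine
# names are illustrative, not PrivAiTe's actual configuration keys.
PRESETS: dict[str, list[str]] = {
    "light": ["presidio"],                            # Presidio only, lowest latency
    "onnx": ["presidio", "privacy_filter"],           # default dual-engine setup
    "max": ["presidio", "privacy_filter", "gliner"],  # experimental, adds GLiNER
}


def merge_spans(spans: list[Span]) -> list[tuple[int, int, str]]:
    """Union spans from several engines; overlapping spans become one redaction."""
    merged: list[list] = []
    # Sort by start, longest first, so at equal starts the widest detection supplies the label.
    for s in sorted(spans, key=lambda s: (s.start, -s.end)):
        if merged and s.start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], s.end)  # extend to cover both detections
        else:
            merged.append([s.start, s.end, s.label])
    return [(start, end, label) for start, end, label in merged]


def redact_with_preset(text: str, results: dict[str, list[Span]], preset: str) -> str:
    """Combine the engines a preset enables, then replace each merged span with its label."""
    spans = [s for engine in PRESETS[preset] for s in results.get(engine, [])]
    # Replace from the end of the string so earlier offsets stay valid.
    for start, end, label in reversed(merge_spans(spans)):
        text = text[:start] + f"<{label}>" + text[end:]
    return text


if __name__ == "__main__":
    text = "Jane Doe emailed jane.doe@corp.example"
    # Hypothetical engine outputs: the pattern engine finds the email, the model finds the name.
    results = {
        "presidio": [Span(17, 38, "EMAIL_ADDRESS", "presidio")],
        "privacy_filter": [Span(0, 8, "PERSON", "privacy_filter"),
                           Span(17, 25, "PERSON", "privacy_filter")],
    }
    print(redact_with_preset(text, results, "light"))  # Jane Doe emailed <EMAIL_ADDRESS>
    print(redact_with_preset(text, results, "onnx"))   # <PERSON> emailed <EMAIL_ADDRESS>
```

Taking the union favors recall; a real deployment would also need a policy for the false positives the article mentions, for example per-engine confidence thresholds, which this sketch omits.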

All detection runs on the user's machine with no external calls, and the project explicitly defines its threat model to clarify what protection is and is not offered. The proxy handles PII reversal transparently across all response layers, returning original values to the end user while keeping model providers blind to sensitive information.
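
Restoring values on the way back can be sketched the same way, again as an illustration rather than PrivAiTe's implementation. It assumes a non-streaming OpenAI-style chat completion, a hypothetical `<LABEL_n>` placeholder format, and a placeholder-to-original mapping kept from the request side; `restore_text`, `restore_value`, and `restore_response` are invented names. Streamed responses would additionally need buffering, since a placeholder can be split across chunks.

```python
import copy
import json
import re

# Hypothetical placeholder format such as <EMAIL_1> or <PERSON_2>; PrivAiTe's real
# token format may differ.
PLACEHOLDER_RE = re.compile(r"<[A-Z_]+_\d+>")


def restore_text(text: str, mapping: dict[str, str]) -> str:
    # Placeholders missing from the mapping (e.g. ones the model invented) are left as-is.
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def restore_value(value, mapping: dict[str, str]):
    # Recurse through decoded JSON so json.dumps re-escapes restored values correctly.
    if isinstance(value, str):
        return restore_text(value, mapping)
    if isinstance(value, list):
        return [restore_value(v, mapping) for v in value]
    if isinstance(value, dict):
        return {k: restore_value(v, mapping) for k, v in value.items()}
    return value


def restore_response(resp: dict, mapping: dict[str, str]) -> dict:
    """Return a copy of a non-streaming chat completion with placeholders restored."""
    resp = copy.deepcopy(resp)
    for choice in resp.get("choices", []):
        msg = choice.get("message") or {}
        if isinstance(msg.get("content"), str):
            msg["content"] = restore_text(msg["content"], mapping)
        for call in msg.get("tool_calls") or []:
            fn = call.get("function") or {}
            args = fn.get("arguments")
            if not isinstance(args, str):
                continue
            try:
                fn["arguments"] = json.dumps(restore_value(json.loads(args), mapping))
            except json.JSONDecodeError:
                # Malformed arguments from the model: fall back to plain text substitution.
                fn["arguments"] = restore_text(args, mapping)
    return resp
```

Parsing the tool-call arguments before substituting matters when an original value contains characters such as quotes, which would otherwise produce invalid JSON.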

Transparent PII reversal exposes original data only to the end user, not the LLM provider