HarEmb: Efficient PII Detection with Single-Layer Transformer Achieves Production Readiness
Key Takeaways
- Single-layer distillation reduces model size from 1.4B to 287M parameters while maintaining state-of-the-art performance on 55 PII categories
- Outperforms the deeper teacher model on fuzzy PII categories (gender, political affiliation, language), suggesting a single layer is sufficient for these classification tasks
- Built-in constrained Viterbi decoding ensures span coherence without post-processing or additional validation
Summary
A new single-layer model called HarEmb demonstrates that production-grade PII detection doesn't require deep transformer architectures. Built on OpenAI's privacy-filter model and fine-tuned on NVIDIA's Nemotron-PII dataset, HarEmb reduces model size from 1.4B parameters to just 287M while achieving state-of-the-art performance on token-level PII classification across 55 fine-grained categories including identity, contact, address, financial, and healthcare identifiers.
The model matches or exceeds its larger teacher on many tasks, notably outperforming it on fuzzy categories such as gender (0.987 vs. 0.841 F1), political affiliation (0.872 vs. 0.839 F1), and language detection. This pattern suggests that a single-layer architecture provides an effective inductive bias for certain PII detection challenges, contrary to conventional wisdom about transformer depth.
With constrained BIOES Viterbi decoding built in for coherent span predictions and significant reductions in both memory and compute requirements, HarEmb is optimized for real-time deployment. The model is available as open-source through Hugging Face and integrates directly with OpenMed's privacy detection framework, making it immediately usable for developers building privacy-preserving applications.
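The constrained BIOES Viterbi decoding mentioned above can be sketched as follows. This is a minimal, single-entity-type illustration under assumed transition rules and scores, not HarEmb's actual implementation; the real model decodes over 55 PII categories.

```python
# Minimal sketch of constrained Viterbi decoding over a BIOES tagset for a
# single entity type. Illustrative only: HarEmb's label set and transition
# handling may differ.
LABELS = ["O", "B", "I", "E", "S"]
NEG_INF = float("-inf")

# Legal BIOES transitions: B(egin) and I(nside) must continue an entity,
# E(nd) and S(ingle) must close one, O is outside any entity.
ALLOWED = {
    "O": {"O", "B", "S"},
    "B": {"I", "E"},
    "I": {"I", "E"},
    "E": {"O", "B", "S"},
    "S": {"O", "B", "S"},
}
START_OK = {"O", "B", "S"}  # legal first tags
END_OK = {"O", "E", "S"}    # legal last tags

def constrained_viterbi(emissions):
    """emissions: per-token dicts mapping label -> log-score.
    Returns the highest-scoring label sequence that is a valid BIOES path,
    so no post-hoc span repair is needed."""
    n = len(emissions)
    score = [dict.fromkeys(LABELS, NEG_INF) for _ in range(n)]
    back = [{} for _ in range(n)]
    for lab in START_OK:
        score[0][lab] = emissions[0][lab]
    for t in range(1, n):
        for lab in LABELS:
            # Best previous tag from which `lab` is reachable.
            prevs = [p for p in LABELS if lab in ALLOWED[p]]
            best = max(prevs, key=lambda p: score[t - 1][p])
            if score[t - 1][best] > NEG_INF:
                score[t][lab] = score[t - 1][best] + emissions[t][lab]
                back[t][lab] = best
    # Choose the best legal final tag, then trace back.
    last = max(END_OK, key=lambda lab: score[n - 1][lab])
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

On a token whose highest-scoring tag would break a span (for example, an `O` squeezed between a `B` and an `E`), the constraint table forces the decoder onto the best valid path instead of emitting an incoherent sequence.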
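Because the decoder guarantees valid BIOES sequences, mapping tags to entity spans downstream is mechanical, which is why no additional validation step is needed. A minimal single-type sketch (the helper name is illustrative, not from HarEmb):

```python
def bioes_to_spans(tags):
    """Convert a valid single-type BIOES tag sequence into entity spans,
    returned as (start, end) token-index pairs with `end` exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "S":      # single-token entity
            spans.append((i, i + 1))
        elif tag == "B":    # entity opens
            start = i
        elif tag == "E":    # entity closes
            spans.append((start, i + 1))
            start = None
    return spans
```

Because the input is guaranteed valid, the helper never sees an `E` without a preceding `B`, so it needs no error handling.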



