Civic-SLM: Open-Source AI Model Tailored for U.S. Local Government Documents
Key Takeaways
- ▸Civic-SLM fills a gap in civic transparency by specializing in local government documents, where general-purpose LLMs hallucinate and miss citations
- ▸Open source and auditable, trained on consumer hardware (Apple Silicon), making specialized models accessible without massive infrastructure costs
- ▸Rigorous baseline evaluation at every training stage ensures factuality—critical for government transparency and public accountability tools
Summary
Civic-SLM, a domain-specialized fine-tune of Alibaba's Qwen2.5-7B-Instruct, was released as an open-source project designed specifically to analyze U.S. local government documents—city and county agendas, staff reports, ordinances, comprehensive plans, and municipal codes. The model addresses a critical gap in civic transparency, where general-purpose LLMs hallucinate specifics, miss citations, and struggle with government document genres that Civic-SLM was trained to handle.
Released under MIT license, the model can run on any standard runtime: MLX, Ollama, LM Studio, llama.cpp, or OpenAI-compatible endpoints. A notable technical achievement is that the model was trained on a single Apple Silicon Mac using MLX-LM, proving that specialized domain fine-tunes don't require massive GPU farms. The project distributes both MLX-q4 and GGUF Q5_K_M quantizations.
The training pipeline—crawling local government websites via browser automation, validating document chunks with Pydantic schemas, synthesizing training pairs via Anthropic SDK or fully-local backends, and running multi-stage training (CPT, SFT, DPO)—is fully reproducible and open. Every training stage is evaluated against committed baselines to ensure factuality and appropriate refusal; the philosophy is simple: no training without a baseline.
- Designed to power civic transparency applications across all 50 U.S. states with pre-crawled data recipes for any U.S. jurisdiction



