College of Experts Framework Demonstrates Hardware-Agnostic Approach to Splitting 80B MoE Model into 40B Domain Specialists
Key Takeaways
- ▸An 80B MoE model can be effectively decomposed into 40B domain-specialist models coordinated by a Supervisor routing model, suggesting that model capability can be cleanly separated by domain
- ▸The hardware-agnostic architecture runs efficiently on consumer devices without a tangle of CUDA dependencies; the Supervisor runs under ONNX Runtime, so it never competes with the specialist models for VRAM
- ▸Customizable output templates and skills libraries enable fine-grained control over specialist behavior while reducing computational overhead through specialized routing rather than monolithic inference
Summary
The College of Experts AI framework has released a demonstration (v1.5) that showcases how an 80 billion parameter Mixture-of-Experts (MoE) language model can be efficiently split into 40 billion parameter domain-specialist models. The system uses Ollama for hosting the specialist models alongside an ONNX-based Supervisor model that intelligently routes queries to the appropriate domain expert.
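The routing handoff described above can be sketched as follows. This is an illustrative assumption, not the framework's actual code: the label-to-model mapping and function names are made up, and only the `/api/generate` endpoint shape reflects Ollama's documented local HTTP API.

```python
import json

# Hypothetical mapping from Supervisor domain labels to Ollama model names.
SPECIALISTS = {
    "coding": "college-coding-40b",
    "medicine": "college-medicine-40b",
    "general": "college-general-40b",
}

def build_ollama_request(domain_label, prompt):
    """Return the (url, payload) pair for a non-streaming Ollama generate call,
    falling back to a general specialist for unrecognized labels."""
    model = SPECIALISTS.get(domain_label, SPECIALISTS["general"])
    payload = {"model": model, "prompt": prompt, "stream": False}
    return "http://localhost:11434/api/generate", json.dumps(payload)
```

In a full pipeline, the Supervisor's predicted label would be fed into this function and the payload POSTed to the local Ollama server.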
The framework is designed for accessibility and efficiency, running on consumer hardware including Windows Copilot+ PCs, AMD APUs, Mac M-series chips, and NVIDIA RTX GPUs without requiring complex CUDA dependencies. The Supervisor model operates natively in Python using ONNX Runtime, eliminating competition for VRAM with the specialist models and enabling fast inference across diverse hardware platforms.
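The hardware-agnostic behavior comes down to ONNX Runtime's execution-provider fallback. A minimal sketch of that selection logic, assuming a hypothetical `select_providers` helper (the provider names are ONNX Runtime's real identifiers; the function is not from the framework):

```python
# Preference order mirrors the platforms the article lists.
PREFERRED_PROVIDERS = [
    "DmlExecutionProvider",     # DirectML: Windows Copilot+ PCs, AMD APUs
    "CUDAExecutionProvider",    # NVIDIA RTX GPUs
    "CoreMLExecutionProvider",  # Mac M-series chips
    "CPUExecutionProvider",     # universal fallback
]

def select_providers(available):
    """Filter the preference list down to what this machine offers,
    always keeping the CPU fallback so inference never fails outright."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# With onnxruntime installed, the Supervisor session would be created as:
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "supervisor.onnx",
#       providers=select_providers(ort.get_available_providers()))
```

Because the Supervisor is a small ONNX model, it can run on whichever accelerator (or CPU) is free while the 40B specialists occupy the GPU's VRAM.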
The demonstration includes comprehensive customization capabilities, allowing users to define output templates that constrain specialist outputs to specific structural patterns and inject specialist skills that influence reasoning approaches. The framework employs a context-enrichment layer using semantic embeddings (BAAI/bge-m3) to match queries against known output patterns and reasoning guidance, creating a modular system where intelligence can be separated and specialized by domain.
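The context-enrichment matching step can be illustrated with plain cosine similarity: a query embedding (e.g. from BAAI/bge-m3) is compared against stored pattern embeddings, and the best match above a threshold is injected as guidance. All names here are hypothetical sketches, not the framework's actual API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_match(query_vec, patterns, threshold=0.5):
    """Return (name, score) of the closest output template or skill,
    or None if nothing clears the similarity threshold."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in patterns.items()]
    name, score = max(scored, key=lambda t: t[1])
    return (name, score) if score >= threshold else None
```

Returning `None` below the threshold lets the specialist answer unconstrained when no known output pattern applies.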
- The open-source demonstration includes pre-built infrastructure for semantic query matching, embedding caching, and hardware-specific optimization providers (DirectML, CUDA, CPU fallback)
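Since embedding a query is the expensive step in semantic matching, the caching infrastructure mentioned above likely amounts to memoizing embeddings by a hash of the input text. A minimal sketch, with illustrative names (the demo's actual cache implementation may differ):

```python
import hashlib

class EmbeddingCache:
    """Memoize embedding results keyed by a SHA-256 hash of the text,
    so repeated queries skip the embedding model entirely."""

    def __init__(self, embed_fn):
        self._embed = embed_fn   # e.g. a call into BAAI/bge-m3
        self._store = {}

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._embed(text)
        return self._store[key]
```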
Editorial Opinion
The College of Experts framework represents an interesting architectural shift toward modularity and accessibility in large language models. By demonstrating that an 80B parameter model can be effectively split into specialized 40B experts with a lightweight routing layer, the project challenges the bigger-is-better assumption for LLMs and suggests domain specialization could offer both efficiency and performance gains. The hardware-agnostic approach is particularly valuable: making sophisticated multi-expert systems runnable on consumer hardware without CUDA complexity could democratize access to advanced AI inference patterns.