Arcee AI Releases Trinity-Large-Thinking: 398B Open-Source MoE Model Purpose-Built for AI Agents
Key Takeaways
- Trinity's core innovation is maintaining thinking tokens across entire agent loops, preserving the model's reasoning process and decision rationale throughout multi-step tasks rather than losing context between tool calls
- The 398B/13B-active MoE architecture delivers near-13B inference speed while drawing on knowledge distributed across 256 specialized experts, an unusually favorable cost-to-capability trade-off
- Trinity significantly outperforms Claude Opus 4.6 on specialized agentic task benchmarks (88.0 vs. 82.0 on Tau2-Airline), though it trails on general reasoning tasks, reflecting its purpose-built design for agent applications
Summary
Arcee AI has released Trinity-Large-Thinking, a 398 billion parameter open-source mixture-of-experts (MoE) model with only 13 billion active parameters during inference. Unlike most models that claim agentic capability, Trinity was specifically trained on multi-step agentic tasks, tool-calling trajectories, and reasoning chains, with a key architectural innovation: it preserves thinking tokens across entire agent loops, allowing the model to maintain context about why previous decisions were made rather than starting fresh at each step.
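The thinking-token preservation described above can be sketched as an agent loop that keeps each reasoning block in the message history it sends back to the model. This is a minimal illustration, not Arcee AI's actual API: the message schema, `generate` call, and field names (`thinking`, `tool_calls`) are assumptions for demonstration.

```python
# Hypothetical sketch of an agent loop that preserves reasoning
# ("thinking") blocks across turns. Schema and method names are
# illustrative assumptions, not Arcee AI's actual interface.

def run_agent(model, tools, user_task, max_steps=8):
    messages = [{"role": "user", "content": user_task}]
    for _ in range(max_steps):
        reply = model.generate(messages, tools=tools)
        # Keep the reasoning block in history; dropping it would
        # discard the rationale behind earlier tool choices.
        messages.append({
            "role": "assistant",
            "thinking": reply.thinking,   # preserved, not stripped
            "content": reply.content,
            "tool_calls": reply.tool_calls,
        })
        if not reply.tool_calls:
            return reply.content          # task complete
        for call in reply.tool_calls:
            result = tools[call.name](**call.arguments)
            messages.append({"role": "tool", "name": call.name,
                             "content": result})
    return None
```

The key detail is that the assistant message appended to history carries the `thinking` field forward, so at every subsequent step the model sees why it made each prior tool call, rather than only the calls and their results.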
The model features a 512k-token context window and was pretrained on 17 trillion tokens before specialized post-training on agent-specific tasks. While Trinity does not match Claude Opus 4.6 on general reasoning benchmarks such as GPQA-Diamond and MMLU-Pro, it outperforms it on agentic task-completion benchmarks, scoring 88.0 on Tau2-Airline (versus Opus 4.6's 82.0) and 94.7 on Tau2-Telecom, reflecting its specialized design for real-world multi-step agent scenarios.
The MoE architecture allows Trinity to run efficiently despite its massive parameter count, operating at speeds comparable to a 13B model while accessing knowledge distributed across 256 experts. However, the model is resource-intensive and designed for enterprise deployments rather than consumer GPUs, with explicit documentation requirements about preserving reasoning blocks in message history for optimal performance.
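Why a 398B-parameter MoE can run at roughly 13B-model speed comes down to sparse routing: a router scores all experts per token, but only the top few actually execute. The sketch below illustrates the mechanism; the dimensions, number of selected experts, and routing scheme are illustrative assumptions, not Trinity's actual configuration.

```python
# Minimal top-k mixture-of-experts routing sketch. Only the k experts
# the router selects run per token, so compute scales with active
# parameters, not total parameters. Sizes here are toy values.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 64, 256, 8

router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    logits = x @ router_w                 # router score for each expert
    top = np.argsort(logits)[-k:]         # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over selected experts only
    # Only k of the 256 expert matmuls execute; the rest are skipped.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.standard_normal(d_model))
print(out.shape)  # (64,) — produced by just 8 of 256 experts
```

All 256 expert weight matrices exist in memory (hence the enterprise-scale footprint), but per-token FLOPs are dominated by the k selected experts, which is the performance-to-capability trade-off the article describes.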
Notably, stripping those reasoning blocks from message history reportedly breaks the model outright, indicating that reasoning is deeply integrated with tool use at the architectural level rather than bolted on.
Editorial Opinion
Trinity-Large-Thinking represents a meaningful shift in how open-source models approach agentic AI, moving beyond instruction-tuned models with added tool-calling toward systems deliberately architected for multi-step reasoning and decision-making. The preservation of thinking tokens across agent loops is particularly significant—it's a design choice that directly addresses a critical failure mode in current agent deployments. If the benchmark performance holds up in real-world deployments, this could become a reference architecture for open-source agent models.


