MatX One Delivers Record-Breaking Throughput for Large Language Models
Key Takeaways
- MatX One achieves the highest FLOPS/mm² of any announced AI accelerator product, setting new performance benchmarks for LLM workloads
- Optimized memory hierarchy uses SRAM for weights (enabling low latency) and HBM for key-value data (supporting long-context inference)
- Supports >2,000 tokens/second throughput for large 100-layer MoE models and scales to clusters with hundreds of thousands of chips
Summary
MatX One, a newly announced specialized AI accelerator, delivers the highest throughput of any chip for large language models while keeping latency competitive across multiple workload types. Its memory hierarchy is tuned for LLM workloads: weights are stored in SRAM for low-latency access, and key-value (KV) data is stored in HBM to support long context windows.
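To see why long-context inference pushes key-value data toward a high-capacity tier such as HBM, the sketch below estimates KV-cache size with the standard formula (2 tensors × layers × KV heads × head dimension × bytes per element × tokens). Only the 100-layer depth echoes this article; the head count, head dimension, precision, and context length are illustrative assumptions and not MatX One specifications.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, bytes_per_elem, context_tokens):
    """Bytes of KV cache for one sequence: a K and a V vector per layer, per token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens


# Hypothetical model shape, assumed for illustration (not published MatX figures).
LAYERS = 100            # matches the "100-layer" models mentioned in the article
KV_HEADS = 8            # assumed grouped-query attention layout
HEAD_DIM = 128          # assumed
BYTES_PER_ELEM = 2      # assumed 16-bit KV entries
CONTEXT = 128 * 1024    # assumed 128K-token context

total = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, BYTES_PER_ELEM, CONTEXT)
print(f"{total / 2**30:.1f} GiB per sequence")  # prints: 50.0 GiB per sequence
```

Under these assumed numbers a single long sequence needs tens of gibibytes of KV data, and the total grows linearly with context length, which is consistent with keeping KV data in HBM rather than on-chip SRAM.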
MatX One achieves the highest FLOPS per square millimeter of any announced product, enabling more than 2,000 output tokens per second for large 100-layer mixture-of-experts (MoE) models. The architecture targets the full LLM lifecycle (training, reinforcement learning, inference prefill, and inference decode) and supports both large dense models and MoE architectures with no architectural limit on model size.
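The throughput figure can be turned into a rough time budget. The arithmetic below assumes that the 2,000 tokens per second applies to a single decode stream (the article does not say whether the figure is per stream or aggregate) and that the 100 layers run one after another for each token. Both function names are hypothetical helpers written for this illustration.

```python
def per_token_budget_us(tokens_per_second: float) -> float:
    """Average time available per output token, in microseconds."""
    return 1_000_000 / tokens_per_second


def per_layer_budget_us(tokens_per_second: float, num_layers: int) -> float:
    """Average time per layer per token, assuming layers execute sequentially."""
    return per_token_budget_us(tokens_per_second) / num_layers


# Figures quoted in the article: 2,000 tokens/second, 100 layers.
token_us = per_token_budget_us(2000)
layer_us = per_layer_budget_us(2000, 100)
print(f"{token_us:.0f} us per token, {layer_us:.0f} us per layer")
# prints: 500 us per token, 5 us per layer
```

Under these assumptions each layer has roughly 5 microseconds per token, which gives a sense of why low-latency access to weights matters for decode.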
The chip is engineered for deployment at massive scale, supporting clusters of hundreds of thousands of devices through advanced interconnect technology. Its direct-control programming model covers training, RL, prefill, and decode with no upper limit on model size, and lets developers tune performance for specific workloads. MatX has secured backing from prominent investors including Jane Street, Situational Awareness LP, Spark Capital, and investment funds led by Nat Friedman and Daniel Gross, a signal of confidence in the company's bid to challenge established AI chip makers.
Editorial Opinion
MatX One reflects engineering that goes beyond chasing raw compute numbers: the designers optimized explicitly for LLM workloads rather than for general-purpose performance. That specialization looks increasingly well founded as frontier labs demand purpose-built silicon for training and serving massive models at scale. With backing from investors and technical leaders such as Nat Friedman and Daniel Gross, MatX enters a competitive but growing market in which alternatives to NVIDIA's GPU dominance are becoming viable.



