DeepInfra Raises $107M Series B to Scale AI Inference Infrastructure
Key Takeaways
- DeepInfra secured $107M Series B funding co-led by 500 Global and Georges Harik, with backing from NVIDIA, Samsung Next, and other strategic investors
- The company has grown token processing volume 25x since its Series A, reflecting strong market demand for specialized inference infrastructure as AI agents drive continuous workloads
- DeepInfra operates a vertically integrated platform across eight U.S. data centers, owning both GPU hardware and optimization software to deliver better cost and latency than general-purpose cloud providers
Summary
DeepInfra announced a $107 million Series B funding round co-led by 500 Global and investor Georges Harik, with participation from NVIDIA, Samsung Next, Supermicro, and others. The funding will be used to scale DeepInfra's inference cloud platform and expand its global capacity, following 25x growth in token processing volume since the company's Series A. DeepInfra positions inference as the critical bottleneck in enterprise AI infrastructure, driven by the convergence of open-source models reaching parity with proprietary systems and the rise of agent-based systems that demand continuous, high-volume token generation. The company differentiates itself through a vertically integrated, full-stack approach, owning and operating GPU infrastructure across eight U.S. data centers with purpose-built networking and inference-optimized software. DeepInfra is also collaborating with NVIDIA on Nemotron models and inference optimization, with early deployments of next-generation Blackwell GPUs achieving up to 20x improvements in inference cost efficiency.