Cloudflare Workers AI Launches Large Language Model Inference with Moonshot AI's Kimi K2.5
Key Takeaways
- Workers AI now serves frontier open-source LLMs, starting with Moonshot AI's Kimi K2.5, closing a critical gap in Cloudflare's agent development platform
- Kimi K2.5's 256k context window and advanced agentic capabilities (tool calling, vision, structured outputs) enable production-grade AI agent deployments
- Real-world cost savings of 77% compared to mid-tier proprietary models suggest that open-source frontier models are becoming a primary lever for enterprise AI at scale
Summary
Cloudflare has announced that its Workers AI platform now supports frontier-scale large language models, starting with Moonshot AI's Kimi K2.5. This marks a significant expansion of Workers AI beyond smaller models, enabling developers to build and deploy complete AI agents on a unified platform. Kimi K2.5 features a 256k context window, multi-turn tool calling, vision inputs, and structured outputs—capabilities essential for agentic tasks.
The move addresses a critical gap in Cloudflare's agent-building infrastructure. While the company previously offered execution primitives like Durable Objects, Workflows, and the Agents SDK, agents still required external model providers. By bringing frontier-class models directly into the Developer Platform, Cloudflare enables end-to-end agent development without switching between services.
Cloudflare's internal testing demonstrates compelling economics. A security review agent processing 7 billion tokens daily using Kimi K2.5 costs 77% less than equivalent inference on mid-tier proprietary models—potentially saving $2.4 million annually for a single use case. As enterprises scale personal and coding agents across their organizations, cost-efficient open-source alternatives such as Kimi K2.5 are becoming the primary driver of adoption, shifting the industry away from dependence on proprietary models.
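Taken at face value, the stated figures (7 billion tokens per day, 77% savings, $2.4 million saved per year) imply rough per-token economics. This is a back-of-envelope sketch derived only from those numbers; Cloudflare did not publish the underlying per-million-token prices:

```python
tokens_per_day = 7e9          # 7 billion tokens/day (stated)
annual_savings = 2.4e6        # $2.4M/year (stated)
savings_fraction = 0.77       # 77% cheaper than the proprietary baseline (stated)

daily_savings = annual_savings / 365                  # ~$6,575/day
proprietary_daily = daily_savings / savings_fraction  # ~$8,539/day
kimi_daily = proprietary_daily - daily_savings        # ~$1,964/day

# Implied blended price per million tokens
proprietary_per_m = proprietary_daily / (tokens_per_day / 1e6)  # ~$1.22/M tokens
kimi_per_m = kimi_daily / (tokens_per_day / 1e6)                # ~$0.28/M tokens
print(f"implied: proprietary ~ ${proprietary_per_m:.2f}/M, Kimi ~ ${kimi_per_m:.2f}/M")
```

Roughly $1.22 versus $0.28 per million tokens blended, which is in the range where continuously running agents become economically viable at fleet scale.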
- Cloudflare has upgraded its inference stack to support very large LLMs, enabling serverless endpoints for personal agents and dedicated instances for enterprise autonomous systems
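For a sense of what the serverless path looks like, the sketch below builds a request for Workers AI's REST inference endpoint (`/accounts/{account_id}/ai/run/{model}`), including an OpenAI-style tool definition of the kind multi-turn tool calling uses. The model slug, the system prompt, and the `list_findings` tool are illustrative assumptions, not taken from Cloudflare's announcement; check the Workers AI model catalog for the actual identifier and supported request fields:

```python
import json

# Hypothetical model slug -- verify against the Workers AI model catalog.
MODEL = "@cf/moonshotai/kimi-k2.5"
API_BASE = "https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/"


def build_agent_request(account_id: str, prompt: str, tools=None):
    """Build the URL and JSON body for a chat-style Workers AI inference call."""
    url = API_BASE.format(account_id=account_id) + MODEL
    body = {
        "messages": [
            {"role": "system", "content": "You are a security-review agent."},
            {"role": "user", "content": prompt},
        ],
    }
    if tools:
        # OpenAI-style function-calling schema, the common shape for
        # models that advertise tool-calling support.
        body["tools"] = tools
    return url, body


url, body = build_agent_request(
    "ACCOUNT_ID",
    "Summarize the open findings for this repository.",
    tools=[{
        "type": "function",
        "function": {
            "name": "list_findings",  # hypothetical tool for illustration
            "description": "Return open security findings for a repository.",
            "parameters": {
                "type": "object",
                "properties": {"repo": {"type": "string"}},
                "required": ["repo"],
            },
        },
    }],
)
print(json.dumps(body, indent=2))
```

The same request shape would be sent with an `Authorization: Bearer <api_token>` header; inside a Worker, the equivalent call goes through the platform's AI binding instead of raw HTTP.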
Editorial Opinion
Cloudflare's move to serve frontier open-source models represents a watershed moment for the AI infrastructure market. By integrating Kimi K2.5 directly into a unified developer platform with proven agentic primitives, Cloudflare is positioning itself as a serious alternative to cloud giants for AI workloads—particularly for cost-conscious enterprises. The 77% cost savings are not marginal; they reshape the economics of AI deployment at scale. As organizations move from experimental AI to production agents running continuously, the ability to build, deploy, and run agents on a single platform with favorable unit economics will become a decisive competitive advantage.