BotBeat

Cloudflare · PRODUCT LAUNCH · 2026-03-20

Cloudflare Workers AI Launches Large Language Model Inference with Moonshot AI's Kimi K2.5

Key Takeaways

  • Workers AI now serves frontier open-source LLMs, starting with Moonshot AI's Kimi K2.5, closing a critical gap in Cloudflare's agent development platform
  • Kimi K2.5's 256k context window and advanced agentic capabilities (tool calling, vision, structured outputs) enable production-grade AI agent deployments
  • Real-world cost savings of 77% compared to proprietary models demonstrate that open-source frontier models are becoming the primary lever for enterprise AI scale
Source: Hacker News (https://blog.cloudflare.com/workers-ai-large-models/)

Summary

Cloudflare has announced that its Workers AI platform now supports frontier-scale large language models, starting with Moonshot AI's Kimi K2.5. This marks a significant expansion of Workers AI beyond smaller models, enabling developers to build and deploy complete AI agents on a unified platform. Kimi K2.5 features a 256k context window, multi-turn tool calling, vision inputs, and structured outputs—capabilities essential for agentic tasks.
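The agentic capabilities listed above are typically exercised through an OpenAI-style chat request. The sketch below builds such a request body with one declared tool and a structured-output JSON schema; the model identifier and the `lookup_cve` tool are illustrative assumptions, not values from the announcement, so check the Workers AI model catalog for the real ID.

```python
import json

# ASSUMPTION: hypothetical model ID -- the actual Workers AI identifier
# for Kimi K2.5 may differ; consult the model catalog.
MODEL = "@cf/moonshotai/kimi-k2.5"

# One example tool the agent may call, declared in the widely used
# OpenAI-style function schema. "lookup_cve" is a made-up name here.
tools = [{
    "type": "function",
    "function": {
        "name": "lookup_cve",
        "description": "Fetch details for a CVE identifier.",
        "parameters": {
            "type": "object",
            "properties": {"cve_id": {"type": "string"}},
            "required": ["cve_id"],
        },
    },
}]

payload = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "You are a security review agent."},
        {"role": "user", "content": "Summarize the risk of CVE-2026-1234."},
    ],
    "tools": tools,
    # Structured outputs: constrain the reply to a JSON schema.
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "risk_summary",
            "schema": {
                "type": "object",
                "properties": {
                    "severity": {"type": "string"},
                    "summary": {"type": "string"},
                },
                "required": ["severity", "summary"],
            },
        },
    },
}

body = json.dumps(payload)  # POST this body to the inference endpoint
```

Because the wire format mirrors the de-facto chat-completions convention, an agent built against a proprietary provider can often be repointed at an open-source model by swapping the endpoint and model ID.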

The move addresses a critical gap in Cloudflare's agent-building infrastructure. While the company previously offered execution primitives like Durable Objects, Workflows, and the Agents SDK, agents still required external model providers. By bringing frontier-class models directly into the Developer Platform, Cloudflare enables end-to-end agent development without switching between services.

Cloudflare's internal testing demonstrates compelling economics. A security review agent processing 7 billion tokens daily using Kimi K2.5 costs 77% less than equivalent inference on mid-tier proprietary models—potentially saving $2.4 million annually for a single use case. As enterprises scale personal and coding agents across their organizations, cost-efficient open-source alternatives like Kimi are becoming the primary driver of adoption, shifting the industry away from proprietary model dependency.
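The article's three figures (7 billion tokens per day, 77% savings, $2.4 million saved annually) are enough to back out the implied blended per-token rates. The calculation below uses only those reported numbers; the derived prices are inferences, not quoted pricing.

```python
# Back-of-envelope check of the reported numbers. Inputs come from the
# article; the per-million-token rates are derived, not quoted prices.
DAILY_TOKENS = 7e9        # security review agent, tokens per day
SAVINGS_FRACTION = 0.77   # reported savings vs. mid-tier proprietary models
ANNUAL_SAVINGS = 2.4e6    # reported savings, USD per year

annual_tokens = DAILY_TOKENS * 365                      # ~2.56 trillion tokens
proprietary_annual = ANNUAL_SAVINGS / SAVINGS_FRACTION  # implied proprietary spend
kimi_annual = proprietary_annual - ANNUAL_SAVINGS       # implied Kimi K2.5 spend

# Implied blended (input + output) price per million tokens
proprietary_per_m = proprietary_annual / annual_tokens * 1e6
kimi_per_m = kimi_annual / annual_tokens * 1e6

print(f"Implied proprietary rate: ${proprietary_per_m:.2f}/M tokens")  # ~$1.22
print(f"Implied Kimi K2.5 rate:   ${kimi_per_m:.2f}/M tokens")         # ~$0.28
```

The implied rates (roughly $1.22 vs. $0.28 per million tokens) sit in a plausible range for mid-tier proprietary and hosted open-source models, which lends internal consistency to the headline figures.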

  • Cloudflare has upgraded its inference stack to support very large LLMs, enabling serverless endpoints for personal agents and dedicated instances for enterprise autonomous systems

Editorial Opinion

Cloudflare's move to serve frontier open-source models represents a watershed moment for the AI infrastructure market. By integrating Kimi K2.5 directly into a unified developer platform with proven agentic primitives, Cloudflare is positioning itself as a serious alternative to cloud giants for AI workloads—particularly for cost-conscious enterprises. The 77% cost savings are not marginal; they reshape the economics of AI deployment at scale. As organizations move from experimental AI to production agents running continuously, the ability to build, deploy, and run agents on a single platform with favorable unit economics will become a decisive competitive advantage.

Large Language Models (LLMs) · Generative AI · AI Agents · MLOps & Infrastructure · Partnerships · Product Launch

© 2026 BotBeat
About · Privacy Policy · Terms of Service · Contact Us