JitAPI: New MCP Server Reduces Token Usage by 34x When Integrating APIs with Claude
Key Takeaways
- ▸JitAPI reduces token consumption by 34x compared to loading full API specifications, addressing a major inefficiency when integrating complex APIs like GitHub (800+ endpoints) or Stripe (300+ endpoints)
- ▸Semantic search and dependency graph technology identify only the endpoints needed for each task, reducing the hallucinations caused by context overload
- ▸Multi-API orchestration enables Claude to chain requests across different APIs in a single query, unlocking new possibilities for complex workflows
Summary
JitAPI, a new Model Context Protocol (MCP) server, enables Claude to dynamically discover and interact with any API without requiring developers to manually load entire OpenAPI specifications into context. Instead of dumping hundreds or thousands of endpoints into Claude's context window, which wastes tokens and encourages hallucinations, JitAPI uses semantic search and dependency graph analysis to surface only the endpoints relevant to a given task. The tool automatically identifies dependencies between endpoints, resolves them in the correct order, and lets Claude execute API calls with a 34x reduction in token usage compared to loading full specifications.
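The two mechanisms described above, semantic search over endpoint descriptions and dependency-ordered execution, can be sketched in miniature. This is an illustrative toy, not JitAPI's implementation: the endpoint names and dependency map are invented, and the bag-of-words "embedding" stands in for a real embedding model.

```python
import math
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical endpoint catalog (illustrative, not from a real OpenAPI spec).
ENDPOINTS = {
    "GET /repos/{owner}/{repo}": "fetch repository metadata",
    "GET /repos/{owner}/{repo}/issues": "list issues for a repository",
    "POST /repos/{owner}/{repo}/issues": "create an issue in a repository",
    "GET /user": "get the authenticated user",
}

# Which endpoints must run first to supply path parameters for others.
DEPENDENCIES = {
    "GET /repos/{owner}/{repo}/issues": {"GET /repos/{owner}/{repo}"},
    "POST /repos/{owner}/{repo}/issues": {"GET /repos/{owner}/{repo}"},
}

def embed(text):
    """Toy bag-of-words 'embedding'; a real server would use a model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def relevant_endpoints(query, k=2):
    """Semantic search: return only the k endpoints closest to the task."""
    q = embed(query)
    ranked = sorted(ENDPOINTS,
                    key=lambda e: cosine(q, embed(ENDPOINTS[e])),
                    reverse=True)
    return ranked[:k]

def call_order(endpoints):
    """Pull in prerequisites and topologically sort so they run first."""
    graph = {e: DEPENDENCIES.get(e, set()) for e in endpoints}
    for deps in list(graph.values()):
        for d in deps:
            graph.setdefault(d, DEPENDENCIES.get(d, set()))
    return list(TopologicalSorter(graph).static_order())

selected = relevant_endpoints("list issues for a repository")
print(call_order(selected))  # prerequisite repo lookup comes first
```

The point of the sketch is the shape of the pipeline: the model only ever sees the handful of endpoints that survive the search step, and prerequisites are resolved automatically rather than loaded up front.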
JitAPI supports multi-API orchestration, allowing developers to register multiple APIs and ask questions that span them. For example, users can chain calls across the TMDB and OpenWeatherMap APIs in a single query. The system works out of the box with local embeddings, requiring no API keys, and also supports cloud embedding providers such as Voyage for improved search quality on larger APIs. Setup is straightforward: users add JitAPI to their Claude configuration file and register OpenAPI specs via URL.
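MCP servers are registered in Claude's configuration under an `mcpServers` key; a setup entry would follow that general pattern. Note that the package name `jitapi-mcp` and the launch arguments below are assumptions for illustration only, not confirmed by the article:

```json
{
  "mcpServers": {
    "jitapi": {
      "command": "npx",
      "args": ["-y", "jitapi-mcp"]
    }
  }
}
```

Once the server is registered, OpenAPI specs are added at runtime by pointing JitAPI at a spec URL, so no spec content needs to live in the configuration file itself.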
Editorial Opinion
JitAPI represents a practical solution to a real pain point in AI-assisted API integration—the context window bottleneck. By intelligently filtering endpoints rather than loading entire specifications, it not only reduces costs but also improves reasoning quality. This is the kind of incremental innovation that makes AI agents more practical and deployable at scale.


