BotBeat
DeepSeek · RESEARCH · 2026-05-15

DeepSeek V4 Pro and Flash Positioned Between Kimi and Claude in Independent Benchmark Test

Key Takeaways

  • DeepSeek V4 Pro scores 77/100 in an independent benchmark, placing it between Claude Opus 4.7 (91) and Kimi K2.6 (68); a 75% discount on the model is available through May 31, 2026
  • DeepSeek V4 Flash scores 60/100 for $0.02, with output tokens priced at roughly 1/89th of Claude Opus 4.7's
  • Both DeepSeek models show strong architectural understanding but stumble on complex infrastructure details, particularly lease expiry handling, validation, and database setup
Source: Hacker News, https://blog.kilo.ai/p/we-tested-deepseek-v4-pro-and-flash

Summary

Independent testing of DeepSeek's newly launched V4 Pro and Flash models shows where they stand against Claude Opus 4.7 and Kimi K2.6. DeepSeek V4 Pro, released on April 24, 2026 under the MIT license, scored 77/100 for $2.25 on a workflow-orchestration benchmark, placing it between Claude Opus 4.7 (91) and Kimi K2.6 (68). DeepSeek V4 Flash, the lightweight model in the new two-tier lineup, scored 60/100 for just $0.02; its output tokens cost less than 1/14th of Kimi's and 1/89th of Claude Opus's. DeepSeek is also offering a 75% discount on V4 Pro through May 31, 2026, and has permanently reduced input cache pricing across its lineup by 90%, which lowers costs for enterprise workloads that reuse long prompts.
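The price-performance gap between the two DeepSeek models can be made concrete with a short calculation. The sketch below uses only the scores and dollar figures quoted above, and assumes those dollar figures are the total cost of each benchmark run. The function and variable names are illustrative and do not come from the original test.

```python
# Figures quoted in the summary: (score out of 100, cost in USD).
# Assumption: the dollar amounts are the total cost of each benchmark run.
RESULTS = {
    "DeepSeek V4 Pro": (77, 2.25),
    "DeepSeek V4 Flash": (60, 0.02),
}


def points_per_dollar(score: float, cost_usd: float) -> float:
    """Return benchmark points obtained per dollar spent."""
    if cost_usd <= 0:
        raise ValueError("cost must be positive")
    return score / cost_usd


def summarize(results: dict) -> dict:
    """Map each model name to its points per dollar, rounded to one decimal."""
    return {
        name: round(points_per_dollar(score, cost), 1)
        for name, (score, cost) in results.items()
    }


if __name__ == "__main__":
    for name, ppd in summarize(RESULTS).items():
        print(f"{name}: {ppd} points per dollar")
```

On these figures V4 Pro works out to roughly 34 points per dollar and V4 Flash to roughly 3,000. Points per dollar is a crude metric: it says nothing about whether a 60/100 implementation is usable, only how much score each dollar bought.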

The test used a FlowGraph specification, a complex workflow orchestration backend with 20 endpoints, persistent state, lease management, retries, and event streaming, to evaluate models under realistic infrastructure demands rather than typical lightweight benchmarks. Both DeepSeek models were tested in thinking mode against the same prompt and scoring rubric used for the earlier Claude Opus 4.7 vs Kimi K2.6 comparison. While DeepSeek V4 Pro demonstrated strong architectural understanding and a reasonable project structure, both models had implementation flaws that kept their builds and test suites from passing cleanly.

DeepSeek V4 Pro passed its own test suite but hit TypeScript build failures, while V4 Flash's test suite never ran because of database reset errors in the setup script. Detailed code review and targeted reproduction testing identified issues common to both models: lease expiry handling, scheduling logic, validation, and build integrity. These findings suggest that models face systematic challenges with complex stateful systems, in line with the issues observed with Kimi K2.6.
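To illustrate what "lease expiry handling" involves, here is a minimal in-memory sketch of a lease table. It is a generic illustration, not the FlowGraph specification or any tested model's output: the class name, method names, and the rule that a lease counts as expired once the clock reaches its expiry time are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    worker_id: str
    expires_at: float


class LeaseTable:
    """Hypothetical lease table; the clock is passed in explicitly so the
    expiry rules can be tested without sleeping."""

    def __init__(self, ttl_seconds: float) -> None:
        if ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")
        self.ttl = ttl_seconds
        self._leases: Dict[str, Lease] = {}

    def _live(self, task_id: str, now: float) -> Optional[Lease]:
        """Return the current lease, dropping it first if it has expired."""
        lease = self._leases.get(task_id)
        if lease is not None and lease.expires_at <= now:
            del self._leases[task_id]  # expired: the task may be reclaimed
            return None
        return lease

    def claim(self, task_id: str, worker_id: str, now: float) -> bool:
        """Grant a lease only if no unexpired lease exists for the task."""
        if self._live(task_id, now) is not None:
            return False
        self._leases[task_id] = Lease(worker_id, now + self.ttl)
        return True

    def heartbeat(self, task_id: str, worker_id: str, now: float) -> bool:
        """Extend a lease; reject expired leases and non-owners."""
        lease = self._live(task_id, now)
        if lease is None or lease.worker_id != worker_id:
            return False
        lease.expires_at = now + self.ttl
        return True

    def complete(self, task_id: str, worker_id: str, now: float) -> bool:
        """Release a lease; a worker whose lease lapsed cannot complete."""
        lease = self._live(task_id, now)
        if lease is None or lease.worker_id != worker_id:
            return False
        del self._leases[task_id]
        return True
```

The checks that are easy to miss are the ones in `heartbeat` and `complete`: a worker whose lease has already lapsed must not be allowed to extend it or report completion, because another worker may have claimed the task in the meantime.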

  • Permanent 90% reduction in input cache pricing across DeepSeek's lineup improves overall cost positioning for enterprise applications
  • Infrastructure-level testing using FlowGraph orchestration revealed implementation gaps not visible in simpler benchmarks, demonstrating the importance of rigorous real-world scenario validation

Editorial Opinion

DeepSeek's V4 lineup represents a significant price-performance step, particularly for cost-sensitive applications that can use V4 Flash. However, the benchmark results suggest that competitive performance requires more than architectural parity: it demands rigorous infrastructure-level validation and correct implementation of stateful system semantics. The finding that both DeepSeek models struggled with lease management and workflow orchestration highlights an often-overlooked challenge in production AI systems: models must handle not just isolated reasoning tasks, but the operational complexity of distributed systems.

Large Language Models (LLMs) · Generative AI · Market Trends · Open Source

