BotBeat

Subquadratic
PRODUCT LAUNCH · Subquadratic · 2026-05-05

Subquadratic Launches SubQ: The First Fully Subquadratic LLM with 12M-Token Context and Linear Compute Scaling

Key Takeaways

  • SubQ is the first LLM with a fully subquadratic architecture where compute scales linearly with context length, reducing attention compute by ~1,000x at 12 million tokens
  • Achieves 95% accuracy on the RULER 128K benchmark and 65.9 on MRCR v2, outperforming Claude Opus 4.7 and other frontier models on long-context reasoning tasks
  • Sparse attention runs 52x faster than FlashAttention while using 63% less compute, resolving the historical trade-off between speed and efficiency
Source: Hacker News (https://subq.ai/introducing-subq)

Summary

Subquadratic, a newly launched AI company, has unveiled SubQ 1M-Preview, the first large language model built on a fully subquadratic architecture where compute scales linearly—rather than quadratically—with context length. The breakthrough addresses a fundamental limitation that has constrained transformer-based systems since their inception: as context windows grow, computational requirements have historically scaled quadratically, forcing developers to build expensive workarounds like RAG systems and retrieval pipelines. SubQ's architecture reduces attention compute by nearly 1,000x compared to frontier models, enabling practical 12-million-token context windows while maintaining competitive accuracy.
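The claimed ~1,000x reduction follows directly from the scaling laws described above. A minimal sketch of the arithmetic, using an illustrative cost model (the constant in `attention_cost_linear` is a made-up illustration chosen to match the article's figures, not SubQ's actual design):

```python
def attention_cost_quadratic(n: int) -> int:
    # Full self-attention: every token attends to every other token, so cost grows as n^2.
    return n * n

def attention_cost_linear(n: int, constant: int = 12_000) -> int:
    # A linear-scaling architecture: cost grows as c * n.
    # The constant is illustrative only; it sets the crossover point.
    return n * constant

n = 12_000_000  # the 12M-token context window from the announcement
ratio = attention_cost_quadratic(n) / attention_cost_linear(n)
print(f"quadratic / linear cost ratio at {n:,} tokens: {ratio:,.0f}x")
```

The key property is that the advantage of linear scaling grows with context length: at short contexts the two cost models are comparable, while at 12 million tokens the quadratic term dominates by three orders of magnitude.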

The model demonstrates frontier-level performance across multiple benchmarks. On the RULER 128K benchmark, SubQ 1M-Preview achieves 95% accuracy compared to Claude Opus 4.6's 94.8%. On MRCR v2, which tests retrieval and reasoning over distributed information, SubQ scores 65.9 (third-party verified) versus Claude Opus 4.7's 32.2 and Gemini 3.1 Pro's 26.3. The sparse attention architecture is 52x faster than FlashAttention while requiring 63% less compute—a combination of efficiency gains historically incompatible in transformer systems.

Subquadratic is launching three products in private beta starting today: a full-context API for developers and enterprises, SubQ Code (a CLI-based coding agent that loads entire codebases into a single context window), and SubQ Search (a long-context search tool with deep research capabilities). The company positions these offerings as alternatives to multi-agent systems and complex retrieval orchestration, potentially reducing both the architectural complexity and operational costs of building long-context AI applications.

  • Three products launching in private beta: API for developers/enterprises, SubQ Code (CLI agent for full-codebase analysis), and SubQ Search (long-context research tool)
  • Could fundamentally change how developers build long-context applications by eliminating the need for RAG systems, retrieval pipelines, and chunking strategies

Editorial Opinion

SubQ's fully subquadratic architecture represents a potential inflection point in LLM design, finally breaking the quadratic scaling constraint that has defined practical limits since the transformer's inception. The combination of 12M-token context, frontier-level accuracy, and dramatic efficiency gains (1,000x reduction in attention compute) suggests Subquadratic has cracked a problem the industry has spent years working around. That said, benchmark performance—particularly on proprietary tests—doesn't always translate to production reliability, and the true value will depend on whether these efficiency gains materialize in real-world applications and whether the private beta progresses to broader availability.

Large Language Models (LLMs) · Generative AI · AI Agents · Product Launch

© 2026 BotBeat