BotBeat

PRODUCT LAUNCH | Cactus Compute | 2026-05-12

Cactus Releases Needle: A 26M Parameter Function-Calling Model for Edge Devices

Key Takeaways

  • Function calling does not require a large language model: the task is fundamentally retrieval and assembly (matching queries to tools and extracting arguments), not reasoning
  • The "no FFN" architecture generalizes to any task where the model has access to external structured knowledge, suggesting efficiency gains for retrieval-augmented generation (RAG) and similar systems
  • Needle runs at 6,000 tokens/sec prefill and 1,200 tokens/sec decode on consumer devices while outperforming models 10-13x larger on its specific task
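
To make the "retrieval and assembly" framing in the first takeaway concrete, here is a toy sketch in Python. It is not how Needle works internally and not its output format: the tool names, keywords, regex patterns, and the `call_for` helper are all hypothetical, chosen only to show that picking a tool and copying arguments out of the query can be done without any reasoning step.

```python
import json
import re

# Hypothetical tool registry: names, keywords, and patterns are illustrative only.
TOOLS = {
    "set_timer": {
        "keywords": {"timer", "countdown", "remind"},
        "pattern": re.compile(r"(?P<amount>\d+)\s*(?P<unit>second|minute|hour)s?"),
    },
    "send_message": {
        "keywords": {"message", "text", "tell"},
        "pattern": re.compile(
            r"(?:text|message|tell)\s+(?P<recipient>\w+)\s+(?:that\s+)?(?P<body>.+)"
        ),
    },
}


def call_for(query):
    """Return a tool call as a dict, or None when no tool matches the query."""
    lowered = query.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    # Retrieval: choose the tool whose keywords overlap the query the most.
    name, spec = max(TOOLS.items(), key=lambda item: len(item[1]["keywords"] & words))
    if not spec["keywords"] & words:
        return None
    # Assembly: copy argument values straight out of the query text.
    match = spec["pattern"].search(lowered)
    if match is None:
        return None
    return {"name": name, "arguments": match.groupdict()}


print(json.dumps(call_for("Set a timer for 10 minutes")))
```

A learned model replaces the keyword overlap and the regexes with attention over the query and the tool descriptions, but the shape of the problem (select, then copy) stays the same.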
Source: Hacker News, https://github.com/cactus-compute/needle

Summary

Cactus Compute has open-sourced Needle, a 26-million-parameter function-calling model designed to run efficiently on consumer devices such as phones, watches, and glasses. The team distilled Gemini's tool-calling capability into a lightweight architecture based on "Simple Attention Networks", which uses only attention and gating mechanisms and eliminates feed-forward networks (FFNs). On consumer hardware, the model runs at 6,000 tokens/sec prefill and 1,200 tokens/sec decode.
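
The summary above does not spell out the layer design, so the following NumPy sketch is only one plausible reading of "attention and gating, no FFN". Everything in it is an assumption: single-head causal self-attention, a sigmoid gate computed from the block input, a residual connection, and the names `gated_attention_block`, `wq`, `wk`, `wv`, `wo`, and `wg`. The actual Needle architecture may differ in every one of these choices.

```python
import numpy as np


def softmax(scores, axis=-1):
    # Subtract the row maximum for numerical stability.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def gated_attention_block(x, params):
    """One hypothetical attention-plus-gate block with no feed-forward sublayer.

    x: array of shape (seq_len, d_model).
    params: dict of (d_model, d_model) matrices named wq, wk, wv, wo, wg.
    """
    seq_len, d_model = x.shape
    q, k, v = x @ params["wq"], x @ params["wk"], x @ params["wv"]
    scores = (q @ k.T) / np.sqrt(d_model)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    attended = softmax(scores) @ v @ params["wo"]
    # A sigmoid gate computed from the input decides how much of the
    # attention output is written back to the residual stream.
    gate = 1.0 / (1.0 + np.exp(-(x @ params["wg"])))
    return x + gate * attended


rng = np.random.default_rng(0)
d_model, seq_len = 16, 5
params = {
    name: rng.normal(scale=0.1, size=(d_model, d_model))
    for name in ("wq", "wk", "wv", "wo", "wg")
}
x = rng.normal(size=(seq_len, d_model))
y = gated_attention_block(x, params)
print(y.shape)
```

In this sketch each block holds five `d_model` by `d_model` matrices. A conventional transformer block would add a feed-forward sublayer on top of the attention weights, typically the larger share of its parameters, which is where the savings in an FFN-free design would come from.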

The training process involved pretraining on 200 billion tokens across 16 TPU v6e instances for 27 hours, followed by post-training on 2 billion tokens of synthesized function-calling data spanning 15 tool categories (timers, messaging, navigation, smart home, etc.). While Needle outperforms larger models like FunctionGemma-270M, Qwen-0.6B, and Granite-350M on single-shot function calling, those models offer greater scope for conversational tasks. Cactus has released the model weights and training code under the MIT license, along with a playground UI for testing and fine-tuning on custom tools.
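
The training figures above imply some useful back-of-the-envelope numbers. The short Python calculation below derives them, assuming the 27 hours cover only the pretraining run and that all 16 instances were busy for the whole period; neither assumption is confirmed by the source, and the variable names are mine.

```python
# Figures quoted in the summary.
PRETRAIN_TOKENS = 200e9
POSTTRAIN_TOKENS = 2e9
MODEL_PARAMS = 26e6
TPU_INSTANCES = 16
PRETRAIN_HOURS = 27

# Assumption: the full 27 hours were spent on pretraining across all instances.
seconds = PRETRAIN_HOURS * 3600
aggregate_tps = PRETRAIN_TOKENS / seconds
per_instance_tps = aggregate_tps / TPU_INSTANCES
tokens_per_param = PRETRAIN_TOKENS / MODEL_PARAMS
posttrain_share = POSTTRAIN_TOKENS / (PRETRAIN_TOKENS + POSTTRAIN_TOKENS)

print(f"aggregate throughput: {aggregate_tps:,.0f} tokens/sec")
print(f"per-instance throughput: {per_instance_tps:,.0f} tokens/sec")
print(f"pretraining tokens per parameter: {tokens_per_param:,.0f}")
print(f"post-training share of all tokens: {posttrain_share:.1%}")
```

Under those assumptions the run processed roughly 2.06 million tokens per second in aggregate (about 129 thousand per instance) and saw about 7,700 pretraining tokens per parameter, with the function-calling data making up about 1% of all training tokens.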

The open-source release enables developers to fine-tune the model for custom tool integration directly on laptops and personal devices.

Editorial Opinion

Needle represents an important recognition that not all AI tasks are created equal. By identifying function calling as fundamentally different from reasoning, the Cactus team has built something genuinely useful for agentic AI on the devices where most people interact with technology: phones and watches. The finding that FFN parameters are largely wasted when external knowledge is provided could reshape efficient model design across many application domains.

AI Agents | AI Hardware | Open Source
