Redditor Proves Discontinued Intel Optane Remains Viable for Trillion-Parameter LLM Inference

Key Takeaways

▸A 1 trillion-parameter LLM (Kimi K2.5) ran at ~4 tokens/second on 768GB of second-hand Intel Optane persistent memory paired with a single RTX 3060 GPU
▸Second-hand Optane DCPMM modules cost dramatically less than equivalent DRAM capacity while offering a 2-3× latency advantage over NVMe SSDs
▸Hybrid CPU/GPU inference with strategic tensor overrides in llama.cpp enabled efficient trillion-parameter model execution despite hardware constraints
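The hybrid CPU/GPU split in the takeaways can be illustrated with a small Python sketch. llama.cpp exposes this kind of placement through its `--override-tensor` (`-ot`) option, which maps tensor-name patterns to buffer types. The code below does not call llama.cpp. It only mimics regex-based placement under several assumptions: GGUF-style tensor names (`ffn_*_exps` for routed experts), a first-match-wins rule order, and hypothetical names (`RULES`, `place_tensor`, `plan`). It is not the Redditor's actual configuration.

```python
import re

# Hypothetical placement rules, loosely modeled on llama.cpp's --override-tensor (-ot)
# idea of mapping tensor-name patterns to a buffer type. In this sketch the first
# matching rule wins and unmatched tensors fall back to DEFAULT_DEVICE.
RULES = [
    # Routed-expert FFN weights (the bulk of an MoE model) stay in system memory,
    # which in this build is the Optane-backed pool.
    (r"\.ffn_(up|down|gate)_exps\.", "CPU"),
]
DEFAULT_DEVICE = "GPU"  # everything else (attention, norms, router) goes to the GPU


def place_tensor(name: str, rules=RULES, default: str = DEFAULT_DEVICE) -> str:
    """Return the device assigned to a tensor under the first matching rule."""
    for pattern, device in rules:
        if re.search(pattern, name):
            return device
    return default


def plan(tensor_names):
    """Map each tensor name to its assigned device."""
    return {name: place_tensor(name) for name in tensor_names}


if __name__ == "__main__":
    # Illustrative GGUF-style names, not read from an actual Kimi K2.5 file.
    names = [
        "blk.0.attn_q.weight",
        "blk.0.ffn_gate_inp.weight",   # MoE router: small, kept on the GPU here
        "blk.0.ffn_up_exps.weight",
        "blk.0.ffn_down_exps.weight",
        "output_norm.weight",
    ]
    for name, device in plan(names).items():
        print(f"{name:30s} -> {device}")
```

Only the `_exps` tensors land on the CPU in this example. The rationale for such a split is that small tensors used on every token benefit most from GPU speed, while the much larger expert weights are only partially touched per token and can be streamed from slower system memory.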

Source:

Hacker News: https://www.tomshardware.com/tech-industry/artificial-intelligence/enthusiast-runs-1-trillion-parameter-llm-from-768gb-of-intel-optane-dimm-memory-sticks-local-kimi-k2-5-install-achieved-roughly-4-tokens-per-second

Summary

A resourceful Redditor has demonstrated that Intel's discontinued Optane Persistent Memory DIMMs can serve as a cost-effective memory solution for running trillion-parameter large language models locally. Using six 128GB Optane DCPMM modules (768GB total) in a Xeon workstation paired with a single RTX 3060 GPU, Reddit user APFrisco achieved approximately 4 tokens per second running the frontier-class Kimi K2.5 model in llama.cpp, using hybrid CPU/GPU inference and strategic router optimization.
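A back-of-envelope check shows why 768GB is the interesting capacity here. The sketch below estimates weight storage as parameters × bits per weight ÷ 8, using decimal gigabytes as a simplification. The quantization levels and the 10% headroom reserved for KV cache, the OS, and buffers are assumptions; the article does not say which quantization the build used. The function names are hypothetical.

```python
GB = 1e9  # decimal gigabytes (a simplification; usable DIMM capacity may differ)


def weights_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (ignores KV cache and file metadata)."""
    return num_params * bits_per_weight / 8 / GB


def fits(num_params: float, bits_per_weight: float, capacity_gb: float, headroom: float = 0.9) -> bool:
    """True if the weights fit within `headroom` of the capacity (headroom is an assumption)."""
    return weights_size_gb(num_params, bits_per_weight) <= capacity_gb * headroom


OPTANE_GB = 6 * 128  # six 128GB DCPMM modules = 768GB, as in the build

if __name__ == "__main__":
    params = 1e12  # "1 trillion parameters", rounded
    for bits in (2.0, 4.5, 8.0, 16.0):  # hypothetical quantization levels
        size = weights_size_gb(params, bits)
        print(f"{bits:>4} bits/weight: {size:7.1f} GB, fits in {OPTANE_GB} GB: {fits(params, bits, OPTANE_GB)}")
```

Under these assumptions, a roughly 4.5-bit quantization (about 562.5GB) fits in the Optane pool, while 8-bit weights (about 1,000GB) do not.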

The build leveraged Optane's position between DRAM and NVMe SSDs: substantially lower latency than storage, at a lower price than high-capacity DRAM. By sourcing discontinued modules second-hand, the builder achieved dramatic cost savings while maintaining practical inference performance for an exceptionally large model. The project also highlights Intel's poor market timing: Optane struggled commercially at launch, yet today's extreme memory prices make the discontinued technology genuinely competitive for niche workloads.
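Decode speed on a setup like this is typically limited by how fast weights can be read from memory, which is why Optane's place between DRAM and NVMe matters. Assuming Kimi K2.5 is a mixture-of-experts model like Kimi K2, only the activated parameters are read per token, so a rough bound is tokens/s ≈ bandwidth ÷ (active parameters × bits per weight ÷ 8). The sketch runs that arithmetic in reverse to ask what effective bandwidth ~4 tokens/s implies. The ~32B active-parameter figure (Moonshot AI's published number for Kimi K2) and the 4-bit quantization are assumptions, and the model ignores compute cost and weights already resident on the GPU, so it is not a measurement of Optane bandwidth.

```python
def bytes_per_token(active_params: float, bits_per_weight: float) -> float:
    """Bytes of weights read from memory to generate one token (decode phase)."""
    return active_params * bits_per_weight / 8


def max_tokens_per_s(bandwidth_bytes_s: float, active_params: float, bits_per_weight: float) -> float:
    """Upper bound on decode speed if weight reads are the only bottleneck."""
    return bandwidth_bytes_s / bytes_per_token(active_params, bits_per_weight)


def implied_bandwidth(tokens_per_s: float, active_params: float, bits_per_weight: float) -> float:
    """Effective bandwidth (bytes/s) needed to sustain a given decode speed."""
    return tokens_per_s * bytes_per_token(active_params, bits_per_weight)


if __name__ == "__main__":
    ACTIVE = 32e9  # assumption: ~32B activated parameters per token, as published for Kimi K2
    BITS = 4.0     # assumption: ~4-bit quantization
    bw = implied_bandwidth(4.0, ACTIVE, BITS)  # ~4 tokens/s, as reported
    print(f"~4 tok/s implies ~{bw / 1e9:.0f} GB/s of effective weight bandwidth")
```

Because part of each token's weights can sit in GPU memory, the bandwidth actually demanded from the Optane pool would be somewhat lower than this simple estimate.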

The achievement underscores a persistent industry need: the memory gap between expensive DRAM and slow storage has proven problematic for scaling LLM inference. Standards like CXL (Compute Express Link) are expected to address this more elegantly in the future with larger pools of affordable, byte-addressable memory. Until then, this build shows that creative engineering combined with discontinued, second-hand hardware can deliver surprising capabilities on a tight budget.

The proof-of-concept points to a real market gap between DRAM and storage for LLM workloads, reinforcing the case for emerging standards like CXL.

Editorial Opinion

Intel's Optane saga, from inflated promises to discontinuation, finally finds vindication in this corner of the internet. The irony is sharp: a technology Intel abandoned now proves valuable for the AI era it arguably failed to anticipate. This DIY success challenges the notion that cutting-edge inference requires cutting-edge hardware; it suggests instead that creative engineering, optimization discipline, and a willingness to work with yesterday's technology can unlock surprising performance. For a community obsessed with running large models locally, Optane is proving to be the memory sweet spot Intel never fully capitalized on.
