Argonne National Laboratory Launches Private AI Inference Service on Spare Supercomputing Capacity
Key Takeaways
- Argonne National Laboratory is repurposing idle supercomputing capacity to provide secure, private AI inference access to the US research community
- The service aggregates multiple LLMs and custom models through a unified chatbot interface, eliminating the need for researchers to build and maintain their own AI infrastructure
- Real-world applications already in deployment include real-time plasma disruption prediction and automated data filtering from particle physics experiments
Summary
The Department of Energy's Argonne National Laboratory has unveiled a new AI inference service built on spare supercomputing capacity, giving researchers across the US secure access to large language models without exposing data to public services like ChatGPT. The service currently runs on two systems: Sophia, featuring 192 Nvidia A100 GPUs with 40 GB of memory, and Metis, equipped with 32 SambaNova SN40L AI accelerators. The service will be extended to include the Nvidia GH200-based Tara and B200-based Minerva systems.
The inference service provides access to multiple models, including OpenAI's GPT-OSS, Google's Gemma, Meta's Llama, and custom domain-specific models like AuroraGPT, delivered through a chatbot-style web interface. Researchers are already using the service to analyze experimental data in real time, predict plasma disruptions in fusion energy research, and filter massive datasets from particle accelerators and telescopes to identify likely candidates. The service lets researchers apply AI at scale while making better use of available supercomputing resources, meeting both the demand for AI access and the requirement for data privacy in sensitive research.
- By keeping AI inference on-premises, researchers can experiment with generative AI without exposing sensitive research data to public cloud services
Editorial Opinion
This is a pragmatic way to broaden AI access in the research community while preserving data privacy and using compute efficiently. By drawing on existing supercomputing infrastructure that would otherwise sit idle, Argonne addresses two problems at once: underutilized compute capacity and researcher demand for secure AI tools. The applications already in use, from predicting plasma disruptions to accelerating particle physics discovery, show that AI's value in research extends well beyond text generation. This model could become a blueprint for other national labs and institutions that want to provide AI services without the privacy, cost, and security concerns of relying on public cloud platforms.


