Hugging Face Launches Storage Buckets: Mutable Object Storage for ML Workflows at $12/TB
Key Takeaways
- Hugging Face Storage Buckets provide mutable, S3-like object storage optimized for ML production workflows that generate frequent intermediate files unsuitable for traditional Git versioning
- Xet's chunk-based deduplication architecture reduces storage costs and bandwidth for related ML artifacts like successive model checkpoints and processed datasets
- Pre-warming functionality enables data locality optimization across AWS and GCP regions, improving throughput for distributed training and multi-region pipelines
Summary
Hugging Face has introduced Storage Buckets, a mutable, non-versioned object storage service designed for machine learning production workflows. Unlike Git-based versioning systems, Buckets are built to handle the constant stream of intermediate files that ML pipelines generate (checkpoints, optimizer states, processed data shards, and logs), which change frequently and rarely need version control. The service is priced at $12 per terabyte and is backed by Xet, Hugging Face's chunk-based storage backend, which deduplicates content across related ML artifacts.
Xet breaks files into chunks and identifies content shared across successive checkpoints and processed datasets, so chunks that already exist are not transferred or stored again. This reduces bandwidth usage and storage costs, which matters most to enterprise customers, who are billed on deduplicated storage. Storage Buckets also introduce pre-warming, developed in partnership with AWS and GCP, which lets users bring hot data closer to compute in a specific cloud provider and region, improving throughput for distributed training and large-scale pipelines.
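To make the deduplication idea concrete, here is a minimal sketch of a chunk-level content-addressed store. It is an illustration only, not Xet's implementation: it uses fixed-size chunks for simplicity, and the class `DedupStore`, the function `storage_cost_usd`, the chunk size, and the decimal-terabyte pricing unit are all assumptions introduced for this example.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # illustrative value, not a documented Xet parameter


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept exactly once."""

    def __init__(self, chunk_size: int = CHUNK_SIZE) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}     # chunk digest -> chunk bytes
        self.files: dict[str, list[str]] = {}  # file name -> ordered chunk digests

    def put(self, name: str, data: bytes) -> int:
        """Store a file and return how many bytes were actually new."""
        new_bytes = 0
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only chunks the store has never seen cost storage or upload.
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            digests.append(digest)
        self.files[name] = digests
        return new_bytes

    def get(self, name: str) -> bytes:
        """Reassemble a file from its chunk list."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        """Total size of unique chunks, i.e. the deduplicated footprint."""
        return sum(len(c) for c in self.chunks.values())


def storage_cost_usd(num_bytes: int, price_per_tb: float = 12.0) -> float:
    """Cost at the article's $12/TB rate, assuming 1 TB = 10**12 bytes."""
    return num_bytes / 10**12 * price_per_tb


# Two "checkpoints" that differ in a single 4-byte chunk.
store = DedupStore(chunk_size=4)
ckpt_a = b"AAAABBBBCCCCDDDD"
ckpt_b = b"AAAABBBBXXXXDDDD"
new_a = store.put("step-1000.bin", ckpt_a)
new_b = store.put("step-2000.bin", ckpt_b)
print(new_a, new_b, store.stored_bytes())
```

In this toy run the second checkpoint adds only the one chunk that changed, so the store's footprint is smaller than the sum of the two file sizes. Fixed-size chunking stops matching once bytes are inserted or removed mid-file, which is a limitation of this sketch.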
Users can get started with Buckets in under two minutes using the Hugging Face CLI, with commands to create private buckets, sync local directories, and preview changes before execution. The service is integrated into the Hugging Face Hub, offering browsable storage pages, Python scripting support, and standard Hugging Face permissions management for both private and public buckets.
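The "preview changes before execution" step can be pictured as computing a sync plan from two file manifests before any data moves. The sketch below is a generic illustration in plain Python, not the Hugging Face CLI or its API; `local_manifest`, `plan_sync`, and the `delete` option are hypothetical names chosen for this example.

```python
import hashlib
from pathlib import Path


def local_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def plan_sync(
    local: dict[str, str],
    remote: dict[str, str],
    delete: bool = False,
) -> dict[str, list[str]]:
    """Return the actions a sync would take, without performing any of them."""
    # Upload anything missing remotely or whose content digest differs.
    upload = sorted(k for k, v in local.items() if remote.get(k) != v)
    # Files already identical on both sides need no transfer.
    unchanged = sorted(k for k, v in local.items() if remote.get(k) == v)
    # Remote-only files are removed only when deletion is explicitly requested.
    remove = sorted(k for k in remote if k not in local) if delete else []
    return {"upload": upload, "unchanged": unchanged, "delete": remove}


# Manifests written by hand here; in practice the local one would come from
# local_manifest(Path("./checkpoints")) and the remote one from a bucket listing.
local = {"ckpt/step-1000.bin": "aa11", "ckpt/step-2000.bin": "bb22", "log.txt": "cc33"}
remote = {"ckpt/step-1000.bin": "aa11", "log.txt": "old0", "stale.tmp": "dd44"}

dry_run = plan_sync(local, remote)
with_delete = plan_sync(local, remote, delete=True)
print(dry_run)
print(with_delete)
```

Keeping planning separate from execution means the plan can be reviewed, or saved and applied later, before anything in the bucket is touched.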
Editorial Opinion
Storage Buckets address a genuine pain point in ML infrastructure: the mismatch between Git's version-control paradigm and the needs of ephemeral, evolving artifacts in training pipelines. The integration of Xet's deduplication is clever engineering that could meaningfully reduce costs for large-scale ML operations, though the real test will be adoption and whether the pricing proves competitive against established cloud storage services. The pre-warming feature shows thoughtful design for distributed training, but its initial limitation to AWS and GCP may limit its appeal for teams invested in other cloud providers.



