Hugging Face adds Storage Buckets to the Hub for mutable ML artifacts
Original: Introducing Storage Buckets on the Hugging Face Hub
Hugging Face announced Storage Buckets for the Hub on March 10, 2026. The feature is designed for the parts of machine learning workflows that do not fit cleanly into versioned model or dataset repositories, including checkpoints, optimizer states, processed shards, logs, traces, and other intermediate artifacts that change frequently.
Why Hugging Face built it
Hugging Face argues that Git-based repositories are a good fit for publishing final artifacts, but not for high-churn storage that is overwritten repeatedly by multiple jobs. Storage Buckets are meant to fill that gap with mutable, non-versioned, S3-like object storage that can live under a user or organization namespace and be addressed programmatically through paths such as hf://buckets/username/my-training-bucket.
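The `hf://buckets/<namespace>/<bucket>/<key>` addressing scheme mirrors how S3-style object stores name objects. As an illustrative sketch only (the parsing function and field names below are hypothetical, not part of huggingface_hub), a path like the one above splits into a namespace, a bucket, and an object key:

```python
from urllib.parse import urlparse

def parse_bucket_path(uri: str) -> dict:
    """Split an hf://buckets/<namespace>/<bucket>/<key> URI into parts.
    The path scheme comes from the announcement; this helper and its
    field names are illustrative, not an official huggingface_hub API."""
    parts = urlparse(uri)
    if parts.scheme != "hf" or parts.netloc != "buckets":
        raise ValueError(f"not a bucket path: {uri}")
    namespace, bucket, *key = parts.path.lstrip("/").split("/")
    return {"namespace": namespace, "bucket": bucket, "key": "/".join(key)}

info = parse_bucket_path("hf://buckets/username/my-training-bucket/ckpt/step_1000.pt")
# namespace: "username", bucket: "my-training-bucket", key: "ckpt/step_1000.pt"
```

Because the key is an arbitrary slash-separated string rather than a Git path, the same bucket object can be overwritten in place by successive jobs without creating new revisions.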
The feature is built on Xet, Hugging Face’s chunk-based storage backend. According to the company, Xet breaks files into chunks and deduplicates across them, so uploads of similar datasets or successive checkpoints can skip bytes that already exist. Hugging Face says that reduces bandwidth use, speeds up transfers, and improves storage efficiency. For Enterprise customers, billing is also based on deduplicated storage.
Operational features
Storage Buckets also introduce pre-warming, so frequently accessed data can be staged closer to the cloud region where compute runs. Hugging Face says AWS and GCP are supported at launch, which makes the feature relevant for distributed training and large-scale pipelines where storage location affects throughput. The feature is integrated with the hf CLI, the huggingface_hub Python library, the @huggingface/hub JavaScript library, and fsspec-compatible filesystem access.
That combination makes Buckets more than a place to dump files. It is an attempt to keep working artifacts, training jobs, data processing, and eventual publication inside one Hub-native workflow. Hugging Face also says direct movement between Buckets and versioned repos is on the roadmap, which would connect transient working storage to final publishing destinations.
Why it matters
This is an infrastructure release, but it is strategically important because it expands Hugging Face from model hosting into storage substrate for ML operations. The company explicitly names agent traces, memory, and shared knowledge graphs as examples, which signals that it sees modern LLM application state as part of the storage problem. If Buckets performs as advertised, teams that already use the Hub may have less reason to split operational artifacts across separate object stores and publishing platforms.
The feature is included in existing Hub storage plans, but adoption will likely depend on transfer performance at scale, how well pre-warming works in practice, and whether teams find the Hub-native workflow simpler than combining Git repos with external object storage.
Source: Hugging Face