Hugging Face adds Storage Buckets to the Hub for mutable ML artifacts

Original: Introducing Storage Buckets on the Hugging Face Hub

AI · Mar 22, 2026 · By Insights AI · 2 min read

Hugging Face announced Storage Buckets for the Hub on March 10, 2026. The feature is designed for the parts of machine learning workflows that do not fit cleanly into versioned model or dataset repositories, including checkpoints, optimizer states, processed shards, logs, traces, and other intermediate artifacts that change frequently.

Why Hugging Face built it

Hugging Face argues that Git-based repositories are a good fit for publishing final artifacts, but not for high-churn storage that is overwritten repeatedly by multiple jobs. Storage Buckets are meant to fill that gap with mutable, non-versioned, S3-like object storage that can live under a user or organization namespace and be addressed programmatically through paths such as hf://buckets/username/my-training-bucket.
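The article gives hf://buckets/username/my-training-bucket as the addressing scheme. A minimal sketch of building and parsing such paths, assuming only that scheme — the helper names here are illustrative, not part of huggingface_hub:

```python
from urllib.parse import urlparse

def bucket_uri(namespace: str, bucket: str, key: str = "") -> str:
    """Build an hf:// bucket URI from a user/org namespace and bucket name.

    Hypothetical helper; only the hf://buckets/<namespace>/<bucket> scheme
    comes from the announcement.
    """
    base = f"hf://buckets/{namespace}/{bucket}"
    return f"{base}/{key}" if key else base

def parse_bucket_uri(uri: str) -> dict:
    """Split an hf://buckets/... URI into namespace, bucket, and object key."""
    parsed = urlparse(uri)
    if parsed.scheme != "hf" or parsed.netloc != "buckets":
        raise ValueError(f"not a bucket URI: {uri}")
    namespace, bucket, *rest = parsed.path.lstrip("/").split("/")
    return {"namespace": namespace, "bucket": bucket, "key": "/".join(rest)}

print(bucket_uri("username", "my-training-bucket", "ckpt/step_1000.pt"))
# → hf://buckets/username/my-training-bucket/ckpt/step_1000.pt
```

Because the URI doubles as a plain string path, the same address works at the CLI, in Python, and in path-based filesystem layers.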

The feature is built on Xet, Hugging Face’s chunk-based storage backend. According to the company, Xet splits files into chunks and deduplicates them across uploads, so similar datasets or successive checkpoints can skip bytes the backend already stores. Hugging Face says that reduces bandwidth use, speeds up transfers, and improves storage efficiency. For Enterprise customers, billing is also based on deduplicated storage.
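The deduplication idea can be illustrated with a toy content store: files are split into chunks, chunks are keyed by hash, and an upload only transfers chunks the store has not seen. Xet itself uses content-defined chunking and a real backend; this fixed-size, in-memory version is only a sketch of the principle:

```python
import hashlib

CHUNK_SIZE = 4  # tiny for illustration; real chunkers use much larger chunks

class DedupStore:
    """Toy chunk store: uploads skip chunks whose hash is already present."""

    def __init__(self):
        self.chunks = {}  # sha256 hex digest -> chunk bytes

    def upload(self, data: bytes) -> tuple[int, int]:
        """Store data; return (chunks_sent, chunks_skipped)."""
        sent = skipped = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            if key in self.chunks:
                skipped += 1          # byte-identical chunk already stored
            else:
                self.chunks[key] = chunk
                sent += 1
        return sent, skipped

store = DedupStore()
ckpt_v1 = b"AAAABBBBCCCCDDDD"   # first checkpoint: every chunk is new
ckpt_v2 = b"AAAABBBBCCCCEEEE"   # later checkpoint: one chunk changed
print(store.upload(ckpt_v1))    # → (4, 0): all four chunks transferred
print(store.upload(ckpt_v2))    # → (1, 3): only the changed chunk transferred
```

This is also why successive checkpoints are a natural fit: most of the bytes between step N and step N+1 are unchanged, so most chunks are skipped, and billing on deduplicated storage counts each unique chunk once.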

Operational features

The release also introduces pre-warming, which moves frequently accessed data closer to the cloud region where compute runs. Hugging Face says AWS and GCP are supported at launch, which makes the feature relevant for distributed training and large-scale pipelines where storage location affects throughput. The feature is integrated with the hf CLI, the huggingface_hub Python library, the @huggingface/hub JavaScript library, and fsspec-compatible filesystem access.
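The fsspec-style access pattern implies a familiar filesystem interface over mutable objects. As a stand-in — none of these class or method names come from the real integration — a dict-backed bucket shows the overwrite-in-place semantics that distinguish Buckets from versioned repos:

```python
class MemoryBucket:
    """Dict-backed stand-in for a mutable, non-versioned object store.

    Illustrative only: mimics the put/get/ls shape of filesystem-style
    access, not the actual Hugging Face API.
    """

    def __init__(self, name: str):
        self.name = name
        self.objects = {}  # key -> bytes; overwrites replace, no history

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data  # last write wins; no commit, no version

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def ls(self, prefix: str = "") -> list[str]:
        return sorted(k for k in self.objects if k.startswith(prefix))

bucket = MemoryBucket("my-training-bucket")
bucket.put("ckpt/latest.pt", b"step-1000 weights")
bucket.put("ckpt/latest.pt", b"step-2000 weights")  # overwritten in place
print(bucket.get("ckpt/latest.pt"))  # only the newest bytes survive
```

The contrast with a Git-based repo is the point: multiple jobs can hammer the same key without generating commit history, which is exactly the high-churn case the article says repositories handle poorly.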

That combination makes Buckets more than a place to dump files. It is an attempt to keep working artifacts, training jobs, data processing, and eventual publication inside one Hub-native workflow. Hugging Face also says direct movement between Buckets and versioned repos is on the roadmap, which would connect transient working storage to final publishing destinations.

Why it matters

This is an infrastructure release, but it is strategically important because it expands Hugging Face from model hosting into a storage substrate for ML operations. The company explicitly names agent traces, memory, and shared knowledge graphs as examples, which signals that it sees modern LLM application state as part of the storage problem. If Buckets performs as advertised, teams that already use the Hub may have less reason to split operational artifacts across separate object stores and publishing platforms.

The feature is included in existing Hub storage plans, but adoption will likely depend on transfer performance at scale, how well pre-warming works in practice, and whether teams find the Hub-native workflow simpler than combining Git repos with external object storage.

Source: Hugging Face

