Hugging Face adds Storage Buckets for mutable ML artifacts on the Hub

Original: Introducing Storage Buckets on the Hugging Face Hub

AI | Mar 15, 2026 | By Insights AI

Hugging Face introduced Storage Buckets on March 10, 2026 as a new storage layer for machine learning teams that need something more flexible than Git-backed model or dataset repositories. The product is aimed squarely at mutable artifacts: checkpoints, optimizer states, processed shards, logs, traces, and other intermediate files that are constantly rewritten during training or pipeline execution. Instead of forcing those workloads through version control, Hugging Face is offering a non-versioned, S3-like object store that still lives natively inside the Hub.

The design goal is practical rather than flashy. A Bucket can be public or private, inherits standard Hub permissions, can be browsed in the Hub's web interface, and can be addressed with handles such as hf://buckets/username/my-training-bucket. Hugging Face says this model is better suited to production ML than Git when clusters are writing many related files in parallel, when data pipelines repeatedly overwrite outputs, or when agents store traces, memory, and shared knowledge graphs. In other words, Buckets separate the fast, mutable layer of ML work from the curated, versioned layer used for final publication.
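To make the handle format concrete, here is a minimal Python sketch that splits a handle such as hf://buckets/username/my-training-bucket into its parts. The helper name parse_bucket_handle and the BucketHandle type are hypothetical and not part of any Hugging Face library. The sketch also assumes that anything after the bucket name is an object key inside the bucket, which the article does not spell out.

```python
from typing import NamedTuple

# Prefix taken from the handle quoted in the article.
PREFIX = "hf://buckets/"


class BucketHandle(NamedTuple):
    """Hypothetical container for the parts of a bucket handle."""
    owner: str
    bucket: str
    key: str  # assumed: path of an object inside the bucket, "" for the bucket root


def parse_bucket_handle(handle: str) -> BucketHandle:
    """Split 'hf://buckets/<owner>/<bucket>[/<key>]' into its components."""
    if not handle.startswith(PREFIX):
        raise ValueError(f"not a bucket handle: {handle!r}")
    # Split at most twice so that slashes inside the key are preserved.
    parts = handle[len(PREFIX):].split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"handle needs an owner and a bucket name: {handle!r}")
    key = parts[2] if len(parts) == 3 else ""
    return BucketHandle(owner=parts[0], bucket=parts[1], key=key)


root = parse_bucket_handle("hf://buckets/username/my-training-bucket")
print(root.owner, root.bucket)
```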

The more interesting architectural piece is that Buckets are built on Xet, Hugging Face’s chunk-based backend. Rather than storing every file as an isolated blob, Xet deduplicates repeated chunks across similar files. That matters in ML because successive checkpoints, raw and processed datasets, and agent traces often share large amounts of content. Hugging Face says deduplication reduces bandwidth use, speeds transfers, and lowers the billed footprint for Enterprise customers because billing is based on deduplicated storage. The company is also adding pre-warming so hot data can be brought closer to compute, starting with AWS and GCP, which should reduce cross-region data movement for distributed training and large pipelines.
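The effect of chunk-level deduplication on stored bytes can be illustrated with a toy example. The sketch below is not Xet's algorithm: it uses fixed-size chunks and SHA-256 digests purely for simplicity, and the ChunkStore class and its methods are hypothetical names invented for this illustration. It shows only the general idea that a second checkpoint sharing most of its content with the first adds little to the stored footprint.

```python
import hashlib

CHUNK_SIZE = 16  # deliberately tiny so the example is easy to follow


class ChunkStore:
    """Toy content-addressed store: each distinct chunk is kept once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}     # digest -> chunk bytes
        self.files: dict[str, list[str]] = {}  # file name -> ordered digests

    def put(self, name: str, data: bytes) -> int:
        """Store a file and return how many new bytes had to be written."""
        digests = []
        new_bytes = 0
        for start in range(0, len(data), CHUNK_SIZE):
            piece = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(piece).hexdigest()
            if digest not in self.chunks:  # only unseen chunks cost storage
                self.chunks[digest] = piece
                new_bytes += len(piece)
            digests.append(digest)
        self.files[name] = digests
        return new_bytes

    def get(self, name: str) -> bytes:
        """Reassemble a file from its chunk list."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = ChunkStore()
ckpt_1 = bytes(range(64))            # four distinct 16-byte chunks
ckpt_2 = ckpt_1[:48] + bytes(16)     # same as ckpt_1 except the last chunk

written_1 = store.put("step-1.bin", ckpt_1)
written_2 = store.put("step-2.bin", ckpt_2)
print(written_1, written_2, store.stored_bytes())
```

In this toy run the two files total 128 bytes, but only five distinct chunks are kept. A real system would typically pick chunk boundaries from the content itself so that an insertion early in a file does not shift every later chunk.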

Hugging Face is clearly trying to make Buckets easy to drop into existing workflows. The feature is exposed through the hf CLI, through Python in huggingface_hub since v1.5.0, through JavaScript in @huggingface/hub since v2.10.5, and through HfFileSystem for fsspec-compatible tools such as pandas, Polars, and Dask. The company says Buckets are included in existing storage plans and thanked Jasper, Arcee, IBM, and PixAI as launch partners. Taken together, the release looks like an attempt to make the Hub not just a publishing endpoint for finished artifacts, but a working storage layer for the messy middle of AI development.
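As a rough illustration of the fsspec route, the following Python sketch lists a bucket folder and picks the newest checkpoint. It relies only on the standard fsspec-style ls(path, detail=False) call, which HfFileSystem provides. The function name latest_checkpoint, the folder layout, and the zero-padded step-NNNNNN.safetensors naming are assumptions made for the example, and whether a given huggingface_hub version resolves bucket paths this way should be checked against its documentation.

```python
from typing import Any, Optional


def latest_checkpoint(folder: str, fs: Optional[Any] = None) -> Optional[str]:
    """Return the lexicographically last .safetensors path in a folder.

    `folder` is assumed to look like
    'hf://buckets/username/my-training-bucket/checkpoints' (hypothetical layout).
    `fs` can be any fsspec-style filesystem, which keeps the function testable
    without network access.
    """
    if fs is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import HfFileSystem
        fs = HfFileSystem()
    paths = fs.ls(folder, detail=False)
    # Assumes zero-padded step numbers, so string order matches step order.
    checkpoints = sorted(p for p in paths if p.endswith(".safetensors"))
    return checkpoints[-1] if checkpoints else None
```

Because the filesystem is passed in as a parameter, the same function can be pointed at a local or in-memory fsspec filesystem during development and at the Hub in production.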

© 2026 Insights. All rights reserved.