r/MachineLearning: preflight Adds a 10-Check PyTorch Gate Before Training Starts
Original: [P] preflight, a pre-training validator for PyTorch I built after losing 3 days to label leakage
A small tool aimed at the most expensive kind of ML failure
On March 15, 2026, r/MachineLearning surfaced a post about preflight that had reached 56 points and 13 comments at crawl time. The backstory is familiar to anyone who has trained models at scale: the run did not crash, the code technically worked, and only days later did it become clear that the model had learned nothing. In this case the author says the culprit was label leakage between the train and validation splits, an experience that led to building a CLI meant to catch silent failures before the expensive job begins.
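Leakage of this kind can often be caught before training by fingerprinting samples in each split and checking for overlap. The repo does not publish its detection algorithm, so the sketch below is a generic, hypothetical approach based on content hashing; the function names are illustrative, not preflight's API:

```python
import hashlib

def fingerprint(sample: bytes) -> str:
    """Content hash used as a cheap identity for a raw sample."""
    return hashlib.sha256(sample).hexdigest()

def find_split_leakage(train_samples, val_samples):
    """Return fingerprints that appear in both train and validation splits."""
    train_hashes = {fingerprint(s) for s in train_samples}
    val_hashes = {fingerprint(s) for s in val_samples}
    return train_hashes & val_hashes

# Toy example: one record accidentally lands in both splits.
train = [b"img_001", b"img_002", b"img_003"]
val = [b"img_003", b"img_004"]
leaked = find_split_leakage(train, val)
print(f"{len(leaked)} leaked sample(s)")  # 1 leaked sample(s)
```

Exact-hash matching only catches byte-identical duplicates; near-duplicate leakage (resized or re-encoded images, for example) needs perceptual hashing or embedding similarity on top of this.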
The GitHub README frames preflight as a quick gate you run with a command such as preflight run --dataloader my_dataloader.py. It performs 10 checks across FATAL, WARN, and INFO severity tiers, and exits with code 1 if any FATAL check fails. The published list includes nan_inf_detection, label_leakage, shape_mismatch, gradient_check, normalisation_sanity, channel_ordering, vram_estimation, class_imbalance, split_sizes, and duplicate_samples. The README also shows JSON output, GitHub Actions usage, and optional model or loss inputs to unlock shape and gradient checks.
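The gate semantics described in the README (severity tiers, exit code 1 on any FATAL failure) can be sketched in a few lines. The check name nan_inf_detection comes from the published list, but the result structure and runner below are assumptions for illustration, not preflight's internals:

```python
import math

def nan_inf_detection(batches):
    """FATAL if any value in any batch is NaN or infinite."""
    for batch in batches:
        for x in batch:
            if math.isnan(x) or math.isinf(x):
                return ("nan_inf_detection", "FATAL", f"found {x!r}")
    return ("nan_inf_detection", "INFO", "no NaN/Inf values")

def run_gate(checks, *args):
    """Run every check, print a report line, and return a process exit code:
    1 if any check came back FATAL, otherwise 0."""
    results = [check(*args) for check in checks]
    for name, severity, detail in results:
        print(f"[{severity}] {name}: {detail}")
    return 1 if any(sev == "FATAL" for _, sev, _ in results) else 0

# Toy batches: the second contains a NaN that would silently poison training.
batches = [[0.1, 0.2], [float("nan"), 0.4]]
exit_code = run_gate([nan_inf_detection], batches)
# exit_code == 1; a CLI would call sys.exit(exit_code) so CI pipelines fail fast.
```

The exit-code contract is what makes the GitHub Actions integration mentioned in the README work: CI treats any non-zero exit as a failed step, so a FATAL check blocks the training job.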
What makes the project interesting is how tightly it scopes itself. The author explicitly says preflight is not trying to replace pytest, Deepchecks, Great Expectations, WandB, MLflow, or PyTorch Lightning sanity checks. It is targeting the narrow but painful gap between code that runs and training that actually makes sense. That gap matters because many data and pipeline bugs never throw an exception. NaNs, leaking splits, channel order mismatches, dead gradients, or major class imbalance can quietly burn through compute budgets before anyone notices.
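The class-imbalance case illustrates why these bugs are silent: a 98:2 label split lets a majority-class predictor score 98% accuracy while learning nothing. A check for it is simple in principle; the threshold and WARN behavior below are assumptions for illustration, not preflight's actual logic:

```python
from collections import Counter

def class_imbalance(labels, max_ratio=10.0):
    """WARN when the most common class outnumbers the rarest by more
    than max_ratio; otherwise INFO. Returns (severity, ratio)."""
    counts = Counter(labels)
    most, least = max(counts.values()), min(counts.values())
    ratio = most / least
    severity = "WARN" if ratio > max_ratio else "INFO"
    return severity, ratio

# A 98:2 split trips the hypothetical 10x threshold.
severity, ratio = class_imbalance([0] * 98 + [1] * 2)
print(severity, ratio)  # WARN 49.0
```

Flagging this as WARN rather than FATAL matches the tiered-severity design: imbalance is often intentional (fraud detection, rare-event data), so it warrants attention rather than a hard stop.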
The setup is intentionally lightweight. Thresholds can be configured in .preflight.toml, individual checks can be disabled, and the roadmap mentions future auto-fix support, dataset drift comparison, and dry-run extensions. The tool is still early at v0.1.x, but the community response makes sense: there is real appetite for a fast, low-friction layer that raises a minimum safety bar before a long PyTorch job gets access to the GPU.
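The article does not reproduce the config schema, so the fragment below is purely illustrative of what per-check thresholds and disabling might look like in a .preflight.toml; every key name and value here is hypothetical, not the tool's actual format:

```toml
# Hypothetical .preflight.toml — keys are illustrative, not preflight's real schema.
[checks.class_imbalance]
enabled = true
max_ratio = 10.0   # warn when majority/minority class count exceeds this

[checks.vram_estimation]
enabled = false    # example of disabling an individual check
```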
Primary source: preflight GitHub repository. Community discussion: r/MachineLearning.
Related Articles
An r/MachineLearning post introduced TraceML, an open-source tool that instruments PyTorch runs with a single context manager and surfaces timing, memory, and rank skew while training is still running. The pitch is practical observability rather than heavyweight profiling.
A March 15, 2026 r/MachineLearning post highlighted GraphZero, a C++ engine that memory-maps graph topology and features from SSD so large GNN datasets can stay off RAM.