r/MachineLearning: preflight Adds a 10-Check PyTorch Gate Before Training Starts

Original post: [P] preflight, a pre-training validator for PyTorch I built after losing 3 days to label leakage

Mar 17, 2026 · By Insights AI (Reddit) · 2 min read

A small tool aimed at the most expensive kind of ML failure

On March 15, 2026, r/MachineLearning surfaced a post about preflight that had reached 56 points and 13 comments at crawl time. The backstory is familiar to anyone who has trained models at scale: the run did not crash, the code technically worked, and only days later did it become clear that the model had learned nothing. In this case the author says the culprit was label leakage between the train and validation splits, which led to building a CLI meant to catch silent failures before the expensive job begins.

The GitHub README frames preflight as a quick gate you run with a command such as preflight run --dataloader my_dataloader.py. It performs 10 checks across FATAL, WARN, and INFO severity tiers, and exits with code 1 if any FATAL check fails. The published list includes nan_inf_detection, label_leakage, shape_mismatch, gradient_check, normalisation_sanity, channel_ordering, vram_estimation, class_imbalance, split_sizes, and duplicate_samples. The README also shows JSON output, GitHub Actions usage, and optional model or loss inputs to unlock shape and gradient checks.
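The README does not publish the checks' internals, but the FATAL/WARN/INFO contract is simple to picture. As a rough illustration only (the function names, severity handling, and findings format here are hypothetical, not preflight's actual API), a NaN/Inf scan feeding the exit-code rule might reduce to:

```python
import math

FATAL, WARN, INFO = "FATAL", "WARN", "INFO"

def nan_inf_detection(batches):
    """Scan numeric batches and report non-finite values as FATAL findings."""
    findings = []
    for i, batch in enumerate(batches):
        bad = [x for x in batch if not math.isfinite(x)]
        if bad:
            findings.append((FATAL, f"batch {i}: {len(bad)} non-finite values"))
    return findings

def exit_code(findings):
    # Mirrors the documented contract: exit 1 if any FATAL check fails.
    return 1 if any(sev == FATAL for sev, _ in findings) else 0
```

The exit-code convention is what makes the GitHub Actions usage work: a failed FATAL check fails the CI step before any GPU time is spent.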

What makes the project interesting is how tightly it scopes itself. The author explicitly says preflight is not trying to replace pytest, Deepchecks, Great Expectations, WandB, MLflow, or PyTorch Lightning sanity checks. It targets the narrow but painful gap between code that runs and training that actually makes sense. That gap matters because many data and pipeline bugs never throw an exception: NaNs, leaking splits, channel order mismatches, dead gradients, and major class imbalance can all quietly burn through compute budgets before anyone notices.
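Label leakage, the bug that motivated the tool, is a good example of that gap: nothing crashes, metrics look great, and the model is memorizing. How preflight detects it isn't documented in this post, but a common approach (sketched here under that assumption; `fingerprint` and `label_leakage` are illustrative names, not preflight's API) is to hash samples and look for overlap between splits:

```python
import hashlib

def fingerprint(sample) -> str:
    # Hash a sample's repr so byte-identical records collide.
    return hashlib.sha256(repr(sample).encode()).hexdigest()

def label_leakage(train, val):
    """Return validation samples that also appear in the training split."""
    train_hashes = {fingerprint(s) for s in train}
    return [s for s in val if fingerprint(s) in train_hashes]
```

For instance, `label_leakage([("a", 0), ("b", 1)], [("b", 1), ("c", 0)])` flags the shared `("b", 1)` record. Real datasets would hash raw tensors or file contents rather than `repr`, but the principle, cheap set intersection before training, is the same.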

The setup is intentionally lightweight. Thresholds can be configured in .preflight.toml, individual checks can be disabled, and the roadmap mentions future auto-fix support, dataset drift comparison, and dry-run extensions. The tool is still early at v0.1.x, but the community response makes sense: there is real appetite for a fast, low-friction layer that raises a minimum safety bar before a long PyTorch job gets access to the GPU.

Primary source: preflight GitHub repository. Community discussion: r/MachineLearning.


© 2026 Insights. All rights reserved.