r/MachineLearning: preflight Adds a 10-Check PyTorch Gate Before Training Starts
Original: [P] preflight, a pre-training validator for PyTorch I built after losing 3 days to label leakage
A small tool aimed at the most expensive kind of ML failure
On March 15, 2026, r/MachineLearning surfaced a post about preflight that had reached 56 points and 13 comments at crawl time. The backstory is familiar to anyone who has trained models at scale: the run did not crash, the code technically worked, and only days later did it become clear that the model had learned nothing. In this case the author says the culprit was label leakage between the train and validation splits, an experience that led to building a CLI meant to catch silent failures before the expensive job begins.
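Leakage of this kind can often be caught before training by fingerprinting samples in each split and checking for overlap. The repo does not publish its detection algorithm, so the sketch below is a generic, hypothetical approach based on content hashing; the function names are illustrative, not preflight's API:

```python
import hashlib

def fingerprint(sample: bytes) -> str:
    """Content hash used as a cheap identity for a raw sample."""
    return hashlib.sha256(sample).hexdigest()

def find_split_leakage(train_samples, val_samples):
    """Return fingerprints that appear in both train and validation splits."""
    train_hashes = {fingerprint(s) for s in train_samples}
    val_hashes = {fingerprint(s) for s in val_samples}
    return train_hashes & val_hashes

# Toy example: one record accidentally lands in both splits.
train = [b"img_001", b"img_002", b"img_003"]
val = [b"img_003", b"img_004"]
leaked = find_split_leakage(train, val)
print(f"{len(leaked)} leaked sample(s)")  # 1 leaked sample(s)
```

Exact-hash matching only catches byte-identical duplicates; near-duplicate leakage (resized or re-encoded images, for example) needs perceptual hashing or embedding similarity on top of this.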
The GitHub README frames preflight as a quick gate you run with a command such as preflight run --dataloader my_dataloader.py. It performs 10 checks across FATAL, WARN, and INFO severity tiers, and exits with code 1 if any FATAL check fails. The published list includes nan_inf_detection, label_leakage, shape_mismatch, gradient_check, normalisation_sanity, channel_ordering, vram_estimation, class_imbalance, split_sizes, and duplicate_samples. The README also shows JSON output, GitHub Actions usage, and optional model or loss inputs to unlock shape and gradient checks.
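The gate semantics described in the README (severity tiers, exit code 1 on any FATAL failure) can be sketched in a few lines. The check name nan_inf_detection comes from the published list, but the result structure and runner below are assumptions for illustration, not preflight's internals:

```python
import math

def nan_inf_detection(batches):
    """FATAL if any value in any batch is NaN or infinite."""
    for batch in batches:
        for x in batch:
            if math.isnan(x) or math.isinf(x):
                return ("nan_inf_detection", "FATAL", f"found {x!r}")
    return ("nan_inf_detection", "INFO", "no NaN/Inf values")

def run_gate(checks, *args):
    """Run every check, print a report line, and return a process exit code:
    1 if any check came back FATAL, otherwise 0."""
    results = [check(*args) for check in checks]
    for name, severity, detail in results:
        print(f"[{severity}] {name}: {detail}")
    return 1 if any(sev == "FATAL" for _, sev, _ in results) else 0

# Toy batches: the second contains a NaN that would silently poison training.
batches = [[0.1, 0.2], [float("nan"), 0.4]]
exit_code = run_gate([nan_inf_detection], batches)
# exit_code == 1; a CLI would call sys.exit(exit_code) so CI pipelines fail fast.
```

The exit-code contract is what makes the GitHub Actions integration mentioned in the README work: CI treats any non-zero exit as a failed step, so a FATAL check blocks the training job.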
What makes the project interesting is how tightly it scopes itself. The author explicitly says preflight is not trying to replace pytest, Deepchecks, Great Expectations, WandB, MLflow, or PyTorch Lightning sanity checks. It is targeting the narrow but painful gap between code that runs and training that actually makes sense. That gap matters because many data and pipeline bugs never throw an exception. NaNs, leaking splits, channel order mismatches, dead gradients, or major class imbalance can quietly burn through compute budgets before anyone notices.
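The class-imbalance case illustrates why these bugs are silent: a 98:2 label split lets a majority-class predictor score 98% accuracy while learning nothing. A check for it is simple in principle; the threshold and WARN behavior below are assumptions for illustration, not preflight's actual logic:

```python
from collections import Counter

def class_imbalance(labels, max_ratio=10.0):
    """WARN when the most common class outnumbers the rarest by more
    than max_ratio; otherwise INFO. Returns (severity, ratio)."""
    counts = Counter(labels)
    most, least = max(counts.values()), min(counts.values())
    ratio = most / least
    severity = "WARN" if ratio > max_ratio else "INFO"
    return severity, ratio

# A 98:2 split trips the hypothetical 10x threshold.
severity, ratio = class_imbalance([0] * 98 + [1] * 2)
print(severity, ratio)  # WARN 49.0
```

Flagging this as WARN rather than FATAL matches the tiered-severity design: imbalance is often intentional (fraud detection, rare-event data), so it warrants attention rather than a hard stop.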
The setup is intentionally lightweight. Thresholds can be configured in .preflight.toml, individual checks can be disabled, and the roadmap mentions future auto-fix support, dataset drift comparison, and dry-run extensions. The tool is still early at v0.1.x, but the community response makes sense: there is real appetite for a fast, low-friction layer that raises a minimum safety bar before a long PyTorch job gets access to the GPU.
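The article does not reproduce the config schema, so the fragment below is purely illustrative of what per-check thresholds and disabling might look like in a .preflight.toml; every key name and value here is hypothetical, not the tool's actual format:

```toml
# Hypothetical .preflight.toml — keys are illustrative, not preflight's real schema.
[checks.class_imbalance]
enabled = true
max_ratio = 10.0   # warn when majority/minority class count exceeds this

[checks.vram_estimation]
enabled = false    # example of disabling an individual check
```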
Primary source: preflight GitHub repository. Community discussion: r/MachineLearning.
Related Articles
An r/MachineLearning post introduced TraceML, an open-source tool that instruments PyTorch runs with a single context manager and surfaces timing, memory, and rank skew while training is still running. The pitch is practical observability rather than heavyweight profiling.
A March 15, 2026 r/MachineLearning post highlighted GraphZero, a C++ engine that memory-maps graph topology and features from SSD so large GNN datasets can stay off RAM.