NeurIPS desk-rejection dispute turns AI detectors into the real review issue

Original: NeurIPS used uncalibrated AI detector for desk rejections [D]

AI Jun 4, 2026 By Insights AI (Reddit)

An r/MachineLearning post about a NeurIPS 2026 Position Paper Track desk rejection quickly became a broader argument about process. The author says their submission was rejected for an alleged AI-policy violation after the track weighed the output of a proprietary AI-text detector, Pangram, against the authors’ AI-use attestation.

The methodological concern is circularity. If a high detector score is used to treat an attestation as inconsistent, and that inconsistency is then used to justify rejection, the detector is no longer a weak signal. It becomes the practical decision-maker, even if the process is described as human-reviewed.

That is why the thread was sharper than a normal rejected-paper complaint. Commenters pointed to the long-running calibration and false-positive problems with AI detectors, especially when they are applied to anything other than obviously low-effort generated text. Some said papers written before ChatGPT existed can still score high. Others argued that unless a model leaves a reliable watermark or fingerprint, detector confidence is too fragile for high-stakes academic decisions.
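The false-positive concern can be made concrete with a base-rate calculation. The sketch below is illustrative only: the function name and all three input rates are hypothetical assumptions, not figures reported for Pangram or for the NeurIPS track.

```python
def flagged_but_human_share(prior_ai, true_positive_rate, false_positive_rate):
    """Share of detector-flagged submissions that were human-written (Bayes' rule).

    prior_ai            -- assumed fraction of submissions that are AI-written
    true_positive_rate  -- assumed chance the detector flags an AI-written paper
    false_positive_rate -- assumed chance the detector flags a human-written paper
    """
    flagged_ai = prior_ai * true_positive_rate
    flagged_human = (1.0 - prior_ai) * false_positive_rate
    total_flagged = flagged_ai + flagged_human
    if total_flagged == 0:
        raise ValueError("the detector never flags anything under these rates")
    return flagged_human / total_flagged


# Hypothetical inputs: 5% of submissions AI-written, 90% detection rate,
# 1% false-positive rate on human-written text.
share = flagged_but_human_share(0.05, 0.90, 0.01)
print(f"{share:.1%}")  # prints 17.4%
```

Under these assumed numbers, about one in six flagged submissions would be human-written even though the detector errs on only 1% of human text. The share grows as AI-written submissions become rarer or as the false-positive rate rises, which is why calibration on the actual submission pool matters.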

Conferences do need AI-use policies, and authors should disclose assistance honestly. The problem is evidentiary weight. A detector can flag a case for review, but a desk rejection needs a process that can be explained, reproduced, and defended on appeal. The NeurIPS dispute shows that the hard question is no longer whether academia should respond to AI writing. It is how to enforce policy without turning opaque scores into gatekeeping.
