VeridisQuo combines spatial and frequency cues for explainable deepfake detection
Original post: [P] VeridisQuo - open-source deepfake detector that combines spatial + frequency analysis and shows you where the face was manipulated
VeridisQuo is a student-built deepfake detection project that tries to make video forensics more explainable, not just more accurate. The core premise is that many detectors focus mainly on pixel-level visual features, while generated media also leaves traces in the frequency domain through compression artifacts and spectral inconsistencies. VeridisQuo therefore combines a standard spatial vision backbone with dedicated frequency analysis and then visualizes where the model believes manipulation is happening.
According to the README and the r/MachineLearning write-up, the spatial branch uses an ImageNet-pretrained EfficientNet-B4 to produce a 1,792-dimensional representation. The frequency branch computes both FFT and DCT features from each cropped face image, producing two 512-dimensional vectors that are fused into a 1,024-dimensional representation by a small MLP. Those streams are concatenated into a 2,816-dimensional input for the final classifier. The full model is reported at about 25.05 million parameters.
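The dimension bookkeeping above can be sketched in a few lines. This is a minimal illustration, not the project's code: the spectrum-pooling scheme and the stand-in for the EfficientNet-B4 embedding are assumptions, and the small fusion MLP is elided (the two 512-dimensional spectral vectors are simply concatenated here).

```python
import numpy as np

def pool_spectrum(spec: np.ndarray, rows: int = 16, cols: int = 32) -> np.ndarray:
    """Average-pool a 224x224 spectrum down to rows*cols values (16*32 = 512)."""
    h, w = spec.shape
    return spec.reshape(rows, h // rows, cols, w // cols).mean(axis=(1, 3)).ravel()

def dct2(img: np.ndarray) -> np.ndarray:
    """Orthonormal 2-D DCT-II of a square image via basis-matrix multiplication."""
    n = img.shape[0]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0] *= np.sqrt(0.5)
    return C @ img @ C.T

def frequency_branch(face_gray: np.ndarray) -> np.ndarray:
    """Two 512-d spectral vectors, concatenated to 1,024 dims.
    (In the real model a small MLP fuses them; that step is elided here.)"""
    fft_feat = pool_spectrum(np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(face_gray)))))
    dct_feat = pool_spectrum(np.log1p(np.abs(dct2(face_gray))))
    return np.concatenate([fft_feat, dct_feat])

face = np.random.rand(224, 224)        # grayscale stand-in for a face crop
spatial = np.zeros(1792)               # stand-in for the EfficientNet-B4 embedding
fused = np.concatenate([spatial, frequency_branch(face)])
print(fused.shape)  # (2816,)
```

The arithmetic matches the reported sizes: 1,792 spatial dimensions plus 1,024 frequency dimensions gives the 2,816-dimensional classifier input.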
- The model operates on 224x224 RGB face crops.
- The training data is based on FaceForensics++ (C23) and a preprocessed dataset of 716,438 face images.
- The preprocessing pipeline uses 1 FPS frame extraction, YOLOv11n face detection, and padded face crops.
- GradCAM heatmaps are remapped back onto the original video to show suspected manipulation regions.
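The padded-crop step of the pipeline can be sketched as follows. The padding ratio, helper name, and nearest-neighbor resize are assumptions for illustration; the actual pipeline uses YOLOv11n detections, which are not reproduced here.

```python
import numpy as np

def padded_crop(frame: np.ndarray, box: tuple, pad: float = 0.2,
                size: int = 224) -> np.ndarray:
    """Expand a detector box by `pad` on each side, clamp it to the frame,
    and nearest-neighbor resize the crop to size x size."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    bw, bh = x2 - x1, y2 - y1
    x1 = max(0, int(x1 - pad * bw)); y1 = max(0, int(y1 - pad * bh))
    x2 = min(w, int(x2 + pad * bw)); y2 = min(h, int(y2 + pad * bh))
    crop = frame[y1:y2, x1:x2]
    ys = np.arange(size) * crop.shape[0] // size   # nearest-neighbor row indices
    xs = np.arange(size) * crop.shape[1] // size   # nearest-neighbor col indices
    return crop[np.ix_(ys, xs)]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # one extracted video frame
face = padded_crop(frame, (500, 200, 620, 360))    # hypothetical detector box
print(face.shape)  # (224, 224, 3)
```

Padding the box before cropping keeps blend boundaries around the face inside the crop, which matters for both the spatial and frequency branches.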
The explainability layer is what makes the release notable. Deepfake detectors often look strong on curated benchmarks but are hard to trust in deployment because users cannot see whether the model is responding to true manipulation artifacts or to accidental shortcuts. By projecting GradCAM signals back onto the source frames, VeridisQuo gives researchers at least one way to inspect whether attention is landing on blend boundaries, jaw regions, and other facial areas that plausibly correlate with generated edits.
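The remapping idea can be sketched as: upsample the coarse class-activation map to the crop's size, then paste it into a frame-sized canvas at the crop's location. The CAM computation itself is elided and all names here are illustrative, not taken from the project.

```python
import numpy as np

def remap_cam(cam: np.ndarray, frame_hw: tuple, crop_box: tuple) -> np.ndarray:
    """Nearest-neighbor upsample a coarse GradCAM map (e.g. 7x7) to the crop
    size, normalize it to [0, 1], and place it on a zero canvas matching
    the original frame, so it can be alpha-blended as a heatmap overlay."""
    x1, y1, x2, y2 = crop_box
    ch, cw = y2 - y1, x2 - x1
    ys = np.arange(ch) * cam.shape[0] // ch
    xs = np.arange(cw) * cam.shape[1] // cw
    up = cam[np.ix_(ys, xs)]                               # (ch, cw)
    up = (up - up.min()) / (up.max() - up.min() + 1e-8)    # normalize
    canvas = np.zeros(frame_hw)
    canvas[y1:y2, x1:x2] = up
    return canvas

cam = np.random.rand(7, 7)                          # hypothetical coarse CAM
heat = remap_cam(cam, (720, 1280), (476, 168, 644, 392))
print(heat.shape)  # (720, 1280)
```

Because the heatmap lives in frame coordinates, a reviewer can check whether high activations coincide with plausible manipulation regions rather than background shortcuts.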
The authors also shared limitations instead of overselling the result. They reported roughly 96% accuracy on the held-out test split and a false-positive rate around 7-8%, but also noted that random real-world videos still skew too often toward “FAKE.” That admission is important because it acknowledges the usual generalization gap between benchmark evaluation and open-world use. For a university project, that level of transparency makes the release more useful to the community.
The community post is on r/MachineLearning. The original materials are available in the GitHub repository and the Hugging Face demo.