VeridisQuo combines spatial and frequency cues for explainable deepfake detection

Original: [P] VeridisQuo - open-source deepfake detector that combines spatial + frequency analysis and shows you where the face was manipulated

AI Mar 9, 2026 By Insights AI (Reddit)

VeridisQuo is a student-built deepfake detection project that tries to make video forensics more explainable, not just more accurate. The core premise is that many detectors focus mainly on pixel-level visual features, while generated media also leaves traces in the frequency domain through compression artifacts and spectral inconsistencies. VeridisQuo therefore combines a standard spatial vision backbone with dedicated frequency analysis and then visualizes where the model believes manipulation is happening.

According to the README and the r/MachineLearning write-up, the spatial branch uses an ImageNet-pretrained EfficientNet-B4 to produce a 1,792-dimensional representation. The frequency branch computes both FFT and DCT features from each cropped face image, producing two 512-dimensional vectors that a small MLP fuses into a 1,024-dimensional representation. The spatial and frequency representations are then concatenated into a 2,816-dimensional input (1,792 + 1,024) for the final classifier. The full model is reported at about 25.05 million parameters.
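
The dimension bookkeeping in that description can be checked with a few lines of Python. This is only a shape-level sketch: the sizes come from the article, while the names `BranchDims`, `frequency_mlp_input_dim`, `classifier_input_dim`, and `concat_features` are hypothetical and not taken from the VeridisQuo code base. Plain lists stand in for real feature tensors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchDims:
    """Feature sizes as reported in the article (names are illustrative)."""
    spatial: int = 1792          # EfficientNet-B4 representation
    fft: int = 512               # FFT feature vector
    dct: int = 512               # DCT feature vector
    fused_frequency: int = 1024  # output of the small fusion MLP


def frequency_mlp_input_dim(dims):
    """Width of the FFT + DCT vector, assuming the two are joined before the MLP."""
    return dims.fft + dims.dct


def classifier_input_dim(dims):
    """Width of the vector handed to the final classifier."""
    return dims.spatial + dims.fused_frequency


def concat_features(spatial, frequency):
    """Concatenate per-face feature vectors (plain lists stand in for tensors)."""
    return list(spatial) + list(frequency)


dims = BranchDims()
spatial_vec = [0.0] * dims.spatial            # placeholder spatial features
frequency_vec = [0.0] * dims.fused_frequency  # placeholder fused frequency features
fused = concat_features(spatial_vec, frequency_vec)
print(len(fused), classifier_input_dim(dims))
```

The point of the check is that the reported 2,816-dimensional classifier input is simply the sum of the two branch widths, so neither branch is projected down before concatenation.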

  • The model operates on 224x224 RGB face crops.
  • The training data is based on FaceForensics++ (C23) and a preprocessed dataset of 716,438 face images.
  • The preprocessing pipeline uses 1 FPS frame extraction, YOLOv11n face detection, and padded face crops.
  • GradCAM heatmaps are remapped back onto the original video to show suspected manipulation regions.
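
The geometry behind those preprocessing steps can be sketched without any vision library. The example below assumes a detector has already returned a face box as `(x1, y1, x2, y2)` in pixel coordinates; the padding ratio of 0.2 and all function names are illustrative assumptions, since the article does not give the project's actual values.

```python
def frame_indices_at_1fps(total_frames, video_fps):
    """Indices of the frames kept when sampling roughly one frame per second."""
    step = max(1, round(video_fps))
    return list(range(0, total_frames, step))


def padded_crop_box(box, frame_w, frame_h, pad_ratio=0.2):
    """Grow a face box by pad_ratio of its size on each side, clamped to the frame.

    pad_ratio=0.2 is an assumed value, not the project's setting.
    """
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * pad_ratio
    pad_y = (y2 - y1) * pad_ratio
    return (
        max(0, int(x1 - pad_x)),
        max(0, int(y1 - pad_y)),
        min(frame_w, int(x2 + pad_x)),
        min(frame_h, int(y2 + pad_y)),
    )


def heatmap_pixel_to_frame(row, col, crop_box, model_size=224):
    """Map the centre of heatmap pixel (row, col) back to frame coordinates.

    Assumes the heatmap has already been resized to model_size x model_size.
    """
    x1, y1, x2, y2 = crop_box
    scale_x = (x2 - x1) / model_size
    scale_y = (y2 - y1) / model_size
    return (x1 + (col + 0.5) * scale_x, y1 + (row + 0.5) * scale_y)


indices = frame_indices_at_1fps(total_frames=90, video_fps=30.0)
crop = padded_crop_box((100, 100, 200, 200), frame_w=1920, frame_h=1080)
point = heatmap_pixel_to_frame(0, 0, crop)
print(indices, crop, point)
```

Keeping the crop box alongside each sampled frame is what makes the later overlay possible: a heatmap computed on the 224x224 crop can be scaled and shifted back to the position the face occupied in the original frame.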

The explainability layer is what makes the release notable. Deepfake detectors often look strong on curated benchmarks but are hard to trust in deployment because users cannot see whether the model is responding to true manipulation artifacts or to accidental shortcuts. By projecting GradCAM signals back onto the source frames, VeridisQuo gives researchers at least one way to inspect whether attention is landing on blend boundaries, jaw regions, and other facial areas that plausibly correlate with generated edits.
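
For readers unfamiliar with the technique, the standard Grad-CAM computation is short: average the gradients of the class score over each feature map to get one weight per channel, take the weighted sum of the feature maps, and keep only the positive part. The sketch below works on tiny nested lists rather than real network tensors; which layer VeridisQuo hooks and how it normalizes the map are not stated in the article, so the peak normalization and the toy inputs here are assumptions.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one convolutional layer.

    activations, gradients: nested lists shaped [channels][height][width],
    where gradients holds d(class score)/d(activation).
    """
    n_channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # One weight per channel: the global average of that channel's gradients.
    weights = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(n_channels)
    ]

    # Weighted sum of feature maps, followed by ReLU.
    cam = [
        [
            max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] for display (an assumed, commonly used normalization).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: channel 0 supports the class, channel 1 argues against it.
toy_activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(toy_activations, toy_gradients)
print(heatmap)
```

In the toy case only the top-left cell lights up, because the ReLU discards locations whose evidence counts against the predicted class. That is the property that lets a reviewer ask whether the highlighted region is a plausible manipulation site.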

The authors also shared limitations instead of overselling the result. They reported roughly 96% accuracy on the held-out test split and a false-positive rate around 7-8%, but also noted that random real-world videos still skew too often toward “FAKE.” That admission is important because it acknowledges the usual generalization gap between benchmark evaluation and open-world use. For a university project, that level of transparency makes the release more useful to the community.
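
A quick calculation shows how those two figures can coexist. The counts below are hypothetical, chosen only to reproduce numbers in the reported range, and are not the project's confusion matrix. "Positive" here means a FAKE prediction.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp, tn):
    """Share of real samples that are wrongly flagged as FAKE."""
    return fp / (fp + tn)


# Hypothetical test split: 1,000 real and 4,000 fake face crops.
tn, fp = 925, 75     # real crops: correctly passed / wrongly flagged
tp, fn = 3875, 125   # fake crops: correctly flagged / missed

print(accuracy(tp, tn, fp, fn))     # 0.96
print(false_positive_rate(fp, tn))  # 0.075
```

When fakes outnumber real samples in the test split, overall accuracy is dominated by the fake class, so a 7.5% false-positive rate on real footage still fits under a 96% headline number. This is consistent with the authors' observation that real-world videos get flagged too often, because open-world traffic is mostly real.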

The community post is on r/MachineLearning. The original materials are available in the GitHub repository and the Hugging Face demo.
