#computer-vision

AI 3d ago 2 min read

Vision Banana turns image generators into all-purpose vision models

A DeepMind paper argues that image generators may be turning into the vision equivalent of large language models. DeepMind says Vision Banana, built on Nano Banana Pro, beats or rivals specialist systems such as Segment Anything and Depth Anything on 2D and 3D tasks after lightweight instruction tuning.

#google-deepmind #computer-vision #vision-banana

AI Twitter Mar 31, 2026 2 min read

Meta releases SAM 3.1 with object multiplexing for faster multi-object video tracking

Meta said on March 27, 2026 that SAM 3.1 is a drop-in update to SAM 3 that improves video processing efficiency through object multiplexing. The project's release notes say the update introduces shared-memory joint multi-object tracking, new checkpoints, and a roughly 7x speedup at 128 objects on a single H100 compared with the November 2025 SAM 3 release.

#meta #sam3 #computer-vision

AI Mar 28, 2026 2 min read

Meta ships SAM 3.1 with object multiplexing for 32 FPS video tracking on a single H100

Meta introduced SAM 3.1 on March 27, 2026 as a drop-in upgrade for real-time video detection and tracking. The company says object multiplexing lets the model track up to 16 objects in one forward pass and doubles throughput from 16 to 32 FPS on a single H100 for medium-object-count videos.

#meta #computer-vision #video

Humanoid Robots Mar 23, 2026 2 min read

Google DeepMind unveils D4RT for up to 300x more efficient 4D scene reconstruction

Google DeepMind introduced D4RT on January 22, 2026 as a unified model for dynamic 4D scene reconstruction and tracking. The company says it runs 18x to 300x faster than prior methods and is efficient enough for real-time applications in robotics and augmented reality.

#deepmind #robotics #computer-vision

AI Reddit Mar 22, 2026 2 min read

Michael Hafftka opens 50 years of work as a Hugging Face dataset

A post on r/artificial drew attention to painter Michael Hafftka publishing his catalogue raisonné as an open dataset on Hugging Face. The dataset card lists roughly 3,780 works, structured metadata, and a CC-BY-NC-4.0 license.

#datasets #computer-vision #art

Humanoid Robots Reddit Mar 19, 2026 2 min read

r/artificial: Pokémon Go’s image corpus is now helping delivery robots localize on sidewalks

A March 16, 2026 r/artificial post linking to a Popular Science report reached 590 points and 62 comments. The story says Niantic Spatial trained its Visual Positioning System on more than 30 billion Pokémon Go images and is now partnering with Coco Robotics so delivery robots can localize with centimeter-level precision in GPS-challenged streets.

#robotics #computer-vision #crowdsourcing

AI Hacker News Mar 10, 2026 2 min read

LoGeR Pushes Feedforward 3D Reconstruction to 19,000-Frame Videos

A Hacker News discussion highlighted LoGeR, a Google DeepMind and UC Berkeley project that uses hybrid memory to scale dense 3D reconstruction across extremely long videos without post-hoc optimization.

#computer-vision #3d-reconstruction #long-context

AI Reddit Mar 9, 2026 2 min read

VeridisQuo combines spatial and frequency cues for explainable deepfake detection

Highlighted in r/MachineLearning, VeridisQuo fuses an EfficientNet-B4 spatial stream with FFT and DCT frequency features, then uses GradCAM remapping to show which facial regions triggered a deepfake prediction.

#deepfake #computer-vision #efficientnet

AI Reddit Mar 7, 2026 2 min read

Reddit Project Watch: VeridisQuo Combines EfficientNet, FFT, and DCT for Explainable Deepfake Detection

A well-received r/MachineLearning post introduced VeridisQuo, an open-source deepfake detector that fuses spatial and frequency-domain signals and overlays GradCAM heatmaps onto manipulated video frames. The project stands out because the author shared concrete architecture and training details instead of just a demo clip.

#deepfake-detection #computer-vision #explainable-ai

AI Feb 16, 2026 1 min read

Google DeepMind Introduces D4RT, Unifying 4D Scene Reconstruction and Tracking from 2D Video

Google DeepMind introduced D4RT, a single-model framework for dynamic 4D scene reconstruction and tracking. The company reports up to 300x efficiency gains versus prior methods, highlighting real-time potential for robotics and AR workloads.

#computer-vision #robotics #transformer