LocalLLaMA jumped on DeepSeek's visual-primitives idea, then watched the repo vanish

The LocalLLaMA thread on DeepSeek's "Thinking with Visual Primitives" took off for two reasons: the underlying idea looked genuinely important, and the repo disappeared fast enough to turn the discussion into a small archival scramble.

According to the Reddit write-up, the framework was released by DeepSeek with collaborators at Peking University and Tsinghua University. The core move is simple to describe and unusually concrete: instead of keeping image reasoning fully in natural language, the model can interleave coordinate points and bounding boxes into its chain of thought as explicit spatial tokens. In other words, the model is not only describing what it thinks it sees. It can point during reasoning. That matters because multimodal systems often fail at reference precision. They talk around an object instead of grounding attention on the exact region that matters.
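To make the "pointing during reasoning" idea concrete, here is a minimal Python sketch of what consuming such a trace might look like. The `<point>` and `<box>` markup, the pixel-coordinate convention, and the `extract_primitives` helper are all assumptions made for illustration; the write-up does not specify the framework's actual token format.

```python
import re
from dataclasses import dataclass

# Hypothetical markup, for illustration only. The real spatial-token
# format used by the framework is not described in the write-up.
PRIMITIVE_RE = re.compile(r"<(point|box)>([^<]*)</\1>")


@dataclass(frozen=True)
class Primitive:
    kind: str      # "point" or "box"
    coords: tuple  # (x, y) for a point, (x1, y1, x2, y2) for a box


def extract_primitives(trace: str) -> list:
    """Pull the spatial primitives out of an interleaved reasoning trace."""
    found = []
    for match in PRIMITIVE_RE.finditer(trace):
        kind = match.group(1)
        try:
            coords = tuple(int(v) for v in match.group(2).split(","))
        except ValueError:
            continue  # skip malformed coordinates instead of guessing
        expected = 2 if kind == "point" else 4
        if len(coords) == expected:
            found.append(Primitive(kind, coords))
    return found


# A made-up trace: prose reasoning with explicit references to image regions.
trace = (
    "The question asks about the mug left of the laptop. "
    "The laptop is at <box>410,220,780,540</box>. "
    "Left of it I see a handle near <point>305,388</point>, "
    "so the mug is <box>250,330,360,450</box>."
)

primitives = extract_primitives(trace)
for p in primitives:
    print(p.kind, p.coords)
```

The point of the sketch is only that each step of the reasoning carries a checkable reference to a region of the image, rather than a verbal description that a reader or a downstream tool has to re-ground.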

The paper mirror and repo link gave commenters enough to see why the idea landed. Several users framed it as the kind of mechanism frontier labs have likely been using internally, but that open-model communities rarely get to inspect in detail. One high-voted reaction called it a big deal for open models because it replaces vague verbal scaffolding with a minimal visual language the model can manipulate. Another thread kept circling the practical upside: if points and boxes become first-class reasoning units, tasks such as counting, locating, or multi-step spatial comparison may depend less on prose that drifts away from the image.

Then came the second half of the drama. The Reddit post notes that DeepSeek removed the repository shortly after release, and commenters quickly traded mirror links and jokes about how familiar that release pattern already feels. The disappearance amplified the thread rather than killing it. In communities like LocalLLaMA, a deleted repo is not just scarcity theater. It is also a signal to preserve the artifact before it vanishes behind a cleanup pass or internal review.

That combination is why the post traveled. The community did not just see another multimodal paper. It saw a rare, inspectable attempt to make visual grounding part of the model's actual reasoning loop, then watched the window half-close in real time.
