LocalLLaMA jumped on DeepSeek's visual-primitives idea, then watched the repo vanish

Original: DeepSeek released 'Thinking-with-Visual-Primitives' framework

LLM · May 1, 2026 · By Insights AI (Reddit) · 2 min read

LocalLLaMA jumped on DeepSeek's Thinking with Visual Primitives post for two reasons at once: the underlying idea looked genuinely important, and then the repo disappeared fast enough to turn the thread into a small archival scramble.

According to the Reddit write-up, the framework was released by DeepSeek with collaborators at Peking University and Tsinghua University. The core move is simple to describe and unusually concrete: instead of keeping image reasoning fully in natural language, the model can interleave coordinate points and bounding boxes into its chain of thought as explicit spatial tokens. In other words, the model is not only describing what it thinks it sees. It can point during reasoning. That matters because multimodal systems often fail at reference precision. They talk around an object instead of grounding attention on the exact region that matters.
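The write-up does not include a token specification, but the idea of interleaving spatial tokens into a chain of thought can be sketched roughly as follows. The `<point x,y>` and `<box x1,y1,x2,y2>` syntax, the example trace, and the helper name are illustrative assumptions, not the format from the (now-removed) DeepSeek repo:

```python
import re

# Hypothetical reasoning trace: prose interleaved with explicit spatial tokens.
# The token syntax here is an assumption for illustration only.
trace = (
    "There are two mugs on the desk. "
    "The first sits here <point 132,406>, the second here <point 512,388>. "
    "The laptop occupies <box 240,120,680,450>, so neither mug overlaps it."
)

POINT = re.compile(r"<point (\d+),(\d+)>")
BOX = re.compile(r"<box (\d+),(\d+),(\d+),(\d+)>")

def grounded_regions(text):
    """Pull out the explicit spatial references embedded in a reasoning trace."""
    points = [(int(x), int(y)) for x, y in POINT.findall(text)]
    boxes = [tuple(map(int, b)) for b in BOX.findall(text)]
    return points, boxes

points, boxes = grounded_regions(trace)
print(points)  # [(132, 406), (512, 388)]
print(boxes)   # [(240, 120, 680, 450)]
```

The point of making references machine-parseable like this is that a renderer, grader, or downstream tool can check the model's grounding against the actual image, instead of trying to recover regions from free-form prose.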

The paper mirror and repo link gave commenters enough to see why the idea landed. Several users framed it as the kind of mechanism frontier labs have likely been using internally, but that open-model communities rarely get to inspect in detail. One high-voted reaction called it a big deal for open models because it replaces vague verbal scaffolding with a minimal visual language the model can manipulate. Another thread kept circling the practical upside: if points and boxes become first-class reasoning units, tasks such as counting, locating, or multi-step spatial comparison may depend less on prose that drifts away from the image.

Then came the second half of the drama. The Reddit post notes that DeepSeek removed the repository shortly after release, and commenters quickly traded mirror links and jokes about how familiar that release pattern already feels. The disappearance amplified the thread rather than killing it. In communities like LocalLLaMA, a deleted repo is not just scarcity theater. It is also a signal to preserve the artifact before it vanishes behind a cleanup pass or internal review.

That combination is why the post traveled. The community did not just see another multimodal paper. It saw a rare, inspectable attempt to make visual grounding part of the model's actual reasoning loop, then watched the window half-close in real time.
