LocalLLaMA asks the obvious question: if LLMs think in vectors, why show words?

Original: Why isn’t LLM reasoning done in vector space instead of natural language?

LLM · Apr 30, 2026 · By Insights AI (Reddit) · 2 min read

The appeal of this LocalLLaMA thread is that it sounds naive for about five seconds and then becomes a real systems question. The original post asks why LLM reasoning is usually exposed as language-based chain-of-thought when the model’s internal computation already lives in high-dimensional vectors. If vectors are where the math actually happens, why not let the model “think” there and only translate the final answer back into words? In a subreddit full of people who spend time on inference tricks and architecture papers, that was enough to trigger a serious 140-comment discussion rather than a quick dismissal.

The answers were revealing. One cluster of replies was blunt: nobody really knows how to train this well yet. Commenters pointed to ideas like COCONUT, which feeds the model’s last hidden state back in as the next input embedding instead of a sampled token, and to JEPA-style directions, but the repeated warning was that latent-space reasoning is a moving target. The latent space is not a fixed symbolic board you can write scratch work onto. It changes as the model trains, which makes supervision hard and can force awkward choices like freezing parts of the network. Other commenters pushed an even more basic question: what would the dataset even look like? Natural-language reasoning traces can at least be collected, scored, filtered, and reused. A hidden vector scratchpad is far harder to label or curate deliberately.
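To make the supervision problem concrete, here is a minimal sketch of a COCONUT-style decoding loop, written against a hypothetical decoder-only model interface. The `model`, `embed`, and `lm_head` names are illustrative stand-ins, not any real library’s API. The point to notice is that the latent phase emits no tokens at all, which is exactly why there is nothing obvious to collect, score, or filter.

```python
# Hypothetical sketch of COCONUT-style latent reasoning (illustrative, not
# the paper's code). Assumes a decoder-only transformer that accepts input
# embeddings directly and returns last-layer hidden states per position.
import torch

def latent_then_verbal(model, embed, lm_head, prompt_ids,
                       n_latent=4, n_tokens=32):
    embs = embed(prompt_ids)                 # (1, T, d) embedded prompt

    # Latent phase: feed the final hidden state back as the next input
    # embedding instead of sampling a token. No words are produced here,
    # so this scratch work is invisible and hard to supervise directly.
    for _ in range(n_latent):
        hidden = model(inputs_embeds=embs)   # (1, T, d) hidden states
        embs = torch.cat([embs, hidden[:, -1:, :]], dim=1)

    # Verbal phase: switch back to ordinary token-by-token decoding so the
    # final answer comes out as readable text.
    out_ids = []
    for _ in range(n_tokens):
        hidden = model(inputs_embeds=embs)
        next_id = lm_head(hidden[:, -1, :]).argmax(dim=-1)   # greedy, (1,)
        out_ids.append(int(next_id))
        embs = torch.cat([embs, embed(next_id).unsqueeze(1)], dim=1)
    return out_ids
```

Even in this toy form, the training question is visible: the latent steps have no token-level targets, so a loss has to be defined against a representation that keeps shifting as the network trains.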

The thread also pushed back on a common intuition trap. Chain-of-thought is not simply a human-readable transcript of invisible internal thinking. In deployed models it often functions as extra tokens that re-enter the context and condition every subsequent generation step. That makes language useful for more than explainability: it gives the model a legible working memory that humans can audit, debug, or constrain. Commenters noted that once reasoning becomes fully latent, you gain compression and maybe speed, but you lose visibility exactly where people care most: math, programming, legal logic, and safety-sensitive tasks. One commenter put the alignment angle plainly: labs prefer readable reasoning because opaque “neuralese” is harder to verify and harder to trust.
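The contrast with the ordinary chain-of-thought loop makes the point. A sketch, assuming only a generic `next_token` callable and a generic tokenizer (both stand-ins, not a specific library): every reasoning token is appended to the context and conditions everything generated after it, and because it is plain text it can be inspected before the answer is trusted.

```python
# Illustrative sketch of why chain-of-thought is more than a transcript:
# each generated token re-enters the context and shapes all later steps.
def chain_of_thought(next_token, tokenizer, prompt, max_new=256):
    # `next_token` maps a token-id sequence to the next token id;
    # `tokenizer` is a generic encode/decode pair with a hypothetical
    # `eos_id` attribute. Neither refers to a specific library.
    ids = tokenizer.encode(prompt)
    start = len(ids)
    for _ in range(max_new):
        tok = next_token(ids)   # conditioned on the full context so far,
        ids.append(tok)         # including earlier reasoning tokens
        if tok == tokenizer.eos_id:
            break
    # The scratch work is ordinary text: it can be printed, scored,
    # filtered, or rejected before the final answer is acted on.
    return tokenizer.decode(ids[start:])
```

A fully latent variant would collapse the readable trace into hidden vectors, which is where the compression-versus-auditability trade the thread kept circling back to comes from.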

That is why the thread did not end with “vector good, words bad.” The more grounded conclusion was that current systems pay an efficiency tax for inspectability, and many practitioners still think that is a trade worth making. LocalLLaMA clearly sees latent reasoning as an interesting research path, not a solved engineering switch waiting to be flipped. The excitement in the thread came from recognizing that the question runs deeper than user interface: it touches training stability, supervision, evaluation, and whether future reasoning models should optimize first for compressed internal thought or for reasoning that can still be checked in public.
