LocalLLaMA Revisits RYS on Qwen3.5 and the Case for a Shared Reasoning Space
Original: RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language'
r/LocalLLaMA amplified David Noel Ng's LLM Neuroanatomy II because it combines model hacking with a specific empirical claim: repeating blocks in the middle of a transformer still seems to help on modern open models. The post revisits RYS, or Repeat Your Self, on Qwen3.5-27B and argues that relayering was not just a one-off trick on older Qwen2 checkpoints.
The blog says Ng explored the space with 3,024 beam search candidates, a surrogate model that scored 2 million configurations, and a unified validation sweep before releasing new RYS variants. That matters because the community has seen plenty of merge culture and Frankenstein models before. What stood out here was the attempt to turn layer duplication into a more systematic search problem rather than a lucky recipe.
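To make the search framing concrete, the space being explored can be pictured as choosing which contiguous span of decoder layers to repeat and how many times, then scoring each resulting layer order. A minimal sketch in Python (the helper names and the exact parameterization are illustrative assumptions, not Ng's actual code):

```python
def repeat_layers(layers, start, end, times):
    """Build a new forward order that repeats layers[start:end] `times` extra times.

    `layers` is just a list of layer identifiers here; in a real model each
    entry would be a transformer decoder block, and the repeats would alias
    the same weights rather than copy them.
    """
    block = layers[start:end]
    return layers[:end] + block * times + layers[end:]

def candidate_configs(n_layers, max_repeats=2):
    """Enumerate (start, end, times) triples over contiguous layer spans."""
    for start in range(n_layers):
        for end in range(start + 1, n_layers + 1):
            for times in range(1, max_repeats + 1):
                yield (start, end, times)

base = list(range(8))                     # stand-in for an 8-layer model
stretched = repeat_layers(base, 2, 5, 1)  # layers 2, 3, 4 run twice per forward pass
```

Even this toy enumeration grows quickly with depth, which is why the post pairs beam search with a cheap surrogate scorer instead of validating every configuration directly.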
A stronger claim than just bigger models
The post also makes a more interesting representational claim. Ng shows multilingual hidden-state comparisons where, in the middle of the network, cross-language pairs with the same content stay more similar than same-language pairs with different content. In the blog's framing, that suggests a format-agnostic reasoning space and hints at a universal language inside the model. That does not prove that LLMs literally think in one language, but it does offer a concrete measurement that readers can argue about instead of a vague metaphor.
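The comparison behind that claim can be approximated with cosine similarity over mid-layer hidden states. A toy sketch of the measurement (the vectors below are made-up stand-ins, not real Qwen activations):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Made-up mid-layer states: the same sentence in two languages,
# versus a different sentence in the same language.
en_cat  = [0.9, 0.1, 0.3]   # e.g. an English sentence
de_cat  = [0.8, 0.2, 0.3]   # the same content in German
en_other = [0.1, 0.9, 0.7]  # unrelated English content

cross_lang_same_content = cosine(en_cat, de_cat)
same_lang_diff_content  = cosine(en_cat, en_other)
# The post's finding is that, mid-network, the first similarity
# tends to exceed the second.
```

The interesting part is the direction of the inequality: if content dominates language in the middle layers, the geometry looks more like a shared semantic space than a per-language one.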
The Reddit summary tied those observations to a practical release: several RYS-Qwen3.5-27B-FP8 variants on Hugging Face, plus the claim that fine-tuning repeated-layer variants could push the size class further. It also noted an unresolved systems issue. Repeating layers currently increases memory footprint, and Ng says he is looking at formats where duplicated layers can stay as copies without extra VRAM apart from the KV cache.
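The memory question comes down to whether a repeated layer must be stored twice. In principle it does not: the forward order can reference the same weights more than once. A Python sketch of that aliasing (a conceptual illustration only; making it work in real checkpoint formats is exactly the open issue Ng describes):

```python
class Layer:
    """Stand-in for a decoder block with its own weight buffer."""
    def __init__(self, name):
        self.name = name
        self.weights = [0.0] * 4   # placeholder parameters

layers = [Layer(f"layer_{i}") for i in range(4)]

# Repeat layers 1 and 2 by reference: the forward order grows,
# but no weights are copied -- the repeated slots point at the
# same objects as the originals.
forward_order = layers[:3] + layers[1:3] + layers[3:]

unique_weight_buffers = {id(l.weights) for l in forward_order}
# 6 slots in the forward order, but only 4 distinct weight buffers.
```

The KV cache, by contrast, is per forward-pass position, so it still grows with every repeated layer regardless of how the weights are stored.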
Comments reflected both enthusiasm and caution. Readers praised the rigor of the search and the hidden-state analysis, asked for more languages and more model families, and drew comparisons to earlier layer-merge experiments from the Llama 2 era. That mix captures why the thread mattered. RYS II is not just another "model feels smarter" claim. It is an attempt to connect architecture edits, multilingual representation geometry, and practical open-weight releases in a way that the open-model community can reproduce, challenge, and extend.
Related Articles
A few weeks after release, r/LocalLLaMA is converging on task-specific sampler and reasoning-budget presets for Qwen3.5 rather than one default setup.
A popular r/LocalLLaMA post highlighted a community merge of uncensored and reasoning-distilled Qwen 3.5 9B checkpoints, underscoring the appetite for behavior-tuned small local models.
A high-signal r/LocalLLaMA benchmark post said moving Qwen 3.5 27B from mainline llama.cpp to ik_llama.cpp raised prompt evaluation from about 43 tok/sec to 1,122 tok/sec on a Blackwell RTX PRO 4000, with generation climbing from 7.5 tok/sec to 26 tok/sec.