LocalLLaMA Revisits RYS on Qwen3.5 and the Case for a Shared Reasoning Space

r/LocalLLaMA amplified David Noel Ng's LLM Neuroanatomy II because it combines model hacking with a specific empirical claim: repeating blocks in the middle of a transformer still seems to help on modern open models. The post revisits RYS, or Repeat Your Self, on Qwen3.5-27B and argues that relayering was not just a one-off trick on older Qwen2 checkpoints.
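To make the idea concrete, here is a minimal sketch of what repeating a middle block does to the order in which layers run. The function name `repeat_block` and the layer indices are illustrative assumptions, not the configurations Ng released. A real model would apply the resulting order to its actual decoder layers.

```python
def repeat_block(num_layers, start, end, repeats=1):
    """Return the layer execution order when layers [start, end) run
    `repeats` extra times right after their first pass.

    Hypothetical helper for illustration only.
    """
    if not (0 <= start < end <= num_layers):
        raise ValueError("need 0 <= start < end <= num_layers")
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    prefix = list(range(0, end))           # layers up to and including the block
    block = list(range(start, end))        # the block that gets repeated
    suffix = list(range(end, num_layers))  # the remaining layers
    return prefix + block * repeats + suffix


# Toy 8-layer model with layers 3 and 4 repeated once.
order = repeat_block(num_layers=8, start=3, end=5)
print(order)  # [0, 1, 2, 3, 4, 3, 4, 5, 6, 7]
```

The model gets deeper at inference time without any new weights being trained, which is why the result is usually described as relayering rather than fine-tuning.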

The blog says Ng explored the space of layer-repetition configurations with 3,024 beam search candidates, a surrogate model that scored 2 million configurations, and a unified validation sweep before releasing new RYS variants. That matters because the community has already seen plenty of merge culture and Frankenstein models. What stood out here was the attempt to turn layer duplication into a systematic search problem rather than a lucky recipe.
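The general shape of such a search can be sketched as a two-stage funnel: score many candidates with something cheap, then spend the expensive evaluation only on a shortlist. The sketch below is not Ng's pipeline and is not beam search. The scoring functions are toy placeholders, and every name in it is an assumption made for illustration.

```python
import heapq


def candidate_blocks(num_layers):
    """All contiguous blocks (start, end) with 0 <= start < end <= num_layers."""
    return [(s, e) for s in range(num_layers) for e in range(s + 1, num_layers + 1)]


def shortlist(candidates, surrogate_score, k):
    """Keep the k candidates the cheap surrogate likes best."""
    return heapq.nlargest(k, candidates, key=surrogate_score)


def validate(shortlisted, true_score):
    """Run the expensive evaluation on the shortlist only and pick the winner."""
    return max(shortlisted, key=true_score)


# Toy stand-ins for a learned surrogate and a real benchmark run.
# They encode no measurements; they only make the sketch executable.
def toy_surrogate(block):
    start, end = block
    return -abs((start + end) / 2 - 6) - 0.1 * (end - start)


def toy_true(block):
    start, end = block
    return -abs((start + end) / 2 - 6) - abs((end - start) - 2)


blocks = candidate_blocks(12)              # 78 candidate blocks for 12 layers
top = shortlist(blocks, toy_surrogate, k=5)
best = validate(top, toy_true)
print(len(blocks), best)
```

The design point is the cost asymmetry: the surrogate can rank a very large pool because each score is cheap, while the real evaluation only has to confirm or reject a handful of survivors.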

A stronger claim than just bigger models

The post also makes a more interesting representational claim. Ng shows multilingual hidden-state comparisons where, in the middle of the network, cross-language pairs with the same content stay more similar than same-language pairs with different content. In the blog's framing, that suggests a format-agnostic reasoning space and hints at a universal language inside the model. That does not prove that LLMs literally think in one language, but it does offer a concrete measurement that readers can argue about instead of a vague metaphor.
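The comparison described above can be sketched with cosine similarity over hidden-state vectors taken from a single layer. The vectors, language codes, and content labels below are made-up placeholders rather than data from the blog. A real analysis would extract one vector per prompt from a middle layer of the model.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(states, keep):
    """Average cosine similarity over all key pairs accepted by `keep`."""
    pairs = [(a, b) for a, b in combinations(sorted(states), 2) if keep(a, b)]
    return sum(cosine(states[a], states[b]) for a, b in pairs) / len(pairs)


# Hypothetical middle-layer states keyed by (language, content id).
states = {
    ("en", "A"): [1.0, 0.1, 0.0],
    ("zh", "A"): [0.9, 0.2, 0.1],
    ("en", "B"): [0.0, 0.1, 1.0],
    ("zh", "B"): [0.1, 0.2, 0.9],
}

# Same content, different language.
cross_language = mean_similarity(states, lambda a, b: a[0] != b[0] and a[1] == b[1])
# Same language, different content.
same_language = mean_similarity(states, lambda a, b: a[0] == b[0] and a[1] != b[1])

print(cross_language > same_language)  # True for these toy vectors
```

If the first average stays above the second at a given layer, content is dominating language in that layer's representation, which is the pattern the post reports for the middle of the network.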

The Reddit summary tied those observations to a practical release: several RYS-Qwen3.5-27B-FP8 variants on Hugging Face, plus the claim that fine-tuning repeated-layer variants could push what this size class can do further. It also noted an unresolved systems issue. Repeating layers currently increases the memory footprint, and Ng says he is looking at formats where duplicated layers can stay as copies without using extra VRAM apart from the KV cache.
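The memory point can be illustrated with plain object sharing: if a repeated layer is a second reference to the same weights rather than a second copy, stored parameters stay flat while executed depth grows. This is only a Python analogy under assumed names (`Layer`, `build_stack`), not a description of any existing checkpoint format or of the approach Ng is considering.

```python
class Layer:
    """Stand-in for a decoder layer; only tracks its parameter count."""

    def __init__(self, index, num_params):
        self.index = index
        self.num_params = num_params


def build_stack(layers, order):
    """Build the execution stack from references to shared layers, not copies."""
    return [layers[i] for i in order]


def stored_params(stack):
    """Parameters that must be held in memory: each unique layer counted once."""
    unique = {id(layer): layer for layer in stack}
    return sum(layer.num_params for layer in unique.values())


def executed_params(stack):
    """Parameters touched per forward pass: repeats count every time."""
    return sum(layer.num_params for layer in stack)


layers = [Layer(i, num_params=100) for i in range(8)]
stack = build_stack(layers, [0, 1, 2, 3, 4, 3, 4, 5, 6, 7])

print(stored_params(stack))    # 800
print(executed_params(stack))  # 1000
print(len(stack))              # 10 layer passes, each needing its own KV cache
```

The KV cache is the part that cannot be shared in this picture, since every pass through a layer produces its own keys and values, which matches the caveat in the summary.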

Comments reflected both enthusiasm and caution. Readers praised the rigor of the search and the hidden-state analysis, asked for more languages and more model families, and drew comparisons to earlier layer-merge experiments from the Llama 2 era. That mix captures why the thread mattered. RYS II is not just another "this model feels smarter" claim. It is an attempt to connect architecture edits, multilingual representation geometry, and practical open-weight releases in a way that the open-model community can reproduce, challenge, and extend.
