LocalLLaMA Revisits RYS on Qwen3.5 and the Case for a Shared Reasoning Space
Original: RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language'
r/LocalLLaMA amplified David Noel Ng's LLM Neuroanatomy II because it combines model hacking with a specific empirical claim: repeating blocks in the middle of a transformer still seems to help on modern open models. The post revisits RYS, or Repeat Your Self, on Qwen3.5-27B and argues that relayering was not just a one-off trick on older Qwen2 checkpoints.
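To make the idea of relayering concrete, here is a minimal sketch of how a repeated middle block changes the order in which layers run. The function name, the layer count, and the block boundaries are illustrative assumptions; the post summary does not state which layers Ng actually repeated.

```python
def repeat_layers(num_layers, start, end, repeats=1):
    """Return the execution order of layer indices when the block
    [start, end) is run `repeats` extra times right after its first pass.

    Hypothetical helper for illustration only; not taken from the RYS code.
    """
    if not (0 <= start < end <= num_layers):
        raise ValueError("block must satisfy 0 <= start < end <= num_layers")
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    block = list(range(start, end))
    # Layers before and including the block, then the repeated block,
    # then the remaining layers.
    return list(range(end)) + block * repeats + list(range(end, num_layers))


# Toy example: an 8-layer stack with layers 3 and 4 repeated once.
plan = repeat_layers(num_layers=8, start=3, end=5)
print(plan)
```

In this toy plan the stack grows from 8 to 10 executed layers while still using only 8 distinct sets of weights, which is the property the search over configurations exploits.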
The blog says Ng explored the space with 3,024 beam search candidates, a surrogate model that scored 2 million configurations, and a unified validation sweep before releasing new RYS variants. That matters because the community has seen plenty of merge culture and Frankenstein models before. What stood out here was the attempt to turn layer duplication into a more systematic search problem rather than a lucky recipe.
A stronger claim than just bigger models
The post also makes a more interesting representational claim. Ng shows multilingual hidden-state comparisons where, in the middle of the network, cross-language pairs with the same content stay more similar than same-language pairs with different content. In the blog's framing, that suggests a format-agnostic reasoning space and hints at a universal language inside the model. That does not prove that LLMs literally think in one language, but it does offer a concrete measurement that readers can argue about instead of a vague metaphor.
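The kind of comparison described above can be sketched with cosine similarity over hidden-state vectors. This is a toy illustration under stated assumptions: the vectors below are invented, the keys (language, content) are hypothetical labels, and nothing here reproduces Ng's data or exact metric.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_summary(states):
    """Given {(language, content): vector}, return a pair of means:
    (same content across languages, same language across contents)."""
    same_content, same_language = [], []
    for (key_a, vec_a), (key_b, vec_b) in combinations(states.items(), 2):
        lang_a, content_a = key_a
        lang_b, content_b = key_b
        if content_a == content_b and lang_a != lang_b:
            same_content.append(cosine(vec_a, vec_b))
        elif lang_a == lang_b and content_a != content_b:
            same_language.append(cosine(vec_a, vec_b))
    if not same_content or not same_language:
        raise ValueError("need at least one pair of each kind")
    return (sum(same_content) / len(same_content),
            sum(same_language) / len(same_language))


# Invented mid-layer "hidden states" for two prompts in two languages.
toy_states = {
    ("en", "weather"): [1.0, 0.1, 0.0],
    ("zh", "weather"): [0.9, 0.2, 0.1],
    ("en", "math"): [0.0, 0.1, 1.0],
    ("zh", "math"): [0.1, 0.2, 0.9],
}
cross_language, same_language = similarity_summary(toy_states)
print(cross_language > same_language)
```

If the first mean exceeds the second at a given layer, content rather than language dominates the geometry there, which is the pattern the post reports for the middle of the network.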
The Reddit summary tied those observations to a practical release: several RYS-Qwen3.5-27B-FP8 variants on Hugging Face, plus the claim that fine-tuning repeated-layer variants could push the size class further. It also noted an unresolved systems issue. Repeating layers currently increases memory footprint, and Ng says he is looking at formats where duplicated layers can stay as copies without extra VRAM apart from the KV cache.
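The memory trade-off can be illustrated with simple arithmetic. The sketch below assumes a hypothetical per-layer weight size and per-layer KV cache size; the numbers are placeholders, and the function is not part of any released RYS tooling.

```python
def memory_estimate(plan, layer_weight_gb, kv_per_layer_gb, share_weights):
    """Rough memory estimate for an execution plan of layer indices.

    With share_weights=True, a repeated layer index is stored once;
    otherwise every executed layer holds its own copy of the weights.
    The KV cache grows with every executed layer in both cases.
    """
    stored_layers = len(set(plan)) if share_weights else len(plan)
    return {
        "weights_gb": stored_layers * layer_weight_gb,
        "kv_cache_gb": len(plan) * kv_per_layer_gb,
    }


# Toy plan: 4 distinct layers, with layers 1 and 2 executed twice.
toy_plan = [0, 1, 2, 1, 2, 3]
copied = memory_estimate(toy_plan, 1.0, 0.5, share_weights=False)
shared = memory_estimate(toy_plan, 1.0, 0.5, share_weights=True)
print(copied, shared)
```

Under these assumptions, sharing removes the weight overhead of the duplicated layers but leaves the KV cache cost unchanged, matching the caveat in the post.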
Comments reflected both enthusiasm and caution. Readers praised the rigor of the search and the hidden-state analysis, asked for more languages and more model families, and drew comparisons to earlier layer-merge experiments from the Llama 2 era. That mix captures why the thread mattered. RYS II is not just another "model feels smarter" claim. It is an attempt to connect architecture edits, multilingual representation geometry, and practical open-weight releases in a way that the open-model community can reproduce, challenge, and extend.