LocalLLaMA dissects RYS II and repeated-layer gains in Qwen3.5-27B
Original: RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language'
A March 23, 2026 post in r/LocalLLaMA, with 376 upvotes and 61 comments, turned David Noel Ng’s new RYS II write-up into one of the community’s busiest architecture threads of the day. The post revisits the idea that repeating carefully chosen middle transformer layers can improve capability without changing model weights, this time on Qwen3.5-27B.
The blog has two hooks. The first is scientific: hidden-state comparisons across English and Chinese inputs suggest that the middle layers align around content more than surface language, supporting a “universal language” or format-agnostic reasoning space. The second is practical: after a full scan, 3,024 beam-search candidates, and a surrogate model that ranked 2 million configurations, the clean winners were still contiguous mid-stack repeats. On the final shared validation sets, repeating layer 33 alone gave most of the EQ gain at only 1.5625% overhead, while larger blocks such as 31-33, 30-34, and 26-33 pushed performance further with diminishing returns.
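The repetition trick can be sketched in a few lines. The sketch below is illustrative, not the RYS II code: it treats the model as an ordered list of decoder-layer indices and re-runs a contiguous block a second time without adding any new weights. The function names and the 64-layer stack are assumptions (a 1/64 extra forward pass is exactly the 1.5625% overhead quoted in the post).

```python
# Hypothetical sketch of repeated layers: the forward pass is a chain of
# decoder layers, and a chosen contiguous mid-stack block is executed twice.
# No weights change; only the execution plan does. Names and the 64-layer
# count are assumptions for illustration.

def expand_layer_plan(n_layers: int, repeat_start: int, repeat_end: int) -> list[int]:
    """Execution order of layer indices when the contiguous block
    [repeat_start, repeat_end] is run a second time, right after its first pass."""
    plan = list(range(n_layers))
    block = list(range(repeat_start, repeat_end + 1))
    insert_at = repeat_end + 1
    return plan[:insert_at] + block + plan[insert_at:]

def compute_overhead(n_layers: int, plan: list[int]) -> float:
    """Extra forward-pass compute relative to the base model, as a fraction."""
    return (len(plan) - n_layers) / n_layers

# Repeating layer 33 alone in an assumed 64-layer stack:
plan = expand_layer_plan(64, 33, 33)
print(compute_overhead(64, plan))  # 1/64 = 0.015625, i.e. the 1.5625% in the post
```

Larger contiguous repeats such as 31-33 fall out of the same function, e.g. `expand_layer_plan(64, 31, 33)` adds three passes for a 4.6875% overhead.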
- Ng published four FP8 model variants on HuggingFace: S (+1 layer), M (+3), L (+5), and XL (+8).
- The write-up says the Pareto frontier stayed with contiguous blocks even after testing sparse repeats, multi-block beam search, and surrogate-ranked candidates.
- A future ExLlama v3 format could store duplicated layers as pointers to shared weights, so the remaining overhead would be mainly compute and KV cache rather than extra weight VRAM.
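The pointer idea from the last bullet amounts to reference sharing: both occurrences of a repeated layer point at the same underlying weight buffers, so weight memory stays at the base model's size. This is a minimal sketch under assumed names, not ExLlama's actual API or format.

```python
# Illustrative sketch of pointer-style deduplication for repeated layers:
# materialize one weight object per unique layer, then build the execution
# stack as references into that shared pool. Classes and sizes are invented
# stand-ins, not ExLlama v3 internals.

class LayerWeights:
    def __init__(self, n_params: int):
        self.data = bytearray(n_params)  # stand-in for a real weight tensor

def build_stack(n_layers: int, plan: list[int]) -> list[LayerWeights]:
    """One weight object per unique layer index; the stack is just pointers."""
    pool = {i: LayerWeights(n_params=1024) for i in range(n_layers)}
    return [pool[i] for i in plan]

# Layer 33 repeated in place in an assumed 64-layer stack:
plan = list(range(34)) + [33] + list(range(34, 64))
stack = build_stack(64, plan)
# Both passes over layer 33 share one weight object: no extra weight memory.
print(stack[33] is stack[34])  # True
```

The stack has 65 entries but only 64 distinct weight objects, which is why VRAM growth would come mostly from the longer forward pass and its KV cache.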
LocalLLaMA cared because the work speaks directly to open-weight users. It suggests a path to measurable gains that does not start with expensive full fine-tuning and does not depend on closed APIs. At the same time, the post is careful not to oversell: stacking more repeated layers helps, but gains are sublinear, and the efficient frontier matters more than the biggest raw score.
Primary source: RYS II blog post. Community discussion: LocalLLaMA.
Related Articles
A few weeks after release, r/LocalLLaMA is converging on task-specific sampler and reasoning-budget presets for Qwen3.5 rather than one default setup.
A popular r/LocalLLaMA post highlighted a community merge of uncensored and reasoning-distilled Qwen 3.5 9B checkpoints, underscoring the appetite for behavior-tuned small local models.
A new r/LocalLLaMA thread argues that NVIDIA's Nemotron-Cascade-2-30B-A3B deserves more attention after quick local coding evals came in stronger than expected. The post is interesting because it lines up community measurements with NVIDIA's own push for a reasoning-oriented open MoE model that keeps activated parameters low.