LocalLLaMA dissects RYS II and repeated-layer gains in Qwen3.5-27B
Original: RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language'
A March 23, 2026 post in r/LocalLLaMA, with 376 upvotes and 61 comments, turned David Noel Ng’s new RYS II write-up into one of the community’s busiest architecture threads of the day. The post revisits the idea that repeating carefully chosen middle transformer layers can improve capability without changing model weights, this time on Qwen3.5-27B.
The blog has two hooks. The first is scientific: hidden-state comparisons across English and Chinese inputs suggest that the middle layers align around content more than surface language, supporting a “universal language” or format-agnostic reasoning space. The second is practical: after a full scan, 3,024 beam-search candidates, and a surrogate model that ranked 2 million configurations, the clean winners were still contiguous mid-stack repeats. On the final shared validation sets, repeating layer 33 alone gave most of the EQ gain at only 1.5625% overhead, while larger blocks such as 31-33, 30-34, and 26-33 pushed performance further with diminishing returns.
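The cross-lingual comparison described above can be pictured as measuring how similar hidden states are for the same content expressed in two languages. The sketch below is a toy illustration only: the vectors are made-up stand-ins, and the post's actual extraction and comparison method is not shown; it simply demonstrates the cosine-similarity measurement that such a "universal language" claim typically rests on (mid-layer states aligning across languages more than early-layer states).

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy hidden states for an English/Chinese sentence pair (illustrative values,
# not real activations): the early-layer pair diverges, the mid-layer pair aligns.
early_en, early_zh = [1.0, 0.1, 0.0], [0.0, 0.2, 1.0]
mid_en, mid_zh = [0.9, 0.5, 0.1], [0.8, 0.6, 0.2]

early_sim = cosine(early_en, early_zh)
mid_sim = cosine(mid_en, mid_zh)
print(f"early: {early_sim:.3f}, mid: {mid_sim:.3f}")
```

Under this kind of probe, a markedly higher mid-layer similarity across languages is what supports the format-agnostic reasoning-space reading.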
- Ng published four FP8 model variants on HuggingFace: S (+1 layer), M (+3), L (+5), and XL (+8).
- The write-up says the Pareto frontier stayed with contiguous blocks even after testing sparse repeats, multi-block beam search, and surrogate-ranked candidates.
- A future ExLlama v3 format could store duplicated layers as pointers to shared weights, so the remaining overhead would be mainly extra compute and KV cache, not parameter VRAM.
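The mechanics behind these variants can be sketched in a few lines. This is a minimal illustration, not the author's implementation: layer objects are stand-ins for transformer blocks, and the 64-layer count is an assumption implied by the 1.5625% (= 1/64) overhead figure for the single-layer "S" repeat. Because the repeated entries reference the same objects, the weights are shared, which is also the idea behind the pointer-based format mentioned in the last bullet.

```python
def repeat_block(layers, start, end):
    """Return a new forward order that repeats layers[start:end+1] in place,
    reusing the same layer objects (weight sharing, no new parameters)."""
    block = layers[start:end + 1]
    return layers[:end + 1] + block + layers[end + 1:]

NUM_LAYERS = 64  # assumed depth, consistent with 1/64 = 1.5625% overhead
base = [object() for _ in range(NUM_LAYERS)]  # stand-ins for transformer blocks

# "S" variant: repeat layer 33 alone (+1 layer in the forward pass)
s_variant = repeat_block(base, 33, 33)
assert s_variant[33] is s_variant[34]  # same weights, run twice

# Compute overhead: one extra forward pass through an existing layer
overhead = (len(s_variant) - NUM_LAYERS) / NUM_LAYERS
print(f"{overhead:.4%}")  # → 1.5625%

# Parameter memory is unchanged: the set of unique layer objects is the same
assert len({id(layer) for layer in s_variant}) == NUM_LAYERS
```

The same helper covers the larger contiguous blocks from the post, e.g. `repeat_block(base, 31, 33)` for the 31-33 repeat; the KV cache still grows with the longer forward order, which is why the pointer format only removes the weight-memory cost.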
LocalLLaMA cared because the work speaks directly to open-weight users. It suggests a path to measurable gains that does not start with expensive full fine-tuning and does not depend on closed APIs. At the same time, the post is careful not to oversell: composition helps, but gains are sublinear, and the efficient frontier matters more than the biggest raw score.
Primary source: RYS II blog post. Community discussion: LocalLLaMA.