LLM Hacker News 2h ago 1 min read
The Orthrus framework produces up to 7.8× as many tokens per forward pass on Qwen3 models while provably preserving the original model's output distribution. Its dual-view architecture shares a single KV cache between the autoregressive and diffusion pathways.
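
The summary does not say how Orthrus keeps the output distribution identical. One well-known way for a multi-token drafter to stay exactly faithful to a base model is the accept/reject rule from speculative sampling, and the sketch below assumes that rule purely for illustration. It is not taken from Orthrus, and every name in it (`p` for the base model's next-token probabilities, `q` for the drafter's, `verify_token`, `output_distribution`) is hypothetical.

```python
import random


def residual(p, q):
    """Normalized max(0, p - q): the distribution sampled after a rejection."""
    r = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(r)
    # total == 0 only when p == q, in which case a draft is never rejected.
    return [ri / total for ri in r] if total > 0 else list(p)


def verify_token(p, q, drafted, rng=random):
    """Keep `drafted` (sampled from q) with probability min(1, p/q), else resample."""
    if rng.random() < min(1.0, p[drafted] / q[drafted]):
        return drafted
    return rng.choices(range(len(p)), weights=residual(p, q))[0]


def output_distribution(p, q):
    """Exact distribution of the token emitted by draft-then-verify."""
    # Probability of drafting x and accepting it: q(x) * min(1, p(x)/q(x)).
    accept = [min(pi, qi) for pi, qi in zip(p, q)]
    reject_mass = 1.0 - sum(accept)
    res = residual(p, q)
    return [a + reject_mass * r for a, r in zip(accept, res)]


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical base-model probabilities
    q = [0.2, 0.2, 0.6]  # hypothetical drafter probabilities
    print(output_distribution(p, q))
```

Under this rule the emitted token follows `p` no matter how poor the drafter `q` is. A better drafter only raises the acceptance rate, which is what would increase the number of tokens kept per forward pass.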