#orthrus - Insights

LLM | Hacker News | May 16, 2026 | 1 min read

Orthrus-Qwen3 Delivers 7.8× Faster Inference With Identical Output

The Orthrus framework generates up to 7.8× as many tokens per forward pass on Qwen3 models while keeping an output distribution that is provably identical to the original model's. Its dual-view architecture shares a single KV cache between an autoregressive pathway and a diffusion pathway.
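The summary does not say how Orthrus verifies drafted tokens, so the following is only a generic illustration of how a draft-then-verify scheme can leave the output distribution unchanged. It sketches the standard speculative sampling acceptance rule for a single token. The names `p` (target distribution), `q` (draft distribution), and every function below are hypothetical and are not taken from Orthrus.

```python
import random


def accept_prob(p, q, x):
    """Probability of keeping drafted token x: min(1, p[x] / q[x])."""
    # q[x] == 0 means x is never drafted, so the value here is irrelevant.
    return 1.0 if q[x] == 0 else min(1.0, p[x] / q[x])


def residual_distribution(p, q):
    """Normalized max(0, p - q), used to resample after a rejection."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        # p == q: a rejection can never happen, so any fallback works.
        return list(p)
    return [d / total for d in diff]


def speculative_sample(p, q, rng):
    """Draw one token: draft from q, then accept it or resample from the residual."""
    x = rng.choices(range(len(q)), weights=q)[0]
    if rng.random() < accept_prob(p, q, x):
        return x
    r = residual_distribution(p, q)
    return rng.choices(range(len(r)), weights=r)[0]


def exact_output_distribution(p, q):
    """Closed-form distribution of speculative_sample, with no sampling involved."""
    accepted = [q[x] * accept_prob(p, q, x) for x in range(len(p))]
    reject_mass = 1.0 - sum(accepted)
    r = residual_distribution(p, q)
    return [a + reject_mass * ri for a, ri in zip(accepted, r)]


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical target-model distribution
    q = [0.2, 0.2, 0.6]  # hypothetical draft distribution
    print(exact_output_distribution(p, q))
    print(speculative_sample(p, q, random.Random(0)))
```

Each token contributes `min(p[x], q[x])` through acceptance and `max(0, p[x] - q[x])` through resampling, which sum to `p[x]`. That is why the draft distribution affects only how often a draft is accepted, not what the output looks like. Multi-token schemes typically apply the same rule position by position and stop at the first rejection; whether Orthrus does so is not stated in the summary.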

#inference #qwen3 #speculative-decoding