#qwen3

LLM Hacker News May 16, 2026 1 min read

Orthrus-Qwen3 Delivers 7.8× Faster Inference With Identical Output

The Orthrus framework achieves up to 7.8× tokens per forward pass on Qwen3 models while maintaining a provably identical output distribution to the original. Its dual-view architecture shares a single KV cache between autoregressive and diffusion pathways.

#inference #qwen3 #speculative-decoding
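
The summary above does not say how Orthrus keeps its output distribution identical to the original model, so the following is only a minimal sketch of the standard speculative-sampling accept/reject rule, which is one well-known way to get that guarantee. It is an assumption for illustration, not the Orthrus method. The names `accept_or_resample` and `output_distribution`, and the toy distributions `p` (target model) and `q` (draft), are hypothetical.

```python
import random

def accept_or_resample(token, p, q, rng=random):
    """One step of standard speculative sampling (not Orthrus-specific).

    token: index proposed by the draft, assumed to be sampled from q
    p:     target-model probabilities over the vocabulary
    q:     draft probabilities over the same vocabulary
    """
    # Accept the proposed token with probability min(1, p/q).
    if rng.random() < min(1.0, p[token] / q[token]):
        return token
    # On rejection, resample from the normalized residual max(p - q, 0).
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0]

def output_distribution(p, q):
    """Exact distribution of the token emitted by the rule above."""
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]   # q[x] * min(1, p[x]/q[x])
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    reject_mass = 1.0 - sum(accepted)
    total = sum(residual)
    if total == 0:                                     # p == q: nothing is rejected
        return accepted
    return [a + reject_mass * r / total for a, r in zip(accepted, residual)]
```

Calling `output_distribution(p, q)` on any pair of valid distributions returns `p` up to floating-point error, which is the sense in which this rule leaves the target model's output distribution unchanged while letting a cheaper draft propose tokens.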

LLM Reddit May 4, 2026 1 min read

Llama.cpp Multi-Token Prediction Support Enters Beta, Closing the vLLM Performance Gap

llama.cpp's Multi-Token Prediction (MTP) support has entered beta, currently covering Qwen3.5 MTP. Combined with maturing tensor-parallel support, most token generation speed gaps between llama.cpp and vLLM are expected to close.

#llama-cpp #mtp #local-llm

LLM Apr 16, 2026 2 min read

Lightning OPD cuts reasoning-model post-training to 30 GPU hours

Lightning OPD attacks a practical bottleneck in on-policy distillation: keeping a live teacher model running throughout training. The paper reports 69.9% on AIME 2024 from Qwen3-8B-Base in 30 GPU hours, a 4.0x speedup over standard OPD.

#llm #distillation #post-training

LLM Reddit Apr 13, 2026 2 min read

r/LocalLLaMA tracks the llama.cpp merge that brings in Qwen3 audio support

A 54-point Reddit post flagged merged PR #19441 as the moment qwen3-omni-moe and qwen3-asr support reached llama.cpp, with commenters focused on local multimodal and ASR use cases.

#qwen3 #llama-cpp #audio

LLM Reddit Mar 15, 2026 2 min read

r/LocalLLaMA: StepFun Releases the SFT Dataset Behind Step 3.5 Flash

StepFun opened more than a model card by releasing the Step-3.5-Flash-SFT dataset on Hugging Face. The repo bundles raw JSON data, tokenizer snapshots, and StepTronOSS-oriented compiled shards, while the Reddit discussion focused on reproducibility, reasoning traces, and the implications of the dual-license setup.

#stepfun #sft #datasets

LLM Reddit Feb 23, 2026 1 min read

Qwen3's Hidden Gem: Voice Embeddings Enable Mathematical Voice Manipulation

Qwen3's TTS model encodes voices into 1024-dimensional vectors, enabling gender swapping, pitch adjustment, voice mixing, and semantic voice search through vector math. The encoder is now available as a standalone lightweight model on Hugging Face.

#qwen3 #tts #voice-embeddings
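
The vector math mentioned in the voice-embedding summary above can be sketched in a few lines. This is a minimal illustration on plain Python lists, assuming each voice is already available as a fixed-length embedding (1024 dimensions per the summary; the demo uses 2 for readability). The function names `mix`, `shift`, `cosine`, and `nearest` are hypothetical and are not part of any Qwen3 API, and the encoder that produces the vectors is not shown.

```python
import math

def mix(a, b, weight=0.5):
    """Blend two voice embeddings; weight=0 returns a, weight=1 returns b."""
    return [(1 - weight) * x + weight * y for x, y in zip(a, b)]

def shift(voice, direction, amount=1.0):
    """Move a voice along a direction vector, e.g. an assumed
    'pitch' or 'gender' axis estimated from labeled examples."""
    return [v + amount * d for v, d in zip(voice, direction)]

def cosine(a, b):
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, library):
    """Return the name of the stored voice most similar to the query."""
    return max(library, key=lambda name: cosine(query, library[name]))

# Toy 2-dimensional stand-ins for real embeddings.
voices = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
blended = mix(voices["alice"], voices["bob"], weight=0.25)
closest = nearest(blended, voices)
```

In this toy setup the blend sits closer to `alice` because it keeps 75% of her vector. Whether a real attribute such as gender or pitch corresponds to a single linear direction in the embedding space is an assumption that would need checking against the actual encoder.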