LLM sources.twitter 3h ago 1 min read
Why it matters: search products need factuality and citations, not just fluent answers. Perplexity said its SFT + RL pipeline lets Qwen models match or beat GPT models on factuality at lower cost.
Why it matters: search products need factuality and citations, not just fluent answers. Perplexity said its SFT + RL pipeline lets Qwen models match or beat GPT models on factuality at lower cost.
Lightning OPD attacks a practical bottleneck in on-policy distillation: keeping a live teacher model running throughout training. The paper reports 69.9% on AIME 2024 from Qwen3-8B-Base in 30 GPU hours, a 4.0x speedup over standard OPD.