Perplexity claims Qwen with SFT + RL overtakes GPT on the factuality cost curve

What the tweet revealed

Perplexity framed its latest model work in terms of search quality rather than chat style. The key quote: "Our SFT + RL pipeline improves search, citation quality, instruction following, and efficiency. With Qwen models, we match or beat GPT models on factuality at a lower cost."

The Perplexity account is the company's official channel and mostly posts AI search product releases, app updates, and research notes. This tweet is material because it names the training recipe, the evaluation target, and the comparison baseline together. The claim is that Qwen models trained with supervised fine-tuning (SFT) and reinforcement learning (RL) compete with GPT models on factuality and cost.

Why it matters

Failures of search-augmented assistants do not show up well on general chat benchmarks. An answer can read smoothly while resting on weak sources, missing recent documents, or spending an expensive model on a cheap query. Perplexity's claim targets four production variables at once: search behavior, citation quality, instruction following, and efficiency.

According to FxTwitter metadata, the tweet carries no public paper, repo, or blog URL; only a media attachment is confirmed. The results should therefore be treated as a benchmark reported by Perplexity, and the method is hard to pin down before independent verification. The signal is still clear: Qwen-family open models are being positioned not just as a cheap inference backend but as trainable search models that can compete with closed GPT-class systems at the factuality layer.

For builders, the next question is method. It matters which factuality dataset was used, whether citations were judged by human review or by automatic checks, and whether the gains came from retrieval policy or from fine-tuning the answer model. Cost also needs a stated basis: per query, per token, per successful answer, or under a latency target. The next things to watch are a technical write-up from Perplexity, a model card, or actual changes in traffic routing.
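Why the cost basis matters can be seen with a small arithmetic sketch. The Python below normalizes one spend figure three ways. The `RunStats` fields, the `cost_views` helper, and every number are hypothetical and chosen only for illustration; none of them come from Perplexity's tweet or benchmark.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregate numbers for one model over one evaluation run (all hypothetical)."""
    total_cost_usd: float   # total inference spend for the run
    queries: int            # number of queries served
    tokens: int             # total tokens processed (input + output)
    correct_answers: int    # answers judged factually correct


def cost_views(s: RunStats) -> dict[str, float]:
    """Return the same spend normalized three different ways."""
    return {
        "per_query": s.total_cost_usd / s.queries,
        "per_1k_tokens": s.total_cost_usd / s.tokens * 1000,
        "per_correct_answer": s.total_cost_usd / s.correct_answers,
    }


# Illustrative numbers only; they are not Perplexity's results.
model_a = RunStats(total_cost_usd=10.0, queries=1000, tokens=2_000_000, correct_answers=800)
model_b = RunStats(total_cost_usd=8.0, queries=1000, tokens=1_000_000, correct_answers=500)

a, b = cost_views(model_a), cost_views(model_b)
for name in a:
    cheaper = "A" if a[name] < b[name] else "B"
    print(f"{name}: A={a[name]:.4f} B={b[name]:.4f} cheaper={cheaper}")
```

With these made-up inputs, model B is cheaper per query while model A is cheaper per 1k tokens and per correct answer. A "lower cost" claim is therefore hard to interpret until the denominator is stated.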

Source: X source tweet

