Ornith-1.0 tests the practical bar for open models built for agentic coding

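Ornith-1.0 was released as a family of open models for agentic coding. The README lists 9B dense, 35B MoE, and 397B MoE checkpoints and says they were post-trained on top of the Gemma 4 and Qwen 3.5 families. It also puts the MIT license, use without regional restrictions, and setup instructions for vLLM, SGLang, Transformers, and llama.cpp up front.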

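What draws the eye is the coding agent benchmarks. For Terminal-Bench 2.1, SWE-bench Verified, SWE-bench Pro, SWE-bench Multilingual, NL2Repo, ClawEval, and others, the README presents comparison tables that state the harness conditions. It is no surprise that the release drew attention on HN, but the discussion there centered less on the numbers in the tables than on how the models feel when dropped into real development work.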

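Reactions to the 35B version stood out in the comments. Users who tried quantized and FP8 builds locally reported that it is faster than the Qwen 3.6 35B family, with shorter thinking traces and less tendency to fall into long loops. Commenters also raised points that still need checking: who is actually behind DeepReinforce, how the model should be positioned as a Qwen derivative, and what "self-improving" means outside the training framework.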

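This is where the current evaluation criteria for open coding models show up. A single SWE-bench row is not enough. Open weights, serving instructions, long context, the tool-call parser, the reasoning parser, local inference speed, and even a tendency to curb excessive reasoning loops are all assessed together. Ornith-1.0 is worth reading about because it packs all of those conditions into a single release.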
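
One rough way to put a number on the "reasoning loop" axis is to measure how much of a thinking trace consists of repeated word n-grams. The sketch below is an illustration only: the function name `repeated_ngram_ratio`, the whitespace tokenization, and the default n-gram size are assumptions made here, not anything taken from the Ornith-1.0 README or the HN thread.

```python
from collections import Counter


def repeated_ngram_ratio(trace: str, n: int = 4) -> float:
    """Return the fraction of word n-grams in `trace` that are repeats.

    0.0 means every n-gram is unique; values closer to 1.0 suggest the
    trace keeps restating the same phrases. Splitting on whitespace is a
    simplifying assumption, not a model tokenizer.
    """
    words = trace.split()
    if len(words) < n:
        return 0.0  # too short to contain even one n-gram
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence beyond the first of an n-gram counts as a repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)
```

The ratio is a crude proxy. It is more useful for comparing traces from different models on the same prompts than as an absolute score.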

Source: Ornith-1.0 README, HN discussion.
