Ornith-1.0 tests the open-model bar for agentic coding

Ornith-1.0 arrived as a set of open models aimed directly at agentic coding. The project README lists a 9B dense model plus 35B and 397B mixture-of-experts checkpoints, post-trained on Gemma 4 and Qwen 3.5 bases. It also emphasizes an MIT license, global availability, and deployment recipes for vLLM, SGLang, Transformers, llama.cpp, and Ollama-style local use.

The headline numbers are coding-agent benchmarks. The README compares the models across Terminal-Bench 2.1, SWE-bench Verified, SWE-bench Pro, SWE-bench Multilingual, NL2Repo, and ClawEval under stated harness settings. That gave Hacker News (HN) commenters enough material to debate the release, but the more useful discussion was about practical behavior: whether smaller open coding models now feel useful inside real development loops.

Several commenters focused on the 35B variant. Early users reported running quantized or FP8 versions locally, with one comparing it favorably to Qwen 3.6 35B-style models because it produced shorter reasoning traces and avoided some long loops. Other comments were more skeptical, asking who DeepReinforce is, whether the model is essentially a Qwen derivative, and what “self-improving” means outside the training framework.

That mix is the real signal. Open coding models are no longer judged only by a SWE-bench row. Developers want released weights, usable serving instructions, long context, tool-call parsing, reasoning separation, and enough speed to sit inside an agent loop without turning every task into a long wait. Ornith-1.0 is interesting because it packages those claims in one release, while still leaving provenance and replication questions for the community to test.

Source: Ornith-1.0 README, HN discussion.
