Hacker News Highlights Leanstral, Mistral's Open Lean 4 Agent for Verified Coding
Original: Leanstral: Open-source agent for trustworthy coding and formal proof engineering
A coding agent built for Lean 4, not just code completion
On March 16, 2026, Mistral's Leanstral announcement reached 277 points and 49 comments on Hacker News at crawl time. What made the post stand out is that Leanstral is not positioned as a general coding chatbot: Mistral describes it as an open agent for Lean 4 proof engineering, aimed at the part of software and mathematics work where implementation alone is not enough and formal verification becomes the real bottleneck. The company's framing is that human review, not raw code generation, is now the slowest step in high-stakes engineering.
According to the launch post, Leanstral uses a sparse architecture with 6B active parameters and is released under an Apache 2.0 license. Mistral is not only shipping weights: it is exposing the model in Mistral Vibe, offering a free API endpoint called labs-leanstral-2603, and explicitly supporting MCP workflows. The post also says the model was trained to work especially well with lean-lsp-mcp, which matters because the story here is less about one-shot code generation and more about operating inside a verifiable tool loop.
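The post names the free endpoint but does not show a request example. As a minimal sketch, assuming the labs-leanstral-2603 endpoint follows Mistral's standard chat-completions request shape (model plus messages), a proof-completion request could be assembled like this; the system prompt, temperature, and theorem are illustrative placeholders, and the payload is only constructed here, not sent:

```python
import json

def build_leanstral_request(goal: str) -> dict:
    """Build a chat-completions payload asking Leanstral to close a Lean 4 goal.

    Sketch only: assumes the free endpoint is addressed by the model name
    from the launch post and accepts the standard model/messages shape.
    """
    return {
        "model": "labs-leanstral-2603",  # endpoint name from the launch post
        "messages": [
            {"role": "system",
             "content": "You are a Lean 4 proof engineer. Return only Lean code."},
            {"role": "user",
             "content": f"Complete this proof:\n{goal}"},
        ],
        "temperature": 0.2,  # hypothetical setting; kept low for proof search
    }

# Example goal with a `sorry` hole, the usual shape of a proof-completion task
payload = build_leanstral_request(
    "theorem add_comm' (a b : Nat) : a + b = b + a := by\n  sorry"
)
print(json.dumps(payload, indent=2))
```

In the lean-lsp-mcp workflow the post emphasizes, a response to such a request would then be checked by the Lean toolchain rather than accepted on faith, which is the "verifiable tool loop" the announcement is describing.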
The evaluation section is why the HN thread has technical substance. Mistral introduces FLTEval, a benchmark built from completing real formal-repository pull requests rather than isolated olympiad-style problems. In the published table, Leanstral scores 26.3 at pass@2, ahead of Claude Sonnet 4.6 at 23.7, while the listed run cost is $36 versus Sonnet's $549, roughly a fifteenfold difference. At pass@16, Leanstral reaches 31.9 while still staying far below Opus-level cost. Mistral also compares the model against larger open-weight systems such as GLM5, Kimi, and Qwen, arguing that the smaller active-parameter footprint still scales efficiently on this kind of proof workload.
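The pass@2 and pass@16 figures follow the usual pass@k convention: the chance that at least one of k sampled attempts solves a task. The post does not publish its estimator, but the widely used unbiased form can be sketched as follows (the example counts are made up for illustration):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n total attempts per task,
    c of which are correct, the probability that a random draw of
    k attempts contains at least one correct solution."""
    if n - c < k:
        # Fewer incorrect attempts than draws: every draw of k must
        # include at least one correct attempt.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: 16 attempts on a task, 4 of them correct.
print(pass_at_k(16, 4, 2))   # probability at least one of 2 samples passes
print(pass_at_k(16, 4, 16))  # with k = n, this is 1.0 whenever c > 0
```

Averaging this quantity over all benchmark tasks gives the reported pass@k score, which is why pass@16 is always at least as high as pass@2 for the same model.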
That is why the community reaction matters. Discussion around coding agents is shifting away from whether a model can emit code and toward whether it can produce work that is inspectable, reproducible, and checkable. Leanstral is one of the clearer attempts to push the open-model ecosystem toward verified implementations instead of raw vibe-coding throughput. Even if the benchmark claims will need broader validation, the release gives the community a concrete artifact to test in real Lean repositories.
Primary source: Mistral Leanstral announcement. Community discussion: Hacker News.
Related Articles
METR's March 10, 2026 note argues that about half of test-passing SWE-bench Verified PRs from recent agents would still be rejected by maintainers. HN treated it as a warning that benchmark wins do not yet measure scope control, code quality, or repo fit.
OmniCoder-9B packages agent-style coding behavior into a smaller open model by training on more than 425,000 curated trajectories from real tool-using workflows.
A LocalLLaMA release post presents OmniCoder-9B as a Qwen3.5-9B-based coding agent fine-tuned on 425,000-plus agentic trajectories, with commenters focusing on its read-before-write behavior and usefulness at small model size.