A popular `r/LocalLLaMA` post highlighted YC-Bench, an evaluation in which models run a simulated startup for a year under delayed feedback and adversarial clients. The benchmark's standout result is that only three of the twelve tested models consistently finished above their starting capital, with GLM-5 coming close to Claude Opus 4.6 at far lower cost.