Sciences X/Twitter 4h ago 1 min read
Biology agents are now being evaluated on research judgment, not just factual recall. GeneBench-Pro puts 129 computational-biology problems in front of agents; according to indexed coverage, GPT-5.6 Sol scores 28.7% at its highest reasoning level and 31.5% in Pro mode.