LLM · X/Twitter

DeepSWE’s 113 tasks put GPT-5.5 at 70% and Claude Opus 4.7 at 54%

DeepSWE reframes coding-agent evaluation with 113 original tasks across 91 repositories. Its first board gives GPT-5.5 a 70.0% pass@1 score, versus 54.2% for Claude Opus 4.7.

#deepswe #coding-agents #benchmark