Claude Opus 4.7 Beats NMR Software on Parts of Chemistry Benchmark
Why the chemistry benchmark matters
NMR spectroscopy is one of synthetic chemistry's most repetitive bottlenecks. Researchers use spectra to verify molecular structures, then manually match peaks to atoms before they can trust that a compound is what they think it is. Anthropic used its official X account to point readers to a new science blog post testing Claude Opus 4.7 on that workflow.
"Opus 4.7 matches—and on some tasks beats—dedicated NMR software."
The tweet was posted on June 5, 2026 at 19:27 UTC and had more than 362,000 views and 3,300 likes when checked through FxTwitter. Anthropic's main account usually carries Claude product updates, safety research, interpretability work, and technical evaluations, so this post sits closer to a benchmark disclosure than a general marketing note.
The linked research page says Anthropic tested three Claude models (Opus 4.7, Opus 4.6, and Sonnet 4.6) against ChemDraw and MestReNova on 20 compounds selected from synthetic chemistry preprints published after the models' training cutoff. That design is meant to reduce the chance that the models had already seen the molecules. For hydrogen NMR, Opus 4.7 reached an average error of about ±0.079 ppm, less than half the tolerance window cited by Anthropic. For carbon NMR, Opus 4.7 and MestReNova were effectively tied, at ±1.37 ppm and ±1.48 ppm respectively.
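To make the "average error" figures concrete, here is a minimal sketch of how such a number could be computed. It assumes the metric is a mean absolute error between predicted and measured chemical shifts, which the article does not confirm. The function name and the shift values are hypothetical, chosen only for illustration, and are not data from Anthropic's evaluation.

```python
from statistics import mean


def mean_absolute_error(predicted, observed):
    """Average absolute gap, in ppm, between paired predicted and observed shifts."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one shift pair is required")
    return mean(abs(p - o) for p, o in zip(predicted, observed))


# Hypothetical hydrogen NMR shifts in ppm (illustrative values only,
# not taken from the benchmark).
observed_shifts = [7.26, 3.71, 2.15, 1.24]
predicted_shifts = [7.31, 3.65, 2.20, 1.20]

mae = mean_absolute_error(predicted_shifts, observed_shifts)
print(f"Mean absolute error: {mae:.3f} ppm")
```

A lower value means the predicted peaks sit closer to the measured ones. In practice each predicted peak must first be matched to the correct atom, which this sketch assumes has already been done.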
The result is more interesting because Anthropic also tried inverse structure elucidation. Classical software is strongest when a chemist provides a candidate structure and asks for a predicted spectrum. In lab work, the harder problem is often the reverse: start from spectra and infer the structure. Anthropic says Opus 4.7 solved all eight simpler inverse targets on every attempt from spectra and formula alone, and handled several harder targets when given starting-material context.
What to watch next is scale. The evaluation is small, with 20 forward-prediction compounds and 15 inverse problems, so it should not be read as a complete replacement for licensed chemistry tools. The next useful evidence would be blinded tests across more scaffolds, noisy real-world spectra, 2D NMR, and independent replication by working chemists.