Claude Opus 4.7 Beats NMR Software on Parts of Chemistry Benchmark

Why the chemistry benchmark matters

NMR spectroscopy is one of synthetic chemistry's most repetitive bottlenecks. Researchers use spectra to verify molecular structures, then manually match peaks to atoms before they can trust that a compound is what they think it is. Anthropic used its official X account to point readers to a new science blog post testing Claude Opus 4.7 on that workflow.

"Opus 4.7 matches—and on some tasks beats—dedicated NMR software."

The tweet was posted on June 5, 2026 at 19:27 UTC and had more than 362,000 views and 3,300 likes when checked through FxTwitter. Anthropic's main account usually carries Claude product updates, safety research, interpretability work, and technical evaluations, so this post sits closer to a benchmark disclosure than a general marketing note.

The linked research page says Anthropic tested three Claude models (Opus 4.7, Opus 4.6, and Sonnet 4.6) against ChemDraw and MestReNova on 20 compounds selected from synthetic chemistry preprints published after the models' training cutoff. That design is meant to reduce the chance that the models had already seen the molecules. For hydrogen NMR, Opus 4.7 reached an average error of about ±0.079 ppm, less than half the tolerance window cited by Anthropic. For carbon NMR, Opus 4.7 and MestReNova were effectively tied, at ±1.37 ppm and ±1.48 ppm respectively.
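For readers unfamiliar with how an average shift error is scored, here is a minimal sketch in Python. It assumes the metric is a plain mean absolute error over paired peaks, which the article does not confirm. The shift values and the 0.2 ppm tolerance are hypothetical placeholders, not figures from Anthropic's evaluation.

```python
from statistics import mean


def mean_absolute_error(predicted, observed):
    """Mean absolute difference between paired chemical shifts, in ppm."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("need at least one peak")
    return mean(abs(p - o) for p, o in zip(predicted, observed))


# Hypothetical hydrogen NMR shifts (ppm) for one compound.
# These are illustrative values, not data from the benchmark.
predicted = [7.26, 3.71, 2.05, 1.20]
observed = [7.30, 3.65, 2.10, 1.25]

# Placeholder tolerance; the article does not state the actual window.
TOLERANCE_PPM = 0.2

mae = mean_absolute_error(predicted, observed)
print(f"MAE = {mae:.3f} ppm; within tolerance: {mae <= TOLERANCE_PPM}")
```

A real evaluation would also have to decide how predicted peaks are matched to observed ones before any error is computed. This sketch sidesteps that by assuming the two lists are already paired in order.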

The result is more interesting because Anthropic also tried inverse structure elucidation. Classical software is strongest when a chemist provides a candidate structure and asks for a predicted spectrum. In lab work, the harder problem is often the reverse: start from spectra and infer the structure. Anthropic says Opus 4.7 solved all eight simpler inverse targets on every attempt from spectra and formula alone, and handled several harder targets when given starting-material context.

What to watch next is scale. The evaluation is small, with 20 forward-prediction compounds and 15 inverse problems, so it should not be read as a complete replacement for licensed chemistry tools. The next useful evidence would be blinded tests across more scaffolds, noisy real-world spectra, 2D NMR, and independent replication by working chemists.
