Claude agents closed 186 office deals in Anthropic's market test
Original: Anthropic's Project Deal shows Claude agents can negotiate real office trades
The useful part of Anthropic’s latest research is not that Claude chatted convincingly. It is that the company put agent behavior inside a real market and let it run. In a one-week internal experiment called Project Deal, Anthropic asked Claude-powered agents to buy, sell, and negotiate on behalf of employees in its San Francisco office. In the run that counted for actual exchanges, 69 agents closed 186 deals across more than 500 listed items, with total transaction value a little above $4,000, which works out to a little over $21 per deal on average.
“We created a marketplace for employees in our San Francisco office”
That line came from Anthropic’s source tweet, which linked to the full Project Deal write-up. The setup matters. Participants told Claude what they might want to buy or sell, and each agent received a $100 budget. The market itself ran in Slack, and once the experiment started there was no human sign-off on bids, counteroffers, or final agreements. The agents had to find matches, negotiate in natural language, and decide when to close.
Anthropic ran four parallel versions of the marketplace rather than a single one. Two runs used Claude Opus 4.5 for every participant, while the other two used Claude Haiku 4.5 for half of the participants. Anthropic’s follow-up comments and the research page both point in the same direction: higher-quality models had a real advantage. That matters because it turns “better model” into something more concrete than a benchmark delta. In an agent economy, model quality can show up as better pricing, better deal selection, and higher completion rates.
The experiment also exposed rough edges rather than hiding them. Some agents tried to lowball. Some leaned too hard toward being liked. Others had to juggle user preferences against market incentives. Anthropic’s page argues that agent marketplaces could be useful, but also warns that once agents start operating in more adversarial settings, optimization pressure may produce stranger and less human-friendly behavior. That is a stronger takeaway than a simple product teaser because it shows both feasibility and the coordination problems that arrive with it.
The AnthropicAI account usually mixes Claude launches, safety policy, and research notes; this post sits firmly in the research bucket. What to watch next is whether Anthropic expands Project Deal into harsher environments, publishes more failure cases, or measures how often stronger agents dominate weaker ones. For now, the clearest signal is that delegated negotiation is no longer theoretical. The primary sources are the tweet and Anthropic’s full write-up.