DeepClaude: Run Claude Code's Agent Loop with DeepSeek V4 Pro at 17x Less Cost
Original: DeepClaude – Claude Code agent loop with DeepSeek V4 Pro
The Concept
DeepClaude is an open-source tool that surgically replaces the AI brain inside Claude Code's agent loop while leaving the body intact. Everything developers rely on — file reading, editing, bash execution, subagent spawning, autonomous multi-step loops — continues to work as before. Only the model powering those decisions changes.
The project hit nearly 600 points on Hacker News and generated significant discussion around the Claude Code cost barrier.
Cost Breakdown
| Backend | Input (USD per 1M tokens) | Output (USD per 1M tokens) |
|---|---|---|
| DeepSeek (default) | $0.44 | $0.87 |
| OpenRouter | $0.44 | $0.87 |
| Fireworks AI | $1.74 | $3.48 |
| Anthropic (original) | $3.00 | $15.00 |
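DeepSeek V4 Pro scores 96.4% on LiveCodeBench and includes automatic context caching that makes repeated turns up to 120x cheaper. The headline 17x figure corresponds to the output-token price gap between Anthropic and DeepSeek ($15.00 versus $0.87 per million tokens); on input tokens the gap is closer to 7x ($3.00 versus $0.44). For heavy coding workloads, the savings compound quickly.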
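As a rough illustration of how the listed prices translate into per-session cost, the sketch below multiplies token counts by the table's rates. The function names are invented for this example, any token counts passed in are hypothetical, and cache discounts are deliberately ignored, so real bills may differ.

```python
# Prices in USD per million tokens, copied from the cost table above.
PRICES = {
    "DeepSeek (default)": (0.44, 0.87),
    "OpenRouter": (0.44, 0.87),
    "Fireworks AI": (1.74, 3.48),
    "Anthropic (original)": (3.00, 15.00),
}


def session_cost(backend: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a session at list price (no cache discount)."""
    input_price, output_price = PRICES[backend]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def savings_ratio(
    input_tokens: int,
    output_tokens: int,
    cheap: str = "DeepSeek (default)",
    baseline: str = "Anthropic (original)",
) -> float:
    """Return how many times cheaper `cheap` is than `baseline` for a workload."""
    return session_cost(baseline, input_tokens, output_tokens) / session_cost(
        cheap, input_tokens, output_tokens
    )
```

For a hypothetical session of 2 million input tokens and 500,000 output tokens, `session_cost` gives $13.50 on Anthropic and about $1.32 on DeepSeek, a ratio of roughly 10x. The ratio only approaches 17x for workloads dominated by output tokens.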
How It Works
DeepClaude sets environment variables (ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, model name overrides) for the current session only, without permanently altering your configuration. On exit, the original settings are restored. A --switch flag lets you change backends mid-session without restarting.
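The set-then-restore pattern described above can be sketched in a few lines. This is not DeepClaude's actual implementation, only a minimal illustration of the idea: `session_env` and `backend_overrides` are hypothetical helpers, and `ANTHROPIC_MODEL` is an assumed name for the model override, since the article does not spell out which variables carry the model name.

```python
import os
from contextlib import contextmanager


def backend_overrides(base_url: str, token: str, model: str) -> dict:
    """Build the environment overrides for one backend (hypothetical helper)."""
    return {
        "ANTHROPIC_BASE_URL": base_url,
        "ANTHROPIC_AUTH_TOKEN": token,
        # Assumed variable name; the article only says "model name overrides".
        "ANTHROPIC_MODEL": model,
    }


@contextmanager
def session_env(overrides: dict):
    """Apply overrides for the duration of the block, then restore the originals."""
    # Remember the previous value of each variable (None means it was unset).
    saved = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old_value in saved.items():
            if old_value is None:
                os.environ.pop(name, None)  # was unset before, so unset again
            else:
                os.environ[name] = old_value
```

A wrapper could then launch the agent inside `with session_env(backend_overrides(url, token, model)):`, for example via `subprocess.run`, so the child process inherits the overrides while the surrounding environment is unchanged once the block exits. The `try`/`finally` ensures restoration even if the session ends with an error.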
Caveats
DeepSeek's servers are in China, so for enterprise or sensitive environments, OpenRouter or Fireworks AI (US-based) are the recommended alternatives. Model behavior will differ from native Claude, and some Anthropic-specific capabilities may not carry over to other backends.