A typical Claude Code session with Sonnet 4.6 costs $0.40-1.20 depending on what you are doing. A heavy refactor session touching many files can hit $3-5. At 20 sessions per day, that is $240-$600/month before any optimization. The main drivers: context accumulates fast, CLAUDE.md loads on every session, and output tokens for code are expensive at $15/M. Here is what actually moves the needle.
Every Claude Code turn has three cost components:
The cost spiral: you ask Claude to read 10 files, each 200 lines. That is ~15,000 tokens of tool results loaded into every subsequent turn for the rest of the session. 20 more turns later, you are paying for those 10 files 20 times even though you only needed them once.
| Session type | Approx input tokens | Approx output tokens | Sonnet 4.6 cost |
|---|---|---|---|
| Quick bug fix (5-10 turns) | 20K | 3K | ~$0.10 |
| Feature implementation (15-20 turns) | 80K | 8K | ~$0.36 |
| Large refactor (30+ turns, many files) | 250K | 20K | ~$1.05 |
| Codebase exploration + planning | 400K | 15K | ~$1.43 |
| Full-day heavy use (100+ turns) | 800K-1.5M | 60K-100K | $3.30-$6 |
These use official Anthropic pricing. Numbers drop 80-85% with a discounted gateway.
A 1,500-line CLAUDE.md adds ~6,000 tokens to every session. Even with prompt caching, the first turn of every new session pays full price. Trim to 200-400 lines. Cut generic best practices, outdated decisions, and anything the model already knows.
Context accumulates. After 90 minutes on a complex feature, 60% of your context might be stale file reads from 30 turns ago. Each new turn pays for all of it. /clear between unrelated tasks is free — use it.
If Claude Code spawns sub-agents and defaults them to Sonnet or Opus, costs multiply. A 5-agent fan-out on Sonnet for simple file reading is 5x the cost it should be. Set sub-agent model to Haiku in .claude/settings.json.
Long, unbounded responses where a short one would do. A 1,500-token code comment block when 200 tokens was sufficient costs $0.019 vs $0.003. At scale it compounds.
Discounted API gateways offer the same Claude models at 80-85% less than direct Anthropic pricing. Two lines:
export ANTHROPIC_BASE_URL=https://aiapiflow.com
export ANTHROPIC_API_KEY=your-aiapiflow-key
Claude Code reads both variables automatically. No other changes. A $400/month bill becomes $60-80/month before any further optimization. Aiapiflow gives you Claude Opus 4.8, Sonnet 4.6, and Haiku 4.5 with pay-as-you-go credits that never expire.
In .claude/settings.json:
{
"defaultModel": "claude-sonnet-4-6",
"subAgentModel": "claude-haiku-4-5"
}
Sub-agents do the reading, exploration, and classification. Haiku handles all of that at 60x less than Opus. This alone cuts multi-agent session costs by 40-60%.
Target 200-400 lines. Prompt caching only kicks in when the prefix is identical — editing CLAUDE.md mid-session resets the cache. Stable CLAUDE.md + caching = 90% off on the system prompt portion of every cached turn.
Free. Instant. Clears accumulated context before starting a new task. The most underused cost control in Claude Code.
Instead of reading 10 files in the main context (which stays there forever), spawn a sub-agent to read and summarize. The sub-agent returns a 300-word summary. Your main context stays lean. Results in 40% lower per-turn input costs on long sessions.
| Scenario | Monthly cost |
|---|---|
| Unoptimized, direct Anthropic, 20 sessions/day | ~$420 |
| After discounted gateway (85% off) | ~$63 |
| After gateway + model routing + /clear discipline | ~$28 |
| After all tactics including lean CLAUDE.md | ~$18-22 |
How many tokens does a typical Claude Code session use?
A medium session (15-20 turns, one feature) uses roughly 80K input and 8K output tokens with Sonnet 4.6, costing about $0.36 at official Anthropic pricing. Heavy sessions (30+ turns, large refactors) use 200-400K input tokens and cost $0.80-1.50. Full-day heavy use runs $3-6/day.
Does Claude Code use prompt caching automatically?
Yes. Claude Code caches stable prefixes including CLAUDE.md and tool definitions automatically. The cache gives a 90% discount on cached input tokens for 5 minutes. Keep CLAUDE.md stable within a session to maximize cache hits.
Can I use Claude Code with a cheaper API endpoint?
Yes. Set ANTHROPIC_BASE_URL to a discounted gateway and ANTHROPIC_API_KEY to your gateway key. Claude Code uses both environment variables automatically. Aiapiflow provides Claude Opus 4.8, Sonnet 4.6, and Haiku 4.5 at up to 85% less than direct Anthropic pricing with no subscription.
What is the cheapest Claude model that works well for coding?
Claude Sonnet 4.6 is the best quality-to-cost for code generation at $3/M input and $15/M output (official pricing). Claude Haiku 4.5 ($0.80/$4 per M) works well for file reads and classification but produces weaker code on complex tasks. Use Haiku for sub-agents and exploration, Sonnet for actual code writing.
One URL change. Claude Opus 4.8, Sonnet 4.6, Haiku at up to 85% less than direct pricing. Credits never expire, no subscription.
Create free account → Free account · Pay only when you top up · 5-min setup