Claude Code Token Usage: What It Actually Costs Per Session

June 20, 2026 · Aiapiflow

A typical Claude Code session with Sonnet 4.6 costs $0.40-$1.20 depending on what you are doing. A heavy refactor session touching many files can hit $3-5. At 20 sessions per day over a 30-day month, that is $240-$720/month before any optimization. The main drivers: context accumulates fast, CLAUDE.md loads on every session, and output tokens for code are expensive at $15/M. Here is what actually moves the needle.

How Claude Code bills you

Every Claude Code turn has three cost components:

Input tokens — your message + system prompt + CLAUDE.md + all prior turns + tool results from file reads. Sonnet 4.6: $3/M uncached, $0.30/M cached (90% discount).
Output tokens — code Claude writes back. $15/M. No caching discount. This is the expensive part.
Tool execution — reading a file is free, but the file contents get injected into the next turn as input tokens.

The cost spiral: you ask Claude to read 10 files, each 200 lines. That is ~15,000 tokens of tool results loaded into every subsequent turn for the rest of the session. 20 more turns later, you are paying for those 10 files 20 times even though you only needed them once.
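The spiral is easy to put numbers on. The sketch below is illustrative only: it uses the $3/M and $0.30/M input rates quoted above, and the function name is made up for this example. It brackets the cost of carrying those file reads between the case where no turn hits the cache and the case where every turn does.

```python
# USD per million input tokens for Sonnet 4.6, as quoted in this article.
UNCACHED_INPUT_PER_M = 3.00
CACHED_INPUT_PER_M = 0.30


def carried_context_cost(context_tokens: int, later_turns: int, price_per_m: float) -> float:
    """Cost of re-sending the same context on each of `later_turns` turns."""
    return context_tokens * later_turns * price_per_m / 1_000_000


# The article's example: ~15,000 tokens of file reads carried for 20 more turns.
worst_case = carried_context_cost(15_000, 20, UNCACHED_INPUT_PER_M)
best_case = carried_context_cost(15_000, 20, CACHED_INPUT_PER_M)
print(f"every turn uncached: ${worst_case:.2f}")
print(f"every turn a cache hit: ${best_case:.2f}")
```

Under these assumptions the ten files cost somewhere between $0.09 and $0.90 extra over the rest of the session, depending on how often the cache is hit.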

Real cost breakdown per session type

Session type	Approx input tokens	Approx output tokens	Sonnet 4.6 cost
Quick bug fix (5-10 turns)	20K	3K	~$0.10
Feature implementation (15-20 turns)	80K	8K	~$0.36
Large refactor (30+ turns, many files)	250K	20K	~$1.05
Codebase exploration + planning	400K	15K	~$1.43
Full-day heavy use (100+ turns)	800K-1.5M	60K-100K	$3.30-$6

These figures use official Anthropic pricing and bill every input token at the uncached $3/M rate, so cache hits bring them down. Numbers drop 80-85% with a discounted gateway.
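The single-value rows of the table can be reproduced with one line of arithmetic. This is a minimal sketch that assumes no cache hits at all and uses the $3/M input and $15/M output rates quoted earlier; `session_cost` is a name chosen for illustration.

```python
INPUT_PER_M = 3.00    # USD per million uncached input tokens (article's figure)
OUTPUT_PER_M = 15.00  # USD per million output tokens (article's figure)


def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Session cost in USD, assuming no cache hits at all."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000


# (input tokens, output tokens) per session type, copied from the table.
SESSION_TYPES = {
    "Quick bug fix": (20_000, 3_000),
    "Feature implementation": (80_000, 8_000),
    "Large refactor": (250_000, 20_000),
    "Codebase exploration + planning": (400_000, 15_000),
}

for name, (tokens_in, tokens_out) in SESSION_TYPES.items():
    print(f"{name}: ${session_cost(tokens_in, tokens_out):.3f}")
```

Swapping in your own token counts from a real session gives a quick upper bound, since any cached input is billed at a lower rate than this sketch assumes.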

What drives costs up unexpectedly

CLAUDE.md size

A 1,500-line CLAUDE.md adds ~6,000 tokens to every session. Even with prompt caching, the first turn of a new session typically pays at least full price for it, because the cache has to be written before it can be read. Trim to 200-400 lines. Cut generic best practices, outdated decisions, and anything the model already knows.
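To see what trimming buys, here is a rough estimator. It assumes about 4 tokens per line, which is simply the ratio implied by the 1,500-line, ~6,000-token figure above; a real tokenizer count will differ, and the function name is hypothetical.

```python
TOKENS_PER_LINE = 4   # assumption implied by the article: ~6,000 tokens / 1,500 lines
INPUT_PER_M = 3.00    # USD per million uncached input tokens (article's figure)


def claude_md_overhead(text: str) -> tuple[int, int, float]:
    """Return (lines, estimated tokens, estimated uncached cost in USD)."""
    lines = len(text.splitlines())
    tokens = lines * TOKENS_PER_LINE
    return lines, tokens, tokens * INPUT_PER_M / 1_000_000


# Stand-ins for real files: a 1,500-line CLAUDE.md and a trimmed 300-line one.
for sample in ("rule\n" * 1_500, "rule\n" * 300):
    lines, tokens, cost = claude_md_overhead(sample)
    print(f"{lines} lines -> ~{tokens} tokens, ~${cost:.4f} uncached per load")
```

To check a real file, pass in `open("CLAUDE.md").read()` instead of the stand-in strings.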

Staying in a session too long

Context accumulates. After 90 minutes on a complex feature, 60% of your context might be stale file reads from 30 turns ago. Each new turn pays for all of it. /clear between unrelated tasks is free — use it.

Sub-agents on expensive models

If Claude Code spawns sub-agents and defaults them to Sonnet or Opus, costs multiply. A 5-agent fan-out on Sonnet for simple file reading pays Sonnet rates five times over for work Haiku could do at roughly a third of the per-token price. Set sub-agent model to Haiku in .claude/settings.json.

No max_tokens discipline

Long, unbounded responses where a short one would do add up. At $15/M output, a 1,500-token code comment block costs about $0.023 where a 200-token one would have cost $0.003. At scale it compounds.

How to cut your Claude Code bill

1. Change one environment variable (fastest, biggest impact)

Discounted API gateways offer the same Claude models at 80-85% less than direct Anthropic pricing. Two lines:

export ANTHROPIC_BASE_URL=https://aiapiflow.com
export ANTHROPIC_API_KEY=your-aiapiflow-key

Claude Code reads both variables automatically. No other changes. A $400/month bill becomes $60-80/month before any further optimization. Aiapiflow gives you Claude Opus 4.8, Sonnet 4.6, and Haiku 4.5 with pay-as-you-go credits that never expire.

2. Route sub-agents to Haiku

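In .claude/settings.json (a sketch; confirm the key names against the settings reference for your Claude Code version):

{
  "model": "claude-sonnet-4-6",
  "env": {
    "CLAUDE_CODE_SUBAGENT_MODEL": "claude-haiku-4-5"
  }
}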

Sub-agents do the reading, exploration, and classification. Haiku handles all of that at a fraction of the Sonnet or Opus per-token price. This alone can cut multi-agent session costs by 40-60%.

3. Keep CLAUDE.md lean and stable

Target 200-400 lines. Prompt caching only kicks in when the prefix is identical — editing CLAUDE.md mid-session resets the cache. Stable CLAUDE.md + caching = 90% off on the system prompt portion of every cached turn.

4. Use /clear between unrelated tasks

Free. Instant. Clears accumulated context before starting a new task. The most underused cost control in Claude Code.

5. Sub-agents for codebase exploration

Instead of reading 10 files in the main context (where they stay until you clear it), spawn a sub-agent to read and summarize. The sub-agent returns a 300-word summary. Your main context stays lean. On long sessions this can lower per-turn input costs by around 40%.
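A back-of-the-envelope comparison shows why the summary helps. The sketch below assumes roughly 4 tokens for every 3 English words (a common heuristic, not a measured value) and reuses the ~15,000-token figure for ten file reads from earlier in this article. It ignores what the sub-agent itself costs to run and any caching discount, and both function names are made up for illustration.

```python
INPUT_PER_M = 3.00  # USD per million uncached input tokens (article's figure)


def words_to_tokens(words: int) -> int:
    """Rough heuristic (an assumption): about 4 tokens for every 3 English words."""
    return words * 4 // 3


def input_saved(direct_tokens: int, summary_words: int, later_turns: int) -> float:
    """USD saved on input by carrying a summary instead of the raw file reads."""
    tokens_avoided = direct_tokens - words_to_tokens(summary_words)
    return tokens_avoided * later_turns * INPUT_PER_M / 1_000_000


# ~15,000 tokens of raw reads vs a 300-word summary, carried for 20 more turns.
print(f"summary size: ~{words_to_tokens(300)} tokens")
print(f"input saved over 20 turns: ${input_saved(15_000, 300, 20):.2f}")
```

The saving grows linearly with the number of turns that follow, which is why the tactic matters most in long sessions.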

Combined savings example

Scenario	Monthly cost
Unoptimized, direct Anthropic, 20 sessions/day	~$420
After discounted gateway (85% off)	~$63
After gateway + model routing + /clear discipline	~$28
After all tactics including lean CLAUDE.md	~$18-22
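Reading the table row by row, each step implies a percentage cut relative to the row before it. The sketch below derives those step-wise cuts from the table's own numbers; taking $20 as the midpoint of the last row's $18-22 range is an assumption made only for this illustration.

```python
# Monthly costs from the table; the last row uses the midpoint of $18-22,
# which is an assumption made only for this illustration.
SCENARIOS = [
    ("Unoptimized, direct Anthropic", 420.0),
    ("Discounted gateway", 63.0),
    ("Gateway + model routing + /clear", 28.0),
    ("All tactics incl. lean CLAUDE.md", 20.0),
]


def stepwise_reductions(costs: list[float]) -> list[float]:
    """Fractional cost reduction contributed by each step over the previous one."""
    return [1 - after / before for before, after in zip(costs, costs[1:])]


costs = [cost for _, cost in SCENARIOS]
for (name, _), cut in zip(SCENARIOS[1:], stepwise_reductions(costs)):
    print(f"{name}: {cut:.0%} below the previous row")
```

In absolute dollars the first step dominates ($357 of the roughly $400 total drop), while the later tactics shave smaller amounts off an already reduced bill.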

Frequently asked questions

How many tokens does a typical Claude Code session use?

A medium session (15-20 turns, one feature) uses roughly 80K input and 8K output tokens with Sonnet 4.6, costing about $0.36 at official Anthropic pricing. Heavy sessions (30+ turns, large refactors) use 200-400K input tokens and cost $0.80-1.50. Full-day heavy use runs $3-6/day.

Does Claude Code use prompt caching automatically?

Yes. Claude Code caches stable prefixes including CLAUDE.md and tool definitions automatically. Cached input tokens are billed at a 90% discount. By default a cache entry expires after 5 minutes without use, and each cache hit refreshes that window. Keep CLAUDE.md stable within a session to maximize cache hits.

Can I use Claude Code with a cheaper API endpoint?

Yes. Set ANTHROPIC_BASE_URL to a discounted gateway and ANTHROPIC_API_KEY to your gateway key. Claude Code uses both environment variables automatically. Aiapiflow provides Claude Opus 4.8, Sonnet 4.6, and Haiku 4.5 at up to 85% less than direct Anthropic pricing with no subscription.

What is the cheapest Claude model that works well for coding?

Claude Sonnet 4.6 offers the best quality-to-cost ratio for code generation at $3/M input and $15/M output (official pricing). Claude Haiku 4.5 ($1/$5 per M) works well for file reads and classification but produces weaker code on complex tasks. Use Haiku for sub-agents and exploration, Sonnet for actual code writing.
