The fastest way to reduce your Claude API bill: stop using Opus for everything. Route simple tasks to Haiku (12x cheaper than Opus), enable prompt caching (90% off cached tokens), and set max_tokens on every request. Combined, these three changes cut most bills by 50-70% in an afternoon. If you want the biggest single lever — a discounted API gateway saves 80-85% instantly with one URL change, no code refactoring needed.
Here are the 7 tactics, ordered by ROI.
Three cost components per request:
Typical heavy user before optimization: 200K input + 15K output tokens per session, 20 sessions/day = ~$500/month. After the tactics below: same work, $80-130/month.
Most developers default Opus everywhere. The correct split:
| Task | Model | Why |
|---|---|---|
| Complex reasoning, architecture, planning | Opus 4.8 | Worth the cost here |
| Code writing, refactoring, analysis | Sonnet 4.6 | 5x cheaper, near-equal on code |
| Classification, extraction, file reads | Haiku 4.5 | 60x cheaper than Opus |
Routing 60% of requests to Haiku and 35% to Sonnet cuts a $500/month bill to roughly $90-120.
function selectModel(taskType) {
const simple = ['classify', 'extract', 'summarize', 'read']
const medium = ['code', 'analyze', 'write']
if (simple.includes(taskType)) return 'claude-haiku-4-5' // $0.80/M input
if (medium.includes(taskType)) return 'claude-sonnet-4-6' // $3/M input
return 'claude-opus-4-8' // $15/M — reserve for real reasoning
}
Anthropic’s prompt caching gives 90% off cached input tokens for 5 minutes. Any stable prefix over 1,024 tokens is a candidate: system prompts, tool definitions, long documents.
const response = await client.messages.create({
model: 'claude-sonnet-4-6',
max_tokens: 500,
system: [
{
type: 'text',
text: yourLargeSystemPrompt,
cache_control: { type: 'ephemeral' }
}
],
messages: [{ role: 'user', content: userMessage }]
})
// First call: full price. Subsequent calls: 90% off the system prompt.
Rule: keep the cached prefix stable within sessions. Put volatile data in user messages, not the system prompt.
Output tokens cost 5x more than input. An unconstrained response producing 2,000 tokens when you needed 200 costs 10x more.
# Bad — unbounded
client.messages.create(model=model, messages=messages)
# Good — explicit ceiling
client.messages.create(model=model, max_tokens=300, messages=messages)
For classification: max_tokens=50. For code generation: max_tokens=800. For long-form: measure your p95 output length and add 20% buffer.
Every token in your system prompt gets billed on every request. A 1,500-line CLAUDE.md adds ~6K tokens per turn.
Target: 200-400 tokens. Cut: polite framing, generic best practices the model already knows, outdated decisions. Keep: stack with versions, active conventions (max 10), “never do” list (max 5).
Sessions accumulate stale tool results. After 90 minutes, 60% of context might be irrelevant file reads. Two fixes:
Tool use and JSON output produce 50-80% fewer output tokens than natural language for extraction tasks. A 200-token prose answer often compresses to 40 tokens as a JSON object.
# Instead of: "The sentiment is positive and the user seems satisfied..."
# Force structured output:
prompt = "Classify sentiment. Return JSON only: sentiment: positive|negative|neutral"
# Response: {"sentiment": "positive"} — 8 tokens vs 200+
All 6 tactics above require code changes. There is a faster lever: API gateways that provide the same Claude models at significantly lower prices. The only change is your base URL.
Aiapiflow provides access to Claude Opus 4.8, Sonnet 4.6, and Haiku 4.5 at up to 85% less than direct Anthropic pricing. Setup:
export ANTHROPIC_BASE_URL=https://aiapiflow.com
export ANTHROPIC_API_KEY=your-aiapiflow-key
Works with Claude Code, Cursor, Cline, any Anthropic SDK. Credits never expire, no subscription.
The math: a $500/month Anthropic bill becomes $75-100/month through a discounted gateway, before applying any of tactics 1-6. Combined, heavy users report reductions to $30-50/month.
| Metric | Unoptimized | After tactics 1-3 | After all 7 |
|---|---|---|---|
| Avg input tokens/session | 200K | 80K (50K cached) | 40K |
| Avg output tokens/session | 15K | 7K | 5K |
| Monthly cost (20 sessions/day) | ~$490 | ~$95 | ~$35 |
How much does Claude API cost per month for a heavy user?
Heavy Claude API use (20+ hours/week, full agent workflows) typically costs $250-500/month at official Anthropic pricing. A discounted gateway like Aiapiflow reduces these by 80-85% immediately.
What is the cheapest Claude model?
Claude Haiku 4.5 is the cheapest at $0.80/M input and $4/M output tokens. It handles classification, extraction, and file reads well. For code generation, Sonnet 4.6 ($3/$15 per M) gives better quality-to-cost. Use Opus ($15/$75) only for complex multi-step reasoning.
Does prompt caching work with Claude Code?
Yes. Claude Code uses prompt caching automatically on stable prefixes including CLAUDE.md and tool definitions. Keep CLAUDE.md stable within a session — editing it mid-task resets the cache.
Can I use a cheaper API without changing my code?
Yes. Set ANTHROPIC_BASE_URL to a discounted gateway endpoint and update your API key. The Anthropic SDK, Claude Code, Cursor, and Cline all respect these environment variables. Aiapiflow offers Claude Opus 4.8 at up to 85% less with no subscription.
One URL change. Claude Opus 4.8, Sonnet 4.6, Haiku at up to 85% less than direct pricing. Credits never expire, no subscription.
Create free account → Free account · Pay only when you top up · 5-min setup