AI Coding Costs (2026): Claude vs Codex vs Gemini, Real Monthly Spend From Token Math ($20 to $1,000+)

The pricing page says $20/month. The invoice says something else, because coding agents bill by the token and a single task pushes 400K to 2M cumulative input tokens through the API. This page computes the real numbers at published June 2026 rates: cost per token for every major coding model, cost per task, the volume where a $100 Max plan beats API billing, the June 15 Anthropic billing change, and the Microsoft cancellation that put a hard dollar figure ($2,000/engineer/month) on enterprise agent spend.

179x

Output price spread: DeepSeek V4 Flash $0.28/M vs Claude Fable 5 $50/M

$0.54

One bug-fix task on Sonnet 4.6 at API rates (token math below)

111 tasks/mo

Where Claude Max 5x ($100) breaks even vs Opus 4.8 API billing

$2,000/mo

Reported per-engineer Claude Code spend that made Microsoft cancel

Cost Per Token: Every Major Coding Model (June 2026)

Vendors publish prices per million tokens (MTok). One table, all current official rates, sourced from Anthropic, OpenAI, Google, DeepSeek, Moonshot, Z.AI, MiniMax, and Alibaba Cloud pricing docs as of June 9, 2026:

Model	Input /MTok	Output /MTok	Cache read /MTok
Claude Fable 5	$10.00	$50.00	$1.00
Claude Opus 4.8	$5.00	$25.00	$0.50
GPT-5.5	$5.00	$30.00	$0.50
Claude Sonnet 4.6	$3.00	$15.00	$0.30
GPT-5.4	$2.50	$15.00	$0.25
Gemini 3.1 Pro (≤200K prompt)	$2.00	$12.00	$0.20
GPT-5.3-Codex	$1.75	$14.00	$0.175
Gemini 3.5 Flash	$1.50	$9.00	$0.15
GLM-5.1	$1.40	$4.40	$0.26
Claude Haiku 4.5	$1.00	$5.00	$0.10
Kimi K2.6	$0.95	$4.00	$0.16
GPT-5.4-mini	$0.75	$4.50	$0.075
Gemini 3 Flash (preview)	$0.50	$3.00	n/a
DeepSeek V4 Pro	$0.435	$0.87	$0.0036
Qwen3.5-Plus (≤256K prompt)	$0.40	$2.40	n/a
MiniMax M3 (≤512K prompt)	$0.30	$1.20	$0.06
DeepSeek V4 Flash	$0.14	$0.28	$0.0028

Three pricing quirks matter for agents. Gemini 3.1 Pro doubles its input rate above 200K-token prompts ($2 to $4 input, $12 to $18 output). Qwen3.5-Plus steps up above 256K ($0.40/$2.40 to $0.50/$3.00) and MiniMax M3 above 512K ($0.30/$1.20 to $0.60/$2.40). Anthropic's 1M-context models (Fable 5, Opus 4.8/4.7/4.6, Sonnet 4.6) have no long-context surcharge: a 900K-token request bills at the same per-token rate as a 9K one. Full Claude-only detail lives on Anthropic API pricing.

Context windows: Claude Fable 5, Opus 4.8, Sonnet 4.6, GPT-5.5, GPT-5.4, Gemini 3.1 Pro, DeepSeek V4, MiniMax M3, and Qwen3.5-Plus all run 1M tokens. GPT-5.4-mini is 400K, Kimi K2.6 is 262K, Claude Haiku 4.5 is 200K.

Price does not track capability linearly. On SWE-bench Verified (llm-stats, June 2026): Claude Fable 5 scores 95.0% at $50/M output, Opus 4.8 scores 88.6% at $25/M, while DeepSeek V4 Pro Max scores 80.6% at $0.87/M output and MiniMax M3 scores 80.5% at $1.20/M. The last 7-15 points of benchmark performance cost 20-50x more per output token. See DeepSeek V4 for the open-weights side of that tradeoff.

One caveat on the cheap open-weights rates: most serverless DeepSeek providers quantize activations to fp8 to cut cost, which degrades output below the reference weights. Morph serves DeepSeek with full 16-bit (bf16) activations, so output matches the published model, and runs codegen-specific speculative decoding (draft and ngram tuned on code) plus custom inference kernels built for code generation. That makes it the cheapest full-fidelity way to run DeepSeek for coding agents: morph-dsv4flash bills $0.139/M input and $0.278/M output. Rates and tiers on pricing.

What One Coding Task Costs

Per-MTok prices only become real when multiplied by an agent workload. Two representative profiles, applied identically to every model:

Bug fix: 25 API calls, 400K cumulative input tokens (75% served from cache), 10K output tokens.
Feature: 100 API calls, 2M cumulative input tokens (80% cached), 40K output tokens.

These are token-math projections at published rates, not measured sessions. Your profile shifts with codebase size and prompt discipline, but the model-to-model ratios hold because they come from the price sheet.

Model	Bug fix	Feature	20 features/month
DeepSeek V4 Pro	$0.05	$0.21	$4.20
MiniMax M3	$0.06	$0.26	$5.20
Kimi K2.6	$0.18	$0.80	$16.00
GLM-5.1	$0.26	$1.15	$23.00
GPT-5.3-Codex	$0.37	$1.54	$30.80
Gemini 3.1 Pro	$0.38	$1.60	$32.00
Claude Sonnet 4.6	$0.54	$2.28	$45.60
Claude Opus 4.8	$0.90	$3.80	$76.00
GPT-5.5	$0.95	$4.00	$80.00
Claude Fable 5	$1.80	$7.60	$152.00

Worked example for Sonnet 4.6 on the bug-fix profile: 100K uncached input × $3/M = $0.30, plus 300K cache reads × $0.30/M = $0.09, plus 10K output × $15/M = $0.15. Total $0.54. Without prompt caching the same task costs $1.35, a 2.5x difference from one config flag. Gemini's figure excludes its cache storage fee ($4.50 per 1M tokens per hour on 3.1 Pro), which adds cost on long-lived sessions.

The other cost driver is context growth. Every API call resends conversation history, so cumulative input scales with both task length and how much noise (file dumps, failed attempts, verbose tool output) accumulates. Context rot compounds it: quality drops as context grows, failures add more context, the next call costs more. Plug your own token counts into the LLM cost calculator.

Claude Pro, Max, and Claude Code in 2026

The question search data shows people asking most: what does the Claude Pro plan cost per month in 2026, and is Claude Code included? Answer: Claude Pro is $20/month billed monthly, $17/month billed annually, and yes, Claude Code is included on Pro, Max 5x ($100/month), and Max 20x ($200/month). Claude Code usage draws from the same 5-hour rolling usage windows as chat; Max 5x and 20x multiply those limits 5x and 20x.

Plan	Price/month	What you get
Claude Pro	$20 ($17 annual)	Claude Code + chat, base 5-hour-window limits, $20/mo Agent SDK credit pool from June 15
Claude Max 5x	$100	5x Pro limits, $100/mo Agent SDK credit pool
Claude Max 20x	$200	20x Pro limits, $200/mo Agent SDK credit pool
Codex Free / Go	$0 / $8	Codex CLI included with ChatGPT sign-in, lowest usage limits
Codex Plus	$20	15-80 GPT-5.5 messages per 5-hour window (20-100 on GPT-5.4), web + CLI + IDE
Codex Pro	from $100	5x or 20x rate-limit options, Codex-Spark research preview
GitHub Copilot Enterprise	$39/seat	Flat rate, no usage surcharge (the plan Microsoft moved engineers to)

Codex also meters by credits: GPT-5.5 burns 125 credits per 1M input tokens, 12.5 per 1M cached, 750 per 1M output, and a message averages 5-45 credits; Plus and Pro users can buy extra credits when limits hit. Plan-level detail: Claude Code pricing, Codex pricing, Claude Code usage limits, and Claude Code team pricing.

Subscription vs API: The Breakeven Math

Reddit is full of claims that Claude subscriptions are 25-36x cheaper than API billing. The claim is computable, so compute it. Using the bug-fix profile above ($0.90 per task on Opus 4.8 at API rates):

Max 5x ($100/month) breakeven: $100 / $0.90 = 111 bug-fix tasks per month, about 5 per working day. Or $100 / $3.80 = 26 feature-sized tasks.
Max 20x ($200/month) breakeven: 222 bug fixes or 52 features per month.
Pro ($20/month) breakeven: 22 bug fixes per month on Opus, or 37 on Sonnet 4.6 ($0.54/task). One task per working day already beats API billing.

A 36x subsidy means extracting $3,600 of API-equivalent usage from a $100 plan: 4,000 bug-fix-sized tasks per month, or 130 per day. No human does that interactively. That volume is cron jobs, CI pipelines, and Agent SDK fleets running under a flat-rate login. Which is exactly the usage Anthropic carved out on June 15.

The breakeven rule of thumb

If you run more than ~1 agent task per working day, a subscription beats API billing for interactive use. If you run more than ~5 per day, Max 5x beats Pro. API billing only wins when usage is sporadic, when you need a cheaper model family (DeepSeek V4 at $0.21/feature has no subscription to beat), or when usage is automation, which no longer rides on flat-rate plans anyway.

The June 15, 2026 Anthropic Billing Change

Announced in mid-May 2026 and effective June 15, Anthropic split subscription usage into two pools. It follows the April 2026 restriction on third-party tools consuming flat-rate plans, and it is Anthropic's answer to the breakeven math above: agents were extracting 12-175x effective subsidies on flat-rate logins.

Moves to a metered credit pool, billed at standard API token rates:

Claude Agent SDK usage (Python and TypeScript)
claude -p headless mode (scripts, cron jobs)
The Claude Code GitHub Actions integration
Third-party apps authenticating with a Claude subscription via the Agent SDK

Stays on normal subscription limits: chat on web, desktop, and mobile; interactive Claude Code in the terminal or an IDE; Claude Cowork.

The monthly credit allotments: Pro $20, Max 5x $100, Max 20x $200. Credits are per-user (not pooled across a team), refresh with the billing cycle, and do not roll over. When credits run out, automation stops unless overflow billing to a payment method is enabled. Practical translation: a Max 20x subscriber gets $200/month of API-rate automation, which at Opus 4.8 rates is roughly 222 bug-fix-sized headless runs. Pipelines beyond that should move to a straight API key, the Batch API at 50% off, or a cheaper model tier.

Why Microsoft Canceled Claude Code Licenses

In early June 2026, Windows Central reported that Microsoft's Experiences + Devices division (Windows, Microsoft 365, Outlook, Teams, Surface) told thousands of engineers to stop using Claude Code by June 30, 2026 and move to GitHub Copilot CLI. The stated driver was operational cost: token-based billing reportedly reached around $2,000 per engineer per month and burned through the division's annual AI budget in months. June 30 is also Microsoft's fiscal year end, and Copilot is Microsoft's own product, so the move cut costs and consolidated on in-house tooling at once.

The economics generalize beyond Microsoft. Enterprise Claude Code is a seat fee plus actual token usage, so spend scales with how useful the tool is: better agent, more usage, bigger bill. Copilot Enterprise is $39/seat/month flat. At $2,000/engineer/month in token billing versus $39 flat, the CFO math is 51x, regardless of which tool engineers prefer. The unresolved question is output: a flat-rate tool that caps usage also caps the work the agent does. Teams making this decision should compute cost per completed task (the table above), not cost per seat.

Cache Pricing and the Tokenizer Change

Prompt caching is the single largest lever on effective Claude costs because agent calls are mostly repeated prefix. Anthropic's multipliers on base input price: 5-minute cache write 1.25x, 1-hour cache write 2x, cache read 0.1x. In dollars on Fable 5: $12.50/M for a 5-minute write, $20/M for a 1-hour write, $1/M per read against a $10/M base.

Model	Input	5-min cache write	Cache read	Batch (in/out)
Claude Fable 5	$10.00	$12.50	$1.00	$5.00 / $25.00
Claude Opus 4.8	$5.00	$6.25	$0.50	$2.50 / $12.50
Claude Sonnet 4.6	$3.00	$3.75	$0.30	$1.50 / $7.50
Claude Haiku 4.5	$1.00	$1.25	$0.10	$0.50 / $2.50

Three more line items people miss. The Batch API is a flat 50% off input and output for work that tolerates async processing. Setting inference_geo: "us" (Opus 4.6+ and Sonnet 4.6+) adds a 1.1x multiplier on every token category. The web search tool bills $10 per 1,000 searches on top of tokens.

And the tokenizer: Opus 4.7 and later, including Fable 5, use a new tokenizer that can produce up to 35% more tokens for the same text than pre-4.7 models. That is per Anthropic's own pricing docs. A "same price" model that tokenizes your codebase into 35% more tokens is up to 35% more expensive per request, which is the mechanism behind the cost-hike complaints that circulated after the 4.7 release.

Monthly Spend Scenarios

Putting plans, token math, and breakevens together. Task counts use the bug-fix and feature profiles defined above; API-equivalent figures show what the same usage would bill at standard rates.

Profile	Recommended setup	Monthly cost	API-equivalent
Solo, light (2-3 tasks/day)	Claude Pro or Codex Plus	$17-20	~$30-50 at Sonnet 4.6 rates
Solo, budget	Codex Go, or DeepSeek V4 API direct	$3-15	$3-12 at DeepSeek V4 Pro rates
Solo, heavy (8-12 tasks/day)	Claude Max 5x or Codex Pro	$100	~$200-450 at Opus 4.8 rates
Solo, heavy + automation	Claude Max 20x + API key for overflow	$200-400	$600-1,200+
5-person team, Claude-first	5x Max 5x	$500	$1,000-2,250
5-person team, flat-rate	Copilot Enterprise (Microsoft's pick)	$195	n/a (capped usage)
5-person team, open-weight API	DeepSeek V4 Pro / MiniMax M3 via API	$40-160	that is the API bill

The $1,000+/month profiles are real but rare for individuals: they require Fable 5-class models ($7.60/feature) plus heavy automation, or unoptimized agents resending 200K-token contexts without caching. For most solo developers the honest 2026 answer is $20-100/month on a subscription; for teams it is $40-500/month per 5 engineers depending on model tier, with the open-weight API option an order of magnitude below the frontier-subscription option at a 7-15 point SWE-bench cost.

How to Cut the Bill

Four levers, in order of impact. Each works alone; routing plus compaction typically cuts API spend 40-70%.

1. Prompt caching: pay 10% on repeated prefix

The bug-fix math above showed it: $0.54 with caching versus $1.35 without on Sonnet 4.6. Claude Code enables caching automatically. If you run a custom harness against the API, structure prompts so the system prompt, CLAUDE.md, and file context sit in a stable prefix, and mark cache breakpoints. This is the cheapest 60% discount in the industry.

2. Model routing: match model tier to task difficulty

Haiku 4.5 costs 5x less than Opus 4.8 on both input and output. Typo fixes, imports, docstrings, and formatting do not need a frontier model. Model routing classifies each request and sends it to the cheapest capable tier; Morph Router does the classification in ~430ms at $0.001 per request across Anthropic, OpenAI, and Google model families. If half your tasks are Haiku-tier, routing alone roughly halves model spend. See Sonnet vs Haiku for where the capability line actually sits.

For multi-agent workloads, routing also applies at the architectural level: planner on Opus 4.8, executor on Sonnet 4.6 or Haiku 4.5. A typical build uses 4M tokens on planning and 10M on execution — routing the executor to Haiku saves 57% on the execution-side spend with no change to plan quality. Morph benchmarked 44 planner-executor pairs on 40 real builds: full leaderboard.

3. Context compaction: stop resending noise

Cumulative input is the dominant line item in both task profiles (74% of the Sonnet bug-fix cost is input, not output). Most of that context is stale: failed attempts, verbose tool output, file dumps from 40 turns ago. Morph Compact deletes irrelevant lines verbatim (no summarization, no rewriting) at ~33,000 tok/s, typically shrinking context 50-70%. Shrinking the 2M-token feature profile to 800K cuts the Sonnet cost from $2.28 toward $1.10 and reduces context-rot failures at the same time.

4. Batch API and scoped prompts

Anthropic's Batch API is a flat 50% off ($2.50/$12.50 on Opus 4.8, $1.50/$7.50 on Sonnet 4.6) for async work: code review, test generation, bulk refactors. And prompts that name the file ("fix the JWT check in src/auth/middleware.ts") skip the exploration turns that "fix the auth error" pays for. Full treatment of every lever: LLM cost optimization.

Audit the subscription stack

Cursor includes Claude and GPT access; Copilot and Cursor overlap on completions; Codex CLI is free with any ChatGPT login. Cancel duplicate coverage before optimizing tokens.

Measure cost per task, not per seat

Microsoft's $2,000/engineer/month number was unbounded usage on a frontier model. A team tracking $/completed-task can route 80% of tasks to models that cost 10-50x less.

Set a June 15 credit alarm

If you run claude -p in CI or cron, your automation now stops when the monthly credit pool ($20/$100/$200) empties, unless overflow billing is on. Budget it explicitly.

Frequently Asked Questions

How much is the Anthropic Claude Pro plan per month in 2026, and is Claude Code included?

$20/month billed monthly, $17/month billed annually. Claude Code is included on Pro and on both Max tiers (Max 5x $100/month, Max 20x $200/month), drawing from the same 5-hour usage windows as chat. From June 15, 2026, Agent SDK and headless claude -p usage bills against a separate API-rate credit pool: $20/month of credits on Pro, $100 on Max 5x, $200 on Max 20x.

What is Claude's cost per token?

Per million tokens, input/output: Haiku 4.5 $1/$5, Sonnet 4.6 $3/$15, Opus 4.8 $5/$25, Fable 5 $10/$50. Per individual token on Sonnet 4.6 that is $0.000003 in, $0.000015 out. Cache reads are 10% of input price, batch is 50% of standard, and the 1M-context models carry no long-context surcharge.

How much does one Claude Code task cost in API terms?

With a typical agent profile (25 calls, 400K cumulative input at 75% cache hits, 10K output), a bug fix computes to $0.54 on Sonnet 4.6, $0.90 on Opus 4.8, $1.80 on Fable 5. A feature-sized task (100 calls, 2M input, 40K output) computes to $2.28, $3.80, and $7.60. The same feature is $0.21 on DeepSeek V4 Pro and $1.54 on GPT-5.3-Codex.

Did Microsoft really terminate Claude Code licenses over costs?

Yes. Windows Central reported in June 2026 that Microsoft's Experiences + Devices division ordered engineers off Claude Code by June 30, 2026, citing token billing that reportedly hit ~$2,000 per engineer per month and exhausted the annual AI budget early. Engineers were moved to GitHub Copilot CLI; Copilot Enterprise bills a flat $39/seat/month.

Is Claude Max worth it compared to API billing?

For interactive use, almost always past light usage. Max 5x ($100) breaks even against Opus 4.8 API rates at ~111 bug-fix tasks or ~26 features per month, about 5 tasks per working day. The 25-36x subsidy stories require automation-scale volume, which moved to metered credit pools on June 15, 2026, so the subsidy now caps at the credit allotment.

What changes in Claude billing on June 15, 2026?

Agent SDK usage, claude -p headless mode, the Claude Code GitHub Actions integration, and third-party apps on subscription auth move to a separate monthly credit pool billed at standard API token rates ($20 Pro / $100 Max 5x / $200 Max 20x, per user, no rollover, automation halts at zero unless overflow billing is on). Interactive Claude Code and normal chat are unchanged.

What is the cheapest model that can actually code?

DeepSeek V4 Pro Max scores 80.6% on SWE-bench Verified, tied with Gemini 3.1 Pro, while DeepSeek V4 Pro bills $0.435/$0.87 per MTok and V4 Flash bills $0.14/$0.28. MiniMax M3 scores 80.5% at $0.30/$1.20. Claude Opus 4.8 (88.6%) and Fable 5 (95.0%) are stronger but cost 20-50x more per output token.

Related Resources

Cut AI Coding Costs 40-70%

Morph Router picks the right model for each task ($0.001/request, 430ms). Morph Compact deletes stale context verbatim at ~33,000 tok/s, typically shrinking it 50-70%. Combined: most teams cut API spend 40-70% without changing what the agent produces.

Try Morph Router

Try Morph Compact

Fast Apply

WarpGrep

Compact

Model Router

DeepSeek

MiniMax

Qwen

Blog

Startup Credits

Students

Contact Us

About

Careers

AI Coding Costs (2026): Claude vs Codex vs Gemini, Real Monthly Spend From Token Math