Claude Code Context Window: Optimize Your Token Usage

Master Claude Code's context window for larger projects. Token optimization, /compact, task chunking, and recovery techniques.

Problem: Claude Code loses track of your project when conversations get too long. You end up re-explaining everything from scratch, or worse, Claude produces code that contradicts decisions made twenty minutes ago.

Quick Win: Watch the token percentage in Claude Code's status bar. When you hit 80%, exit the session and start a fresh one with claude before taking on complex multi-file work:

# Exit current session (Ctrl+C or type 'exit')
# Then start fresh
claude

This single habit prevents the performance degradation that kills productivity on large projects. The rest of this guide covers why it works and the strategies that compound on top of it.

Interactive demo (not shown here): Inside the context window. One session, from `claude` to `/compact`, and what every step costs. In the illustrative 200K window, auto-compact fires at about 167K tokens (83.5%).

Boot takeaway: a lot loads before you type a word. CLAUDE.md, memory, skills, MCP, and the system prompt are all in context already.

The Mental Model: Active Working Memory

Claude Code's context window isn't passive storage. It's active working memory. As it fills up, Claude struggles to maintain awareness across your entire project, the same way you'd struggle to hold a 50-page document in your head while writing code.

The practical consequence: context depletion causes four predictable failures.

Inconsistent code that conflicts with earlier work in the same session
Repeated questions about project structure you already explained
Lost architectural decisions and naming conventions
Breaking changes that ignore patterns established minutes ago

Understanding when these failures start matters more than trying to prevent them entirely.

The 80/20 Rule

Never use the final 20% of your context window for complex, multi-file tasks. Memory-intensive operations like refactoring, feature implementation, and debugging require substantial working memory to track relationships between components.

High Context Tasks (stop at 80% capacity):

Large-scale refactoring across multiple files
Feature implementation spanning several components
Complex debugging requiring architectural understanding
Code reviews with cross-file dependencies

Low Context Tasks (safe to continue past 80%):

Single-file edits with clear scope
Independent utility function creation
Documentation updates
Simple, localized bug fixes

The insight from experienced users: isolate your complex work to the first 80% of a session. Save the tail end for lightweight tasks that don't need Claude to hold your entire architecture in mind.
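The rule above can be written down as a small decision helper. This is a minimal sketch, not part of Claude Code: the task labels and the `should_restart` function are hypothetical names chosen for illustration, and the 80% threshold is the guideline from this section rather than a documented limit.

```python
# Task labels are illustrative groupings of the two lists above.
HIGH_CONTEXT_TASKS = {"refactor", "feature", "debug", "review"}
LOW_CONTEXT_TASKS = {"single_file_edit", "utility", "docs", "local_fix"}


def should_restart(percent_used: float, task: str, threshold: float = 80.0) -> bool:
    """Return True when a fresh session is advisable before starting `task`."""
    if task not in HIGH_CONTEXT_TASKS | LOW_CONTEXT_TASKS:
        raise ValueError(f"unknown task type: {task}")
    # Only high-context work is gated by the threshold; light work may continue.
    return task in HIGH_CONTEXT_TASKS and percent_used >= threshold
```

With these assumptions, `should_restart(85, "refactor")` is true while `should_restart(85, "docs")` is false, which matches the split between the two task lists.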

/compact and When to Use It

The /compact slash command summarizes and compresses your conversation history. On the first-party Anthropic API, this runs fast because Claude Code's Session Memory writes a running summary to disk in the background, so /compact loads that pre-written summary instead of re-analyzing the whole conversation from scratch. This is observed behavior rather than something Anthropic's official context window documentation describes, and it is feature-flag gated: Bedrock, Vertex, and Foundry users don't get it.

That background summary captures your session title and status, key results and decisions, open questions, and a running work log. It lives at ~/.claude/projects/<project-hash>/<session-id>/session-memory/summary.md, one summary file per session. Our Session Memory guide covers the extraction cadence and availability in full.
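If you want to inspect those summaries yourself, a short script can list them. This is a sketch that assumes the path layout quoted above holds on your machine; because the feature is observed behavior and flag-gated, the directory may simply not exist, in which case the function returns an empty list. The function name `find_session_summaries` is hypothetical.

```python
from pathlib import Path


def find_session_summaries(claude_home: Path) -> list[Path]:
    """List session-memory summary files under claude_home, newest first."""
    # Mirrors projects/<project-hash>/<session-id>/session-memory/summary.md
    pattern = "projects/*/*/session-memory/summary.md"
    found = [p for p in claude_home.glob(pattern) if p.is_file()]
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in find_session_summaries(Path.home() / ".claude"):
        print(path)
```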

When to compact manually:

After completing a major feature (before starting the next)
Before switching from research mode to implementation mode
When Claude starts repeating questions or contradicting earlier decisions
At natural breakpoints between unrelated tasks

When NOT to compact:

Mid-debugging, when specific error messages and stack traces matter
During complex refactoring where file-level details are critical
Right before integration work that depends on context from component building

When you do compact, you can steer what the summary keeps. Run /compact focus on the auth refactor, or any instruction, and the summary preserves what you name instead of what the automatic pass guesses is important. Anthropic documents this behavior in the official context window reference. That one habit is the difference between losing the thread of your current task and carrying it cleanly into the next stretch of work.

The goal: provide the minimum context necessary for effective task execution. This improves performance and token efficiency while lowering cost. For a complete reference of /compact, /clear, and every other slash command in the interactive interface, see the interactive mode guide.

For a deep dive into the compaction buffer mechanics, how the 33K token reservation works, and how to override the trigger threshold, see the context buffer management guide.
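The arithmetic behind that reservation is simple enough to check by hand. The sketch below assumes a 200K window and the 33K reservation mentioned above; `compaction_trigger` is a hypothetical helper, and the real trigger may differ if you override the threshold.

```python
def compaction_trigger(window_tokens: int, reserved_tokens: int) -> tuple[int, float]:
    """Return (tokens at which auto-compact fires, that point as a percentage)."""
    trigger = window_tokens - reserved_tokens
    return trigger, 100 * trigger / window_tokens


trigger, percent = compaction_trigger(200_000, 33_000)
print(f"{trigger:,} tokens ({percent:.1f}%)")  # 167,000 tokens (83.5%)
```

Under these numbers the status bar's 80% mark sits only a few points below the automatic trigger, which is why the 80% habit and auto-compaction tend to arrive close together.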

What Survives Compaction

Whether you run /compact or it fires automatically, compaction summarizes your conversation history to fit the window. What happens to your instructions depends entirely on how they were loaded, and Anthropic documented the exact rules. They are not intuitive.

The short version: anything loaded from disk at startup comes back, and anything that arrived through the conversation gets summarized away. Your project-root CLAUDE.md, your unscoped rules, and your auto memory (Claude's own notes, the first 200 lines or 25KB of MEMORY.md) are all re-injected from disk after compaction. A rule scoped with paths: frontmatter, or a nested CLAUDE.md in a subdirectory, is a different story: it entered context when Claude read a matching file, so compaction summarizes it away, and it stays gone until Claude reads a matching file again.

The practical takeaway: if an instruction must survive a mid-task compaction, put it in your project-root CLAUDE.md, not in a path-scoped rule. Invoked skills come back too, but truncated to fit a per-skill cap, so keep the load-bearing parts near the top of each SKILL.md.
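The rules described in this section can be condensed into a lookup table. This is only a restatement of the prose as data, under the assumption that the categories above are complete enough for everyday use; the enum and the `safe_for_must_keep` helper are hypothetical names, not anything Claude Code exposes.

```python
from enum import Enum


class AfterCompaction(Enum):
    REINJECTED = "re-injected from disk"
    SUMMARIZED = "summarized away until a matching file is read again"
    TRUNCATED = "restored, truncated to a per-skill cap"


# How each instruction source behaves, as described in the text above.
SURVIVAL = {
    "project-root CLAUDE.md": AfterCompaction.REINJECTED,
    "unscoped rule": AfterCompaction.REINJECTED,
    "auto memory (MEMORY.md)": AfterCompaction.REINJECTED,
    "path-scoped rule": AfterCompaction.SUMMARIZED,
    "nested CLAUDE.md": AfterCompaction.SUMMARIZED,
    "invoked skill": AfterCompaction.TRUNCATED,
}


def safe_for_must_keep(source: str) -> bool:
    """True if an instruction placed in `source` returns intact after compaction."""
    return SURVIVAL[source] is AfterCompaction.REINJECTED
```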

For the full mechanism-by-mechanism table, the exact skill token caps, the one startup element that does not reload, and how disable-model-invocation keeps side-effect skills out of context entirely, see what survives compaction in Claude Code.

CLAUDE.md as Persistent Memory

Your CLAUDE.md file persists across sessions automatically. Use it for context that should never be lost:

# CLAUDE.md
 
## Project Architecture
 
- Frontend: Next.js 15 with App Router
- Database: PostgreSQL with Prisma ORM
- Auth: NextAuth with Google provider
 
## Current Sprint Focus
 
- Building user dashboard
- Integrating payment processing
 
## Patterns and Conventions
 
- All API routes use tRPC
- Components follow compound pattern
- Error handling via custom ErrorBoundary

Claude reads this file at session start, giving you context that survives restarts without re-explaining it. The key: keep it lean. Every token in CLAUDE.md is a token you can't use for conversation. Document only what Claude needs on every session, not everything about your project.

When establishing new patterns or making architectural decisions mid-session, explicitly preserve them:

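# Prompt typed inside the running session
# (running `claude "..."` from the shell would start a new session instead)
Document this pattern in CLAUDE.md so we maintain consistency:
[describe the pattern/decision]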

This turns ephemeral session knowledge into persistent project knowledge. See our memory optimization guide for advanced strategies on keeping your persistent context efficient.
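To keep CLAUDE.md lean, it helps to have a rough number for what it costs. The sketch below assumes about four characters per token, which is a common rule of thumb and not Claude's actual tokenizer, and a 200K window; treat the result as an order-of-magnitude estimate. The function names are hypothetical.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic (assumption), not Claude's tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count using ceiling division on character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def window_share(text: str, window_tokens: int = 200_000) -> float:
    """Estimated percentage of the context window the text would occupy."""
    return 100 * estimate_tokens(text) / window_tokens


if __name__ == "__main__":
    path = Path("CLAUDE.md")
    if path.exists():
        text = path.read_text(encoding="utf-8")
        print(f"~{estimate_tokens(text)} tokens, {window_share(text):.2f}% of 200K")
```

For an exact figure, /context inside a session reports what the memory files actually consume.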

Strategic Task Chunking

Instead of pushing Claude to exhaustion, divide work into context-sized chunks with natural breakpoints.

Complete components before integration:

# First session: build the component completely
claude "Build the UserProfile component with all props and styling"
 
# New session: integrate with the larger system
claude "Integrate UserProfile with the dashboard layout"

Finish research phases before implementation:

# Research session
claude "Research authentication patterns for our Next.js app - document options"
 
# Implementation session (fresh context)
claude "Implement OAuth using the NextAuth pattern"

Between sessions, create handoff notes that capture everything the next session needs:

# Prompt typed inside the session you are about to end
# (a fresh `claude "..."` process would have no history to summarize)
Create comprehensive handoff notes including:
1. Current project state and architecture
2. Coding patterns and conventions we've established
3. Key decisions made and reasoning
4. Specific next steps with implementation details
5. Files that will need attention and why

Store these notes in your project. When starting fresh, point Claude to them for continuity. The handoff technique is especially useful when you know you're about to hit context limits and want to preserve more detail than auto-compaction's lossy summary provides.
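If you prefer a fixed structure for those notes, a skeleton file gives Claude headings to fill in. This is a sketch under stated assumptions: the five headings come from the prompt above, while the `handoff_template` function and the HANDOFF.md file name in the usage note are hypothetical choices.

```python
from datetime import date

# The five areas the handoff prompt asks for.
SECTIONS = [
    "Current project state and architecture",
    "Coding patterns and conventions",
    "Key decisions and reasoning",
    "Next steps with implementation details",
    "Files that need attention and why",
]


def handoff_template(title: str, today: date) -> str:
    """Build an empty handoff-notes skeleton with one heading per section."""
    lines = [f"# Handoff: {title} ({today.isoformat()})", ""]
    for number, section in enumerate(SECTIONS, start=1):
        lines += [f"## {number}. {section}", "", "TODO", ""]
    return "\n".join(lines)
```

One way to use it: write `handoff_template("auth refactor", date.today())` to a file such as HANDOFF.md, ask Claude to fill it in before the session ends, then point the next session at that file.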

Delegate Heavy Reads to Subagents

The cheapest context is the context you never load. When a task needs Claude to read a pile of files it won't reference again, like researching how a subsystem works before you change it, hand that work to a subagent. A subagent runs in its own separate context window. It loads its own copy of your CLAUDE.md and the same skills and MCP servers, does the reading over there, and returns only its final summary to your main session.

Anthropic's own walkthrough makes the savings concrete: in their example, a subagent read 6,100 tokens of files and returned a 420-token result. The file content never entered the main window, for a net saving of roughly 5,700 tokens. One caveat worth knowing: a subagent does not inherit your conversation history or your main session's auto memory, so give it enough context in the task prompt to work on its own.
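The saving in that example works out as follows. The helper name `subagent_savings` is hypothetical; the two token counts are the ones quoted from the walkthrough.

```python
def subagent_savings(tokens_read: int, tokens_returned: int) -> tuple[int, float]:
    """Return (tokens kept out of the main window, saving as a percentage)."""
    saved = tokens_read - tokens_returned
    return saved, 100 * saved / tokens_read


saved, percent = subagent_savings(6_100, 420)
print(f"{saved:,} tokens saved ({percent:.1f}%)")  # 5,680 tokens saved (93.1%)
```

The main session pays only for the summary, so the saving grows with how much the subagent reads relative to how little it reports back.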

This is the highest-leverage move for long sessions, and it's the pattern ClaudeFast's Code Kit is built around: a coordinating main thread that delegates exploration to specialist agents so its own context stays reserved for decisions and your conversation.

Context Recovery Techniques

When context loss happens mid-session, whether from compaction, a crash, or just too much drift, use this recovery sequence:

Quick recovery: Reference your most recent checkpoint notes or session file
Pattern review: Ask Claude to scan recent files and identify established patterns
Architecture refresh: Provide a brief project overview focusing on current components
Continuation strategy: Start with small, isolated tasks while context rebuilds
Fresh start: Use /clear when context becomes too corrupted to salvage. Sometimes starting clean with good handoff notes is faster than fighting degraded context

For long sessions (60+ minutes), proactive preservation beats reactive recovery. Run a quick context refresh every 30 minutes:

# Quick context refresh: type this prompt inside the running session
Update your understanding of our project: review recent changes,
confirm current patterns, and note any shifts in approach

This prevents context drift where Claude gradually loses alignment with your project's specific requirements.

If you want to automate this entirely, ClaudeFast's Code Kit includes a ContextRecoveryHook that automatically backs up your session state before compaction occurs. It captures architectural decisions, established patterns, and current progress so nothing gets lost when the context window resets.

Monitoring Context Usage

Knowing when you're running low prevents surprises. Three approaches, from simple to advanced:

1. Watch the status bar. Claude Code shows token percentage at the bottom of the terminal. This is your primary signal. Build the habit of glancing at it before starting each new task.

2. Run /context for a breakdown. The /context command shows exactly where your tokens are going: system prompt, tools, memory files, skills, and conversation history. If your memory files are eating 15% of your window before you even start, that's a problem you can fix. Pair it with /memory, which lists the CLAUDE.md and rules files loaded this session and links to your auto memory folder, so you can catch a bloated memory file before it taxes you all session.

3. Use StatusLine for automated monitoring. StatusLine receives real-time context metrics and can trigger warnings at configurable thresholds. See the context buffer management guide for implementation details, including how to calculate your actual free space accounting for the compaction buffer.
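The threshold logic behind such a warning might look like the sketch below. It deliberately takes plain numbers as input: how the token counts reach your script, including any StatusLine payload field names, is not covered here and would need to be checked against the StatusLine documentation. The `context_status` function, the 70% warning level, and the label wording are all assumptions for illustration; only the 80% stop level comes from this guide.

```python
def context_status(
    used_tokens: int,
    window_tokens: int = 200_000,
    warn_at: float = 70.0,   # assumed early-warning level
    stop_at: float = 80.0,   # the 80% guideline from this guide
) -> str:
    """Format a one-line status string with a warning label."""
    percent = 100 * used_tokens / window_tokens
    if percent >= stop_at:
        label = "RESTART before complex work"
    elif percent >= warn_at:
        label = "wrap up current task"
    else:
        label = "ok"
    return f"ctx {percent:.0f}% | {label}"
```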

One practical tip: if you're regularly hitting compaction before finishing your task, the answer usually isn't a bigger context window. It's smaller tasks. Break the work down further.

Constraints as Training

Working within token limits forces deliberate choices that make you a better Claude Code user:

Explicit file selection: Include only relevant files, not entire codebases
Clear task definition: Break objectives into concrete, actionable steps
Priority-based organization: Structure prompts with critical details first
Compact examples: Provide minimal but representative code samples

These skills transfer when context windows eventually expand. Developers who embrace constraints become better collaborators regardless of technical limits. If your plan supports it, the 1M context window gives you 5x the usable space with no pricing premium. It runs on Fable 5, Sonnet 5, Opus 4.6 and later, and Sonnet 4.6; Sonnet 5 reaches 1M with no [1m] variant to select. Compaction works the same way at the larger limit, so the discipline from working within 200K still makes you more effective.

Next Actions

Immediate: Check your token percentage in the status bar. If above 80%, exit and restart for your next complex task.

This week: Practice chunking one large feature into component-then-integration phases.

Ongoing: Build your CLAUDE.md file with project context that persists across sessions. For the complete framework on engineering information flow to Claude, see context engineering.

Master context management and you can take on far larger projects than developers who ignore these limits. If you want a pre-built system that applies these context min-maxing strategies automatically, with skills-first loading, sub-agent delegation for context conservation, and structured session recovery, ClaudeFast's Code Kit ships these patterns ready to use.

