Cost Optimization Strategies

Module: Cost & Safety | Lesson: 2 of 3 | Time: ~10 minutes

What You Will Learn

  • How to use /compact to reduce token usage
  • How to provide targeted context instead of entire files
  • How to write efficient prompts that save tokens

The Most Important Command: /compact

If you remember only one thing from this lesson, let it be this: use /compact regularly.

As you learned in the previous lesson, long conversations get expensive because Claude re-reads the entire conversation history with every new message. The /compact command solves this by summarizing the conversation history into a much shorter version, dramatically reducing the number of input tokens.

How to Use It

During any interactive session, type:

/compact

Claude will compress the entire conversation into a concise summary. After compacting, your next message will send far fewer tokens as context, which means lower cost.

When to Use It

  • After completing a task and moving on to a new topic
  • When you notice /cost climbing faster than expected
  • Every 10-15 messages in a long session
  • Before asking Claude to do something complex (so it has a clean, focused context)

Tip: Think of /compact like clearing your desk before starting a new task. It removes the clutter so you can focus on what matters.

How Much Does It Save?

The savings depend on how much conversation history has accumulated, but it is common to see a 50-80% reduction in context size after compacting. That translates directly to lower costs for every subsequent message.
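
To make the effect concrete, here is a minimal sketch of a simple cost model, not a measurement of real sessions. It assumes every turn adds a fixed number of tokens to the history and that the whole history is sent as input on each turn. The function name, the 500 tokens per turn, and the 70% reduction (picked from the 50-80% range above) are illustrative assumptions.

```python
def cumulative_input_tokens(num_messages, tokens_per_turn,
                            compact_every=None, compact_percent=70):
    """Total input tokens sent over a session under a linear-growth model.

    Assumption: each turn adds `tokens_per_turn` to the history, and the
    full history is re-sent as input on every turn.
    """
    history = 0
    total = 0
    for turn in range(1, num_messages + 1):
        history += tokens_per_turn  # the new exchange joins the history
        total += history            # the whole history is sent as input
        if compact_every and turn % compact_every == 0:
            # Model /compact as shrinking the history by compact_percent.
            history = history * (100 - compact_percent) // 100
    return total


without_compact = cumulative_input_tokens(30, 500)
with_compact = cumulative_input_tokens(30, 500, compact_every=10)
print(without_compact)  # 232500
print(with_compact)     # 117000
```

Under these assumptions, compacting every 10 messages roughly halves the total input tokens for a 30-message session. Real savings depend on how much history has accumulated and how well it summarizes.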


Keep CLAUDE.md Concise

Your CLAUDE.md file is loaded every time Claude starts in a project. If it is bloated with unnecessary details, you are paying for those extra tokens in every single message.

Best practice: keep CLAUDE.md under 200 lines.

Warning: A 500-line CLAUDE.md adds thousands of tokens to every interaction. Over a 30-message session, you pay for those extra tokens 30 times, so the cost adds up quickly.
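
The arithmetic behind that warning can be sketched as follows. The figure of 10 tokens per line is an assumption for illustration only; real files vary, and the function name is hypothetical.

```python
TOKENS_PER_LINE = 10  # assumption: rough average, not a measured value


def claude_md_overhead(lines, messages, tokens_per_line=TOKENS_PER_LINE):
    """Extra input tokens paid for CLAUDE.md across a whole session."""
    return lines * tokens_per_line * messages


bloated = claude_md_overhead(lines=500, messages=30)
lean = claude_md_overhead(lines=200, messages=30)
print(bloated)         # 150000
print(lean)            # 60000
print(bloated - lean)  # 90000 tokens saved by trimming to 200 lines
```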

Here is how to keep it lean:

  • Include only essential project rules and preferences
  • Use bullet points, not paragraphs
  • Remove outdated instructions
  • Do not duplicate information that Claude can find by reading your code

Too long:

# CLAUDE.md
This project uses React. React is a JavaScript library for building user interfaces.
It was created by Facebook. We use React version 18.2. React uses a virtual DOM...
(200 more words of React explanation)

Just right:

# CLAUDE.md
- React 18.2 project with TypeScript
- Use functional components and hooks (no class components)
- Tests: Jest + React Testing Library
- Style: Tailwind CSS, no inline styles
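
If you want to check a file against the 200-line guideline, a small script along these lines could help. This is a sketch: the helper name is hypothetical, and the token estimate uses a rough rule of thumb of about four characters per token, which is an assumption rather than an exact tokenizer count.

```python
from pathlib import Path

MAX_LINES = 200  # the budget suggested in this lesson


def audit_claude_md(path="CLAUDE.md", max_lines=MAX_LINES):
    """Report the size of a CLAUDE.md file against a line budget."""
    text = Path(path).read_text(encoding="utf-8")
    lines = text.splitlines()
    return {
        "lines": len(lines),
        "non_blank": sum(1 for line in lines if line.strip()),
        # Assumption: ~4 characters per token, a rough estimate only.
        "approx_tokens": len(text) // 4,
        "over_budget": len(lines) > max_lines,
    }


if __name__ == "__main__" and Path("CLAUDE.md").exists():
    print(audit_claude_md())
```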

Be Specific in Your Prompts

Vague prompts cost more because Claude needs to generate longer responses to cover all possibilities. Specific prompts lead to shorter, targeted responses.

Expensive (vague):

Tell me about this project

Claude might read multiple files and write a long overview -- lots of tokens.

Cheaper (specific):

What testing framework does this project use?

Claude reads one or two files and gives a short answer.

Expensive (open-ended):

Review this code

Claude examines everything and writes a comprehensive review.

Cheaper (targeted):

Check the login function in auth.js for security vulnerabilities

Claude focuses on one function in one file.

Tip: The more specific your prompt, the less work Claude has to do, and the less it costs. Being specific is not just cheaper -- it usually gives you better answers too.


Avoid Pasting Huge Files

When you paste a large block of text directly into your prompt, every character becomes input tokens. If you need Claude to look at a file, let it read the file itself instead of pasting the contents:

Expensive:

Here is my entire 500-line config file: [paste 500 lines]
What is wrong with line 42?

Cheaper:

Read config.json and tell me if there is a problem around line 42

Reading a whole file with the Read tool costs about the same number of tokens as pasting it. The saving comes from pointing Claude to a specific area, so it can read just the portion it needs instead of the entire file.
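
The idea of reading only the relevant portion can be illustrated with a short sketch. This is not how Claude's Read tool is implemented; it only shows why a window of lines around a target is much smaller than the whole file. The function name and the default radius are hypothetical.

```python
from itertools import islice


def read_window(path, center_line, radius=5):
    """Return (line_number, text) pairs for lines near center_line."""
    start = max(1, center_line - radius)
    end = center_line + radius
    with open(path, encoding="utf-8") as f:
        # islice skips to the window without keeping the rest in memory.
        window = islice(f, start - 1, end)
        return [(n, line.rstrip("\n"))
                for n, line in enumerate(window, start=start)]


# Usage: for a 500-line file, this returns 11 lines instead of 500.
# for number, text in read_window("config.json", center_line=42):
#     print(number, text)
```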


Use Subagents for Context Isolation

In Module 10, you learned about subagents. They are also a cost optimization tool. When Claude spawns a subagent to handle a sub-task, the subagent gets a fresh, minimal context -- it does not inherit the full conversation history.

This means the subagent processes fewer tokens than if you handled the sub-task in the main conversation.

Example: Instead of:

Read all 15 files in the src/ folder and summarize each one

You could ask Claude to use subagents:

For each file in src/, use a subagent to read and summarize it. Collect the results.

Each subagent starts with a clean context, processes one file, and returns the result. This is often cheaper than accumulating all 15 files in a single conversation.
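
A rough comparison of the two approaches, under a deliberately simplified model: in a single conversation each file and its summary stay in the history and are re-sent on later turns, while each subagent only sees its own file. The function names and token counts are hypothetical, and the model ignores per-subagent overhead such as system prompts, which is one reason subagents are "often" rather than "always" cheaper.

```python
def single_conversation_tokens(num_files, file_tokens, summary_tokens):
    """Input tokens when every file accumulates in one conversation."""
    history = 0
    total = 0
    for _ in range(num_files):
        history += file_tokens     # the file contents join the history
        total += history           # the whole history is sent as input
        history += summary_tokens  # the summary stays in the history too
    return total


def subagent_tokens(num_files, file_tokens, summary_tokens):
    """Input tokens when each file is handled in a fresh context."""
    per_subagent = file_tokens              # each subagent reads one file
    collected = num_files * summary_tokens  # main conversation gets summaries
    return num_files * per_subagent + collected


print(single_conversation_tokens(15, 2000, 200))  # 261000
print(subagent_tokens(15, 2000, 200))             # 33000
```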


Start Fresh for New Topics

If you finish one task and want to start a completely different one, consider exiting and starting a new Claude session:

/exit

Then start fresh:

claude

A new session starts with zero conversation history, so your first few messages will be much cheaper than continuing a long session.

Info: Alternatively, use /compact to shrink the context without restarting. If the new task is completely unrelated to the old one, though, a fresh session is cleaner.


Model Selection

Different Claude models have different costs and capabilities. If your task is simple, you might not need the most powerful (and most expensive) model.

You can specify a model when starting Claude:

claude --model claude-sonnet-4-20250514

Or in headless mode:

claude -p "Simple question here" --model claude-sonnet-4-20250514
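
If you script headless calls, a small helper can keep the model choice in one place. This sketch only assembles the command shown above, using the -p and --model flags from this lesson; the helper name is hypothetical, and actually running the command is left to you.

```python
import shlex


def build_headless_command(prompt, model):
    """Build the argument list for a headless claude call."""
    return ["claude", "-p", prompt, "--model", model]


cmd = build_headless_command("Simple question here",
                             "claude-sonnet-4-20250514")
print(shlex.join(cmd))

# To run it, pass the list to subprocess.run, for example:
# import subprocess
# result = subprocess.run(cmd, capture_output=True, text=True)
```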

General guidance:

  • Use the default model for complex tasks (code generation, debugging, architecture)
  • Consider a smaller/cheaper model for simple tasks (summaries, formatting, quick questions)
  • For batch processing many files, a cheaper model can save significant money at scale
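
To see how model choice scales in a batch job, you can estimate costs from token counts and per-million-token prices. The prices and model labels below are placeholders, not real rates; check the current pricing page before relying on any numbers. The function name is hypothetical.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_mtok, output_price_per_mtok):
    """Estimated cost in dollars, given prices per million tokens."""
    return (input_tokens / 1_000_000 * input_price_per_mtok
            + output_tokens / 1_000_000 * output_price_per_mtok)


# Placeholder prices (dollars per million tokens), for illustration only.
PRICES = {
    "larger-model": {"input": 10.0, "output": 50.0},
    "smaller-model": {"input": 1.0, "output": 5.0},
}

# A hypothetical batch: 200 files, 3,000 input and 500 output tokens each.
batch_input = 200 * 3_000
batch_output = 200 * 500

for name, price in PRICES.items():
    cost = estimate_cost(batch_input, batch_output,
                         price["input"], price["output"])
    print(f"{name}: ${cost:.2f}")
```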

Optimization Cheat Sheet

Here is a quick reference of all optimization strategies ranked by impact:

Strategy | Impact | Effort
Use /compact regularly | Very High | Low -- just type the command
Keep CLAUDE.md under 200 lines | High | Low -- edit once
Be specific in prompts | High | Low -- just think before typing
Start fresh sessions for new topics | Medium | Low -- exit and restart
Avoid pasting huge files | Medium | Low -- let Claude read files
Use subagents for parallel tasks | Medium | Medium -- requires planning
Choose the right model | Medium | Low -- add a flag

Try It Yourself

  1. Compact experiment. Start a Claude session. Have a 10-message conversation about anything. Check /cost. Then type /compact. Send one more message and check /cost again. Compare the per-message cost before and after compacting.

  2. Prompt comparison. Ask the same question two ways:

    • Vague: Tell me about the files in this folder
    • Specific: How many JavaScript files are in this folder?

     Check /cost after each and notice the difference.

  3. CLAUDE.md audit. If you have a CLAUDE.md file, open it and count the lines. If it is over 200 lines, trim it down. Remove anything that Claude does not need to know for every single conversation.

  4. Fresh start. After a long session, exit with /exit, start a new session, and check how much cheaper the first message is compared to the last message of the previous session.


What You Learned

  • /compact is the single most impactful cost optimization -- use it every 10-15 messages
  • Keep CLAUDE.md under 200 lines to avoid paying for unnecessary context in every message
  • Specific prompts cost less and produce better results than vague ones
  • Let Claude read files itself instead of pasting large blocks of text
  • Subagents provide context isolation that can reduce token usage for parallel tasks
  • Starting fresh sessions for new topics avoids carrying expensive old context
  • Model selection lets you trade off capability vs cost for simple tasks

Next Up

Next: Safety Best Practices