Article · AI Engineering

When to use Claude Haiku vs Sonnet (and when it doesn’t matter)

A practical rule-of-thumb from running them both in production.

Published April 2026 · 5 min read · Nachiket Pai
TL;DR

Use Haiku for high-volume simple work — classification, extraction, routing, basic summaries. Use Sonnet for anything that needs reasoning or careful instruction-following. The rule of thumb most teams miss: model choice matters far less than prompt caching. Once you cache properly, the cost gap closes by ~70–80%, and you can afford Sonnet almost everywhere.

I’ve watched a lot of people argue about Haiku vs Sonnet. Almost all of them were optimizing the wrong thing.

The argument usually goes: "Sonnet is smarter, but Haiku is cheaper. Where’s the line?"

It’s the wrong question. The right question is: what does this prompt actually need to be good at?

Let me give you a working answer.

What Haiku is genuinely great at

Haiku is small, fast, cheap. It runs in milliseconds. It costs roughly a fifth of what Sonnet costs per token.

It’s the right pick when:

  • The task is high volume. Hundreds or thousands of calls per day on the same prompt.
  • The task is mostly pattern-matching, not reasoning. Things like:
    • Classifying support tickets into 5 categories
    • Extracting structured data from a known format
    • Tagging emails by sentiment or urgency
    • Routing a request to the right downstream agent
    • Generating a one-line summary of a known document type
  • You’re OK with occasional surface-level mistakes that downstream logic can catch.

If a task is mechanical and you’d be fine with a 95%-accurate answer at one-fifth the cost, Haiku.
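To make "mechanical" concrete, here is a sketch of what a high-volume Haiku call might look like — a support-ticket classifier built as plain request parameters for the Anthropic Messages API. The model id and category names are illustrative placeholders, not recommendations; pin whatever model id you actually run.

```python
# Sketch: a one-shot ticket classifier suited to Haiku.
# Category names and the model id are illustrative placeholders.
CATEGORIES = ["billing", "bug", "feature-request", "account", "other"]

def build_classify_request(ticket_text: str) -> dict:
    """Build the kwargs for a single classification call via client.messages.create(**req)."""
    return {
        "model": "claude-haiku-latest",  # placeholder -- pin a real model id in production
        "max_tokens": 10,                # the answer is a single label, so keep output tiny
        "system": (
            "You are a ticket router. Reply with exactly one of: "
            + ", ".join(CATEGORIES)
            + ". Output the label and nothing else."
        ),
        "messages": [{"role": "user", "content": ticket_text}],
    }

req = build_classify_request("I was charged twice this month.")
```

The prompt is the same on every call and the output is one word — exactly the shape where downstream logic (a set-membership check on the label) can catch the occasional miss.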

What Sonnet is genuinely great at

Sonnet is the workhorse. Bigger, slower, smarter. Better at chained reasoning. Better at edge cases. Better at following instructions with caveats and exceptions.

It’s the right pick when:

  • The task involves multi-step reasoning: "read this, weigh these factors, recommend one"
  • The output goes directly to a human (a customer email, a report, a recommendation)
  • The prompt has a lot of context to weigh — long documents, multi-turn tool use, complex schemas
  • A wrong answer costs you something — a misrouted refund, a confused customer, a bad CRM update

For 80% of production tasks where quality actually matters, Sonnet is the safe default.

The thing nobody tells you about cost

The reason people try to choose Haiku is cost. Sonnet feels expensive.

Here’s the trick: prompt caching collapses the cost gap.

The Anthropic API lets you cache any prompt prefix of at least 1,024 tokens (2,048 for Haiku). Once cached, every subsequent call that reuses that prefix pays roughly 10% of the normal input price for the cached portion; writing the cache on the first call costs about 25% more than a normal input token. The cache lives for 5 minutes, and that 5-minute window refreshes every time the cache is hit.

If your workflow re-uses the same system prompt and tool definitions on every call — and most workflows do — you can cut input costs by ~70–80% with one line of config. Not an exaggeration. The first call pays full price; every call after that, for the next 5 minutes, pays the cache rate.
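That "one line of config" is a `cache_control` marker on the stable prefix. A sketch of the request shape, with the model id and prompt text as placeholders:

```python
# Sketch: mark the long, stable prefix (system prompt, policies, tool defs)
# as cacheable. Everything up to and including the cache_control block is
# cached; the per-call user message after it is billed normally.
LONG_SYSTEM_PROMPT = "...your 1,024+ token system prompt and policies..."  # placeholder

def build_cached_request(user_msg: str) -> dict:
    """Build kwargs for client.messages.create(**req) with prompt caching on."""
    return {
        "model": "claude-sonnet-latest",  # placeholder -- pin a real model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # the one line that turns caching on
            }
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }

req = build_cached_request("Summarize this ticket for the on-call engineer.")
```

The user message changes per call; only the system block needs to stay byte-identical for the cache to hit.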

After caching, the Sonnet-vs-Haiku cost difference for most production workflows is small. You can run Sonnet wherever your accuracy demands it without blowing up the bill.
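The arithmetic behind that claim is easy to check yourself. The sketch below uses illustrative numbers — a $3/MTok input price, a 1.25× cache-write premium, a 10% cache-read rate, and a workload of 100 calls on a 5,000-token cached prefix plus 500 fresh tokens each — so treat the specific figures as assumptions, not quoted pricing:

```python
# Back-of-envelope: how much does prompt caching cut input cost?
# All prices and workload numbers are illustrative assumptions.
MTOK = 1_000_000
PRICE = 3.00              # assumed $/MTok input price
CACHE_WRITE_MULT = 1.25   # assumed premium for writing the cache
CACHE_READ_MULT = 0.10    # assumed rate for reading cached tokens

def input_cost(calls, cached_tokens, fresh_tokens, use_cache):
    """Total input cost for `calls` requests sharing one cached prefix."""
    if not use_cache:
        return calls * (cached_tokens + fresh_tokens) * PRICE / MTOK
    first = (cached_tokens * CACHE_WRITE_MULT + fresh_tokens) * PRICE / MTOK
    rest = (calls - 1) * (cached_tokens * CACHE_READ_MULT + fresh_tokens) * PRICE / MTOK
    return first + rest

no_cache = input_cost(100, 5000, 500, use_cache=False)   # 1.65
with_cache = input_cost(100, 5000, 500, use_cache=True)  # ~0.32
savings = 1 - with_cache / no_cache                      # ~0.81
```

Under these assumptions the input bill drops by about 81% — squarely in the ~70–80% range, and enough to make the per-token gap between models a secondary concern.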

Common mistakes

  • Picking Haiku to save money before testing Sonnet. You don’t know what quality you need until you’ve seen the better model do the job. Run Sonnet first. Only downgrade if it’s cleanly safe.
  • Picking Sonnet for trivial tasks because "Haiku might miss something." If the task is "extract this email’s subject line," Haiku will not miss it.
  • Skipping prompt caching. Single biggest cost lever in the Anthropic API. Most teams don’t turn it on. They should.
  • Not measuring. "We use Sonnet" is not a system. Build a per-call log of model + tokens + outcome. After 1,000 calls, you’ll know exactly which prompts can safely downgrade.
  • Optimizing for cost before optimizing for prompt quality. A Sonnet call with a bad prompt is more wasteful than a Haiku call with a great prompt. Fix the prompt first.
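A per-call log does not need to be elaborate. Here is one minimal shape it could take — the record fields, thresholds, and the `downgrade_candidates` helper are all hypothetical illustrations, not a prescribed schema:

```python
# Sketch: a minimal per-call log and a query over it.
# Field names and thresholds are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class CallLog:
    prompt_name: str    # which prompt/workflow made the call
    model: str
    input_tokens: int
    output_tokens: int
    ok: bool            # did downstream checks accept the output?

def downgrade_candidates(logs: list[CallLog], min_calls: int = 1000,
                         min_success: float = 0.99) -> list[str]:
    """Prompts with high volume and near-perfect outcomes are Haiku candidates."""
    by_prompt: dict[str, list[CallLog]] = {}
    for log in logs:
        by_prompt.setdefault(log.prompt_name, []).append(log)
    candidates = []
    for name, entries in by_prompt.items():
        success_rate = sum(e.ok for e in entries) / len(entries)
        if len(entries) >= min_calls and success_rate >= min_success:
            candidates.append(name)
    return candidates
```

After 1,000 calls, a query like this turns "can we downgrade this prompt?" from a debate into a lookup.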

The honest rule of thumb

Default to Sonnet. Drop to Haiku when:

  1. Volume is genuinely high (>1,000 calls/day on this prompt), AND
  2. The task is mechanical, AND
  3. You’ve already turned on caching and the cost still looks meaningful.

If any one of those is missing, stay on Sonnet.
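For teams that want the rule enforced in code rather than in a doc, the whole thing fits in one function. The tier names are just labels, not model ids:

```python
# Sketch: the rule of thumb as a guard clause.
# Default to Sonnet; Haiku only when ALL three conditions hold.
def pick_model(calls_per_day: int, mechanical: bool,
               caching_on: bool, cost_still_meaningful: bool) -> str:
    if (calls_per_day > 1000          # 1. genuinely high volume
            and mechanical            # 2. pattern-matching, not reasoning
            and caching_on            # 3a. caching already enabled...
            and cost_still_meaningful):  # 3b. ...and cost still hurts
        return "haiku"
    return "sonnet"
```

Note the asymmetry: any single `False` falls through to Sonnet, which is exactly the "safe default" behavior the rule describes.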

The fight over Haiku vs Sonnet is mostly a distraction. The wins live elsewhere — in prompt design, in caching, in measuring what each call actually does. Start there.

Curious what production workflows running these models look like?
See the sales replies case study →
Nachiket Pai · nachiketpai.com · Notes from real workflows