Extended Thinking: When Claude actually thinks before it talks

I was reviewing a client's R&D operating model — the kind with three organizational layers, cross-functional dependencies, and a shared services team that touches everything. I asked Claude to analyze where the coordination bottlenecks were and suggest a restructured flow.

Claude answered in about two seconds. The answer was fine. Competent. The kind of response you'd get from a smart intern who skimmed the brief.

Then I turned on Extended Thinking and asked the same question.

Claude paused. Not the usual instant response. An actual pause — maybe fifteen seconds. And then it delivered an analysis that identified a dependency I hadn't seen, proposed a phased rollout I hadn't considered, and flagged a risk with the shared services queue that turned out to be exactly the problem we ran into two weeks later.

Same model. Same prompt. Completely different quality of output. The only difference was that I told Claude to think before it talked.

That's Extended Thinking. And once you understand when to use it and when not to, it changes how you work with Claude. Most of what follows comes from Anthropic's learning platform and the technical documentation, filtered through daily use in real consulting engagements.

What Extended Thinking Actually Does

Normally, Claude generates its response token by token, in one pass. It's fast. It's also how you get those answers that are technically correct but miss the nuance — the response that addresses what you said but not what you meant.

Extended Thinking adds a step before the final response. Claude gets a "thinking budget" — a dedicated space where it reasons through the problem step by step before it commits to an answer. You can see this thinking process in the interface. It shows up as an expandable block above the actual response.

Think of it like this: normal Claude is the colleague who answers your question in the hallway. Extended Thinking Claude is the colleague who says "let me think about that for a minute," goes back to their desk, works through the problem, and then comes back with a proper answer.

The thinking isn't decorative. Claude uses it to break down complex problems, consider alternatives, catch its own errors, and build a more structured argument before writing the response you actually see.

When It Matters and When It Doesn't

Extended Thinking is not a "make everything better" button. For simple questions, it adds latency without improving the answer. For complex ones, the difference can be dramatic.

Task Type	Extended Thinking?	Why
Simple factual questions	No	No benefit. "What's the capital of France" doesn't need a reasoning chain.
Drafting emails, quick copy	No	Speed matters more than depth. Normal mode is fine.
Multi-step analysis	Yes	Dependencies, trade-offs, second-order effects. This is where thinking pays off.
Strategy and planning	Yes	Claude catches edge cases and considers alternatives it would skip in normal mode.
Code review / debugging	Yes	Step-by-step reasoning through logic finds bugs that pattern matching misses.
Math and formal reasoning	Yes	The original use case. Dramatically better accuracy on anything requiring precise logic.
Creative writing	Sometimes	Useful for structure and plot. Not necessary for freeform generation.
Summarization	No	Standard mode handles this well. Thinking adds time without meaningful improvement.

My rule of thumb: if I'd need more than thirty seconds to think through the answer myself, I turn on Extended Thinking for Claude too.

How It Works Under the Hood

You don't need to be a developer to understand the mechanics, and knowing them helps you use the feature better.

When Extended Thinking is enabled, Claude gets a token budget for its internal reasoning. In the API, you set this explicitly. In the chat interface, it's handled automatically — Claude decides how much to think based on the complexity of the question. Anthropic calls this "adaptive thinking" on the latest models.

The thinking happens before the response. Claude works through the problem in its thinking block, then writes a final answer informed by that reasoning. On newer models, you see a summary of the thinking process rather than the raw stream of consciousness. The summary preserves the key reasoning steps without the noise.

A few things worth knowing:

Thinking takes time. A complex analysis might take 15–30 seconds of thinking before Claude starts writing. This is normal. It's the point.

The thinking is real. This isn't a UI trick. The quality difference between a response with and without thinking is measurable, especially on tasks that require multi-step reasoning, mathematical logic, or strategic analysis.

You're paying for the thinking. On the API side, thinking tokens count as output tokens. In the chat interface with a Pro or Team plan, they count toward your usage. More thinking means higher cost but better output — a trade-off worth understanding.

Extended Thinking with Tools

This is where it gets interesting for power users. When Extended Thinking is combined with tool use — Claude searching the web, reading your Google Drive, querying your Notion — something called "interleaved thinking" kicks in.

Instead of thinking once and then acting, Claude thinks between each step. It searches your Drive, thinks about what it found, decides what to search next, thinks again, and builds its analysis iteratively. Each tool result feeds back into the reasoning chain.

Mode	How Claude Works	Result Quality
Normal + Tools	Uses tool → reads result → responds	Fast but shallow. Takes tool results at face value.
Thinking + Tools	Thinks → uses tool → thinks about result → uses another tool → thinks again → responds	Slower but much deeper. Synthesizes across multiple sources. Catches contradictions.

In practice, this means asking Claude to "find the relevant project documents in my Drive and analyze whether our current sprint structure supports the delivery timeline" produces a genuinely different quality of answer with Extended Thinking on. Claude doesn't just retrieve documents — it reasons about what they mean in context.

Practical Tips

Don't leave it on for everything. Extended Thinking adds latency. For quick tasks — drafting a Slack message, formatting a table, simple questions — turn it off. Save it for the work that actually benefits from deeper reasoning.

Give it something to think about. A vague prompt with Extended Thinking enabled just produces a longer thinking block before a mediocre answer. The feature amplifies the quality of your prompt. A well-structured, specific question with thinking enabled produces exceptional output. A lazy question with thinking enabled produces a slow, lazy answer.

Read the thinking. The thinking block isn't just overhead — it's a window into Claude's reasoning. When the final answer seems off, expand the thinking and you'll often find exactly where it went sideways. This makes debugging your prompts much faster.

Use it for high-stakes work. Client proposals. Strategic analyses. Anything where being 80% right isn't good enough. The extra 15–30 seconds of thinking regularly catches errors, edge cases, and missed considerations that would cost you hours to fix later.

Pair it with your project context. Extended Thinking works best when Claude has rich context to reason about. A well-configured project with uploaded reference documents and clear instructions, combined with Extended Thinking, is the closest thing I've found to having a genuine senior analyst on call.

The Difference It Makes

I'll be direct about this: Extended Thinking didn't change my workflow as dramatically as Memory or Artifacts did. Those features changed what Claude is to me. Extended Thinking changed how well it performs at the things it was already doing.

But "how well" matters. A lot.

The analysis that catches a dependency I missed. The strategy document that anticipates the objection my client will raise. The code review that finds the edge case before it hits production. These aren't nice-to-haves. These are the moments where good consulting becomes great consulting.

Extended Thinking is the difference between Claude giving you the obvious answer and Claude giving you the right answer. Sometimes those are the same thing. When they're not, you'll be glad you gave it a moment to think.

Normal Claude is your colleague answering in the hallway. Extended Thinking Claude is the one who goes back to their desk, works through it properly, and comes back with something you can actually use. Know which one you need for the job in front of you.

What Extended Thinking Actually Does

When It Matters and When It Doesn't

How It Works Under the Hood

Extended Thinking with Tools

Practical Tips

The Difference It Makes

Related

Warp Speed Workshop