Claude Code Fast Mode for Opus 4.6. What Developers Actually Need to Know

ℹ️ Quick Answer: Claude Code fast mode is an experimental feature that delivers up to 2.5x faster output from the Opus 4.6 model at premium pricing ($30/$150 per million tokens). Toggle it with /fast in your terminal. It’s the same model running with faster inference, not a different or dumbed down version. Best for rapid iteration, live debugging, and interactive coding sessions where waiting 30 seconds kills your momentum.

Boris Cherny (the guy who created Claude Code as a side project back in September 2024, which is wild to think about) dropped an announcement on Threads that Opus 4.6 was live. But buried in that post was something that caught my attention more than the headline features.

“For Claude Code users, you can also now more precisely tune how much the model thinks. Run /model and arrow left/right to tune effort.”

Wait. I can tell the model to think less?

What Claude Code Fast Mode Actually Does

Fast mode isn’t a smaller model. That’s the first thing people get wrong about it. When you type /fast in your Claude Code terminal, you’re still running Opus 4.6 with the exact same intelligence and capabilities. The difference is in the API configuration. Anthropic prioritizes speed over cost efficiency on the backend, and you get your output tokens up to 2.5x faster.

Type /fast. A little lightning bolt icon shows up next to your prompt. That’s it. You’re in fast mode. Type /fast again to turn it off.

The catch? It costs more. Standard Opus 4.6 runs $5/$25 per million tokens (input/output). Fast mode jumps to $30/$150. That’s a 6x increase on output tokens. Anthropic is running a 50% discount through February 16 to let people try it, but after that, you’re paying full freight.
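To make that 6x feel concrete, here's a quick back-of-envelope calculation using the rates above. The token counts in the example are hypothetical, just to show the scale of the difference:

```python
# Rough session-cost comparison at the article's listed per-million-token rates.
STANDARD = {"input": 5.00, "output": 25.00}   # standard Opus 4.6
FAST = {"input": 30.00, "output": 150.00}     # fast mode

def session_cost(rates, input_tokens, output_tokens):
    """Dollar cost for a given token usage at the given rates."""
    return (input_tokens / 1_000_000) * rates["input"] \
         + (output_tokens / 1_000_000) * rates["output"]

# Example: a heavy day of 2M input tokens and 500K output tokens
print(f"Standard: ${session_cost(STANDARD, 2_000_000, 500_000):.2f}")  # $22.50
print(f"Fast:     ${session_cost(FAST, 2_000_000, 500_000):.2f}")      # $135.00
# With the 50% launch discount, the fast-mode day would run about $67.50.
```

Same workload, roughly 6x the bill. Whether the latency win is worth it depends entirely on how much of your day is spent staring at a spinner.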

Fast Mode vs Effort Tuning. Two Different Levers

This confused me at first, and I think it’ll confuse a lot of people. There are actually two separate controls for speed now.

How Fast Mode Works

Fast mode keeps the model quality identical but reduces latency by running faster inference. You pay more, but you get the same brain working faster.

How Effort Tuning Works

Effort tuning adjusts how much the model thinks. You get four levels (low, medium, high, max) and the default is high. Dropping to medium means the model spends less time reasoning through its answer. Faster responses, lower cost, but potentially worse quality on hard problems.

Combining Both for Maximum Control

You can combine them. Fast mode plus low effort gives you maximum speed for straightforward tasks like generating boilerplate or fixing a typo in a config file. Fast mode plus max effort gives you the fastest possible deep reasoning for genuinely hard problems where you still don’t want to wait around.
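One way to think about the two levers is as independent settings per task type. This is a purely illustrative sketch (the `SETTINGS` table and `recommend` helper are my own hypothetical mapping, not anything official from Anthropic):

```python
# Hypothetical mapping from task type to a fast-mode / effort combination.
# Fast mode = inference speed (cost lever); effort = reasoning depth (quality lever).
SETTINGS = {
    "boilerplate":    {"fast_mode": True,  "effort": "low"},   # max speed, simple task
    "live_debugging": {"fast_mode": True,  "effort": "high"},  # interactive, needs quality
    "hard_problem":   {"fast_mode": True,  "effort": "max"},   # fastest deep reasoning
    "batch_job":      {"fast_mode": False, "effort": "high"},  # you're away anyway, save money
}

def recommend(task_type):
    """Return a settings combo, defaulting to the standard configuration."""
    return SETTINGS.get(task_type, {"fast_mode": False, "effort": "high"})

print(recommend("boilerplate"))  # {'fast_mode': True, 'effort': 'low'}
```

The point of the table is that the two axes don't have to move together: you can pay for speed while keeping full reasoning depth, or cut depth without paying the fast-mode premium.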

That flexibility is actually the more interesting story than fast mode by itself.

When Fast Mode Is Worth the Premium

Pair Programming Vibes

When you’re treating Claude Code more like a conversation partner than a batch processor, latency matters enormously. Fast mode makes it feel less like submitting a request and more like talking to someone.

Live Debugging Sessions

When you’re in a rapid back and forth trying to isolate a bug, those 20 to 40 second waits between responses compound fast. Fast mode keeps the conversation feeling interactive instead of like email.

Quick Iteration Cycles

Making a small change, testing it, asking Claude to adjust, testing again. The kind of loop you run 15 times in an hour. Cutting each wait from 30 seconds to 12 seconds saves real time here.
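The arithmetic on that loop is simple but worth spelling out (the 2.5x speedup figure comes from Anthropic; the loop counts are the hypothetical numbers above):

```python
# Time saved per hour in a tight iteration loop:
# 15 waits per hour, each cut from ~30 seconds to ~12 seconds (~2.5x faster).
waits_per_hour = 15
standard_wait_s = 30
fast_wait_s = 12

saved_s = waits_per_hour * (standard_wait_s - fast_wait_s)
print(f"Saved per hour: {saved_s / 60:.1f} minutes")  # 4.5 minutes
```

Four and a half minutes an hour doesn't sound like much on paper, but the real win is staying in flow instead of context-switching during every wait.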

Where it’s not worth it? Long running autonomous tasks where Claude is going off on its own for 10 minutes anyway. Batch processing. Anything where you’re stepping away from the keyboard. In those cases, just use standard mode and save the money.

The Bigger Picture. Opus 4.6 Is a Different Animal

Fast mode is one piece of a much larger Opus 4.6 release. The benchmarks tell the real story. Opus 4.6 nearly doubled its predecessor’s score on ARC AGI 2, jumping from 37.6% to 68.8%. It scored the highest on Terminal Bench 2.0 at 65.4%, edging out GPT 5.2. On real world knowledge work tasks (measured by GDPVal AA), it beat GPT 5.2 by 144 Elo points.

And then there’s agent teams. This is the feature that made me sit up straight. You can now spin up multiple Claude instances working simultaneously on different parts of your project, coordinated by a lead agent. Anthropic demonstrated this by having 16 agents build a working C compiler from scratch over about 2,000 sessions. The result was a 100,000 line Rust based compiler that can build the Linux kernel on x86, ARM, and RISC V.

That’s a flex, sure. But the practical version is more like spinning up three to five agents to handle different features in parallel while you review and approve their work. Combined with the 1 million token context window (a first for Opus class models), this starts to feel like a real shift in how solo developers or small teams can operate.

Honest Limitations

Fast mode is in research preview. Pricing and availability might change. The 50% discount expires February 16, and the full pricing is genuinely expensive for heavy usage.

Agent teams is also in research preview, and costs can climb quickly. That C compiler demo cost Anthropic $20,000 in API calls.

Effort tuning takes some experimentation to get right. Defaulting to “low” on everything will give you faster, cheaper, and sometimes noticeably worse results on anything more complex than a simple refactor.

And if you’re using Claude Code through Amazon Bedrock, Google Vertex, or Azure, fast mode isn’t available yet. It’s direct Anthropic API and subscription plans only.

How to Get Started with Claude Code Fast Mode

Getting set up takes about two minutes.

  1. Open Claude Code in your terminal (or VS Code extension)
  2. Type /fast and press Enter to enable fast mode
  3. You’ll see “Fast mode ON” and a lightning bolt next to your prompt
  4. Make sure extra usage billing is enabled in your Console settings
  5. Start coding. Type /fast again whenever you want to toggle back to standard

For effort tuning, run /model and use arrow keys left and right to adjust between low, medium, high, and max.

If you’re on a Team or Enterprise plan, an admin needs to enable fast mode first in the Claude Code settings before individual users can access it.

Frequently Asked Questions About Claude Code Fast Mode

Is Claude Code fast mode a different AI model?

No. Fast mode runs the same Opus 4.6 model with identical intelligence and capabilities. The difference is an API configuration that prioritizes output speed over cost efficiency, delivering up to 2.5x faster token generation.

How much does Claude Code fast mode cost?

Fast mode pricing is $30 per million input tokens and $150 per million output tokens for prompts under 200K tokens. That’s roughly 6x the standard Opus 4.6 output cost. Anthropic is offering a 50% discount through February 16, 2026.

Can I use fast mode and effort tuning together?

Yes. These are two independent controls. Fast mode affects inference speed (same quality, faster output, higher cost). Effort tuning affects how deeply the model reasons (lower effort means faster and cheaper but potentially less thorough). Combining fast mode with low effort gives you maximum speed for simple tasks.

Is fast mode available on AWS Bedrock or Google Vertex?

Not currently. Fast mode is only available through the Anthropic Console API and for Claude subscription plans (Pro, Max, Team, Enterprise) using extra usage billing.
