Claude Code is not safe to use right now

This month, Anthropic changed the Claude Code client in a way that radically degraded Claude Opus's effectiveness in Claude Code. Their team, or more specifically Boris, the creator of Claude Code, described a few ways to tweak its behavior and closed the issue. It appears that folks only get acceptable results if they downgrade Claude Code to the version before March 3rd. Ultimately it has become so unsafe and incapable that I've had to expressly tell my coworkers to stop using Claude Opus.

"Opus-level intelligence" is what finally convinced me to engage with Claude Code and conversational code updates when Opus 4.5 was released.

Many continue to express their frustration in the closed issue on Anthropic's public repository, on Reddit, and on Twitter/X. In fairness, here's what Boris specifically shared. It is so lacking in empathy that I am convinced Anthropic's teams do not dog-food the same Claude Code, with the same default model configuration, that they give to the public.

There's a lot here, so I will try to break it down a bit. These are the two core things happening:

redact-thinking-2026-02-12

This beta header hides thinking from the UI, since most people don't look at it. It does not impact thinking itself, nor does it impact thinking budgets or the way extended reasoning works under the hood. It is a UI-only change.

Under the hood, by setting this header we avoid needing thinking summaries, which reduces latency. You can opt out of it with showThinkingSummaries: true in your settings.json (see docs).
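For reference, the opt-out Boris describes is a one-line settings change. A minimal sketch of the relevant fragment, assuming it goes in the usual ~/.claude/settings.json (the key name is taken from his reply; the file location is my assumption):

```json
{
  "showThinkingSummaries": true
}
```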

If you are analyzing locally stored transcripts, you wouldn't see raw thinking stored when this header is set, which is likely influencing the analysis. When Claude sees a lack of thinking in transcripts during this analysis, it may not realize that the thinking is still there, just not user-facing.
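This caveat matters for anyone running transcript analyses like the one quoted later. A sketch of such a measurement, assuming a JSONL transcript shape where each line is a JSON object whose message content is a list of typed blocks (this layout, and the `thinking` block field names, are my assumptions, not documented behavior):

```python
import json

def thinking_chars(jsonl_lines):
    """Total characters of `thinking` blocks across transcript lines.

    With the redact-thinking beta header set, thinking never lands in
    the transcript, so this metric drops to zero even though thinking
    still happens server-side.
    """
    total = 0
    for line in jsonl_lines:
        record = json.loads(line)
        for block in record.get("message", {}).get("content", []):
            if block.get("type") == "thinking":
                total += len(block.get("thinking", ""))
    return total

# Inline sample standing in for a real transcript file:
sample = [
    json.dumps({"message": {"content": [
        {"type": "thinking", "thinking": "consider the diff first"},
        {"type": "text", "text": "Here is the fix."},
    ]}}),
    json.dumps({"message": {"content": [
        {"type": "text", "text": "Done."},  # no thinking recorded
    ]}}),
]
print(thinking_chars(sample))  # 23 for this sample
```

A session whose thinking was redacted client-side would score 0 here, which is exactly the measurement artifact Boris is pointing at.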

Thinking depth had already dropped ~67% by late February

We landed two changes in Feb that would have impacted this. We evaluated both carefully:

1/ Opus 4.6 launch → adaptive thinking default (Feb 9)

Opus 4.6 supports adaptive thinking, which differs from the thinking budgets we used to support. In this mode, the model decides how long to think, which tends to work better than fixed thinking budgets across the board. Set CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING to opt out.
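In practice that opt-out is a one-liner in your shell profile. The variable name comes from the reply above; the value convention is my assumption (whether "1" versus any non-empty value suffices isn't spelled out):

```shell
# Opt out of adaptive thinking and fall back to the older
# thinking-budget behavior (value convention assumed, see above).
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
```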

2/ Medium effort (85) default on Opus 4.6 (Mar 3)

We found that effort=85 was a sweet spot on the intelligence-latency/cost curve for most users, improving token efficiency while reducing latency. One of our product principles is to avoid changing settings on users' behalf, and ideally we would have set effort=85 from the start. We felt this was an important setting to change, so our approach was to:

  1. Roll it out with a dialog so users are aware of the change and have a chance to opt out.
  2. Show the effort level the first few times you open Claude Code, so it isn't surprising.

Some people want the model to think for longer, even if it takes more time and tokens. To improve intelligence more, set effort=high via /effort or in your settings.json. This setting is sticky across sessions, and can be shared among users. You can also use the ULTRATHINK keyword to use high effort for a single turn, or set /effort max to use even higher effort for the rest of the conversation.
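Concretely, the reply points at two routes: the /effort slash command inside a session, or settings.json. A minimal sketch of the latter, assuming the key is literally named `effort` (the reply implies a settings.json entry but does not spell out the key name):

```json
{
  "effort": "high"
}
```

Inside a session, `/effort high` (or `/effort max`) would achieve the same for the current conversation, per the reply above.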

Boris Cherny

It's not all about effort. It spends longer "thinking" (but somehow not actually thinking?), it makes more tool calls, it gets the problem wrong, and tokens burn faster when they shouldn't. Occasionally I find developing with Claude Code so frustrating that I have to walk outside and touch grass to regain my cool. Now Claude Code is unusable even with configuration changes. I should not have to spend an hour babysitting it, carefully explaining what it has already tried and what it got wrong, only for it to keep getting it wrong.

I'm using agents to get results faster, with the understanding that while I know what the agent is doing, and could do the same myself, it would take me much longer to enact the change. That promise of a trustworthy (if still naive) partner is what Claude Opus broke in March.

(original source in Spanish)

It can no longer review itself, self-correct, or improve in real time.

They sacrificed the ability to think about its own thinking... in order to save on compute.

Real data (6,852 production sessions, AMD):

📉 Thinking depth: -73% (2,200 → 600 chars)
📉 Reads before editing: -70% (6.6 → 2.0)
📈 Blind edits (without reading): +440% (6.2% → 33.7%)
📈 API calls per task: up to 80x higher
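The percentage deltas quoted above are consistent with the raw before/after figures. A quick sanity computation (mine, not from the report):

```python
# Sanity-check the percentage changes quoted in the session data,
# using the raw before/after values from the figures above.
def pct_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`."""
    return (after - before) / before * 100

print(round(pct_change(2200, 600)))   # thinking depth in chars: -73
print(round(pct_change(6.6, 2.0)))    # reads before editing: -70
print(round(pct_change(6.2, 33.7)))   # blind-edit share: ~+444, quoted as +440
```

The blind-edit figure computes to roughly +444%, so the quoted "+440%" reads as a round-down rather than an error.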

Even at EFFORT MAX (April 2026), it produces worse results than the HIGH setting from January 2026.

Sthiven R.

So far, the public's own benchmarks agree: Opus is worse now than it was in January and February, and its degraded effectiveness in actual use tracks the dramatic shift in these metrics.

@edzitron.com
Anthropic appears to be torturing AI people. Claude Opus 4.6 has become significantly less-performant over time, they also have been messing with rate limits, more token burn in some cases too, documented at length. Anthropic basically saying “no problems on our end sorry”

My next renewal with Claude won't be on the Max plan. As if on cue, OpenAI announced a $100/mo offering. In five minutes, issues that literally got me to rage-quit Claude were resolved. Their terminal interface is a step down from Claude Code and Cursor's Agent CLI. But the GPT 5.4 model works, which I cannot say for Claude Opus.

[Screenshot: OpenAI pricing table. "Plus" at $20/mo with advanced reasoning models; "Pro" from $100/mo with 5x to 20x more usage and GPT-5.4 Pro access.]

I may still use "Opus-level intelligence" to describe the capability and trust it takes to hand brownfield work off to an agent. However, "Opus" is not safe to use outside toy projects at this time, and that seriously impacts my future trust in Anthropic as an inference provider. How can I trust any process I automate with an LLM they serve if the underlying model degrades this loudly, and their team's response is to ignore the feedback?

If only GPT 5.x wasn't so bad at UI.