Preferences

Seeing these benchmarks makes me so happy.

Not because I love Anthropic (I do like them), but because it staves off having to change my coding agent.

This world is changing fast, and keeping up with the state of the art, along with the constant FOMO, is exhausting.

I've been holding onto Claude Code for the last little while, since I've built up a robust set of habits, slash commands, and subagents that help me squeeze as much out of the platform as possible.

But with the last few releases of Gemini and Codex I've been getting closer and closer to throwing it all out to start fresh in a new ecosystem.

Thankfully Anthropic has come out swinging today, and my own SOPs can remain intact a little while longer.


I think we are at the point where you can reliably ignore the hype and not get left behind. Until the next breakthrough at least.

I've been using Claude Code with Sonnet since August, and there hasn't been a single case where I thought about checking other models to see if they are any better. Things just worked. Yes, it requires effort to steer correctly, but all of them do, each with their own quirks. Then 4.5 came and things got better automatically. Now with Opus, another step forward.

I've just ignored all the people pushing Codex for the last few weeks.

Don't fall into that trap and you'll be much more productive.

The most effective AI coding assistant winds up depending on a complex interplay between the editor tooling, the language and frameworks being used, and the person driving. I think it’s worth experimenting. Just this afternoon Gemini 3 via the Gemini CLI fixed a whole slate of bugs that Claude Code simply could not, basically in one shot.
If you have the time & bandwidth for it, sure. But I do not, as I'm already at max budget with the $200 Anthropic subscription.

My point is, the cases where Claude gets stuck and I have to step in and figure things out have been few and far between enough that it doesn't really matter. If the programmer's workflow is working fine with Claude (or Codex, Gemini, etc.), one shouldn't feel like they are missing out by not using the other ones.

Using both extensively, I feel Codex is slightly “smarter” for debugging complex problems, but on net I still find CC more productive. The difference is very marginal though.
I tried Codex due to the same reasoning you list. The grass is not greener on the other side. I usually only opt for Codex when my Claude Code rate limit hits.
Same boat and same thoughts here! Hope it holds its own against the competition, I've become a bit of a fan of Anthropic and their focus on devs.
I personally jumped ship from Claude to OpenAI due to the rate-limiting in Claude, and have no intention of coming back unless I'm convinced the new limits are at least double what they were when I left.

Even if the code generated by Claude is slightly better, with GPT I can send as many requests as I want with no fear of running into any limit, so I feel free to experiment and screw up if necessary.

You can switch to consumption-based usage and bypass this altogether, but it can be expensive. I run an enterprise account, and my biggest users spend ~$2,000 a month on Claude Code (not the SDK or API). I tried to switch them to subscription-based at $250 and they got rate limited on the first or second day of usage, like you described. I considered having them default to the subscription and then switch to consumption when they get rate limited, but I didn't want to burden them with that yet.

However, many of our CC users actually don't hit the $250 number most months, so surprisingly it's actually cheaper to use consumption in many cases.

Don't throw away what's working for you just because some other company (temporarily) leapfrogs Anthropic a few percent on a benchmark. There's a lot to be said for sticking with what you're good at.

I also really want Anthropic to succeed because they are without question the most ethical of the frontier AI labs.

Aren’t they pursuing regulatory capture for monopoly-like conditions? I can’t trust any edge in consumer friendliness when that is their longer-term goal and those are the tactics they employ today toward it. It reeks of performativity.
> I also really want Anthropic to succeed because they are without question the most ethical of the frontier AI labs.

I wouldn't personally call Dario spending all this time lobbying to ban open-weight models “ethical”, but at least he's not doing Nazi signs on stage and doesn't have a shady crypto company trying to harvest the world's biometric data, so maybe the bar is just low.

I can’t speak to his true motives, but there are ethical reasons to oppose open weights. Hinton is an example of a non-conflicted advocate for that position. If you believe AI is a powerful dual-use technology like nuclear, open weights are a major risk.
You need much less of that robust set of habits, commands, and subagent-type complexity with Codex. Not only does it lack some of these features, it also doesn't need them as much.
The benefit you get from juggling different tools is at best marginal. In terms of actually getting work done, Sonnet and GPT-5.1-Codex are both pretty effective. It looks like Opus will be another meaningful, but incremental, change, which I am excited about but which probably won’t dramatically change how much these tools impact our work.
It’s not marginal in my experience. Once you spend enough time with all of them, you realize each model excels in different areas.
With Cursor or Copilot+VSCode, you get all the models and can switch any time. When a new model is announced, it's available the same day.
You don't get any reasoning with Copilot.
I threw a few hours at Codex the other day and was incredibly disappointed with the outcome…

I’m a heavy Claude Code user, and similar workloads just didn’t work out well for me on Codex.

One of the areas I think is going to make a big difference for any model soon is speed. We can build error-correcting systems into the tools, but the base models need more speed (and, with that, lower costs).

Any experience w/ Haiku-4.5? Your "heavy Claude Code user" and "speed" comment gave me hope you might have insights. TIA
Not GP, but my experience with Haiku-4.5 has been poor. It certainly doesn't feel like Sonnet 4.0-level performance. It looked at some Python test failures and went in a completely wrong direction, trying to address a surface-level detail rather than understanding the real cause of the problem. I tested the same thing with Sonnet 4.5 and it handled it fine, as an experienced human would.
Try Composer 1 (Cursor’s new model). I plan with Sonnet 4.5 and then execute with Composer, because it’s just so fast.
