Comment by pkreg01 - Hacker Neue

pkreg01 Oct 20, 2025 parent

I share your observations. It's strange to see Anthropic losing so much ground so fast - they seemed to be the first to crack long-horizon agentic tasks via what I can only assume is an extremely exotic RL process.

Now, I will concede that for non-coding long-horizon tasks, GPT-5 is marginally worse than Sonnet 4.5 in my own scaffolds. But GPT-5 is cheaper, and Sonnet 4.5 is about 2 months newer. However, for coding in a CLI context, GPT-5-Codex is night-and-day better. I don't know how they did it.

typpilol Oct 21, 2025

Ever since 4.5, I can't get Claude to do anything that takes a while.

4.0 would chug along for 40 mins. 4.5 refuses and sometimes straight up says the scope is too big.

My theory is that Anthropic is super compute-constrained, and even though 4.5 is smarter, the usage limits and its obsession with rushing to finish were put in mainly to save their servers' compute.
