Worked with it a bit last night! Seems quick. I did run into the same problem I often have with Gemini, where the response says something like "I need to do x" or "I did x" and then nothing actually happens. The agent seems to think it finished the task, but it stops partway.

But I'm sure they will sort that out, as I don't have that issue with other Anthropic models.


This is interesting. I’ve had this same issue trying to build an agentic system with the smaller ChatGPT models, almost regardless of the prompt (“think aloud” are magic words that help a lot, but it’s still flaky). Most of the time the model would either perform the tool call before explaining it (the default) or explain it but then never actually make the call.

I’ve been wondering how Cursor et al solved this problem (having the LLM explain what it will do before doing it is vitally important IMO), but maybe it’s just not a problem with the big models.

Your experience seems to support that smaller models are just generally worse about tool calling (were you using Gemini Flash?) when asked to reason first.
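One way to cope with this failure mode (beyond prompt magic words) is to validate the response shape on the application side and retry when it's wrong. A minimal sketch, assuming a hypothetical response type with free text plus an optional tool call (real APIs return something structurally similar):

```python
# Sketch: application-side check that the model "explained, then acted".
# ModelResponse is a hypothetical shape for illustration, not any real API's.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelResponse:
    text: str                 # the model's natural-language output
    tool_call: Optional[str]  # name of the tool it asked to invoke, if any

def check_explain_then_act(resp: ModelResponse) -> str:
    """Classify a response so the caller can retry the bad cases."""
    explained = len(resp.text.strip()) >= 20  # crude proxy for "explained itself"
    if resp.tool_call and explained:
        return "ok"               # explanation plus the actual call
    if resp.tool_call:
        return "no-explanation"   # called the tool without saying why
    if explained:
        return "no-call"          # said "I will do X" but never did X
    return "empty"

# The two failure modes described in the thread map to the retryable cases:
print(check_explain_then_act(
    ModelResponse("I'll search the docs for that error code first.", "search")))
# the "explained but never called" case discussed above:
print(check_explain_then_act(
    ModelResponse("I need to search the docs for this.", None)))
```

In a real agent loop you'd feed the "no-call" and "no-explanation" classifications back as a corrective message and re-request, rather than trusting the first response.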
