Worked with it a bit last night! Seems quick. I did run into the same problem I often have with Gemini, where the response says something like "I need to do x" or "I did x" and then nothing actually happens. The agent seems to think it finished the task, but it stops partway.

But I'm sure they will sort that out, as I don't have that issue with other Anthropic models.


This is interesting. I’ve had this same issue trying to build an agentic system with the smaller ChatGPT models, almost regardless of the prompt (“think aloud” are magic words that help a lot, but it’s still flaky). Most of the time the model would either perform the tool call before explaining it (the default) or explain it but then never actually make the call.

I’ve been wondering how Cursor et al solved this problem (having the LLM explain what it will do before doing it is vitally important IMO), but maybe it’s just not a problem with the big models.

Your experience seems to support that smaller models are just generally worse about tool calling (were you using Gemini Flash?) when asked to reason first.
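One way to cope with this failure mode (beyond prompt magic words) is to validate the response shape on the application side and retry when it's wrong. A minimal sketch, assuming a hypothetical response type with free text plus an optional tool call (real APIs return something structurally similar):

```python
# Sketch: application-side check that the model "explained, then acted".
# ModelResponse is a hypothetical shape for illustration, not any real API's.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelResponse:
    text: str                 # the model's natural-language output
    tool_call: Optional[str]  # name of the tool it asked to invoke, if any

def check_explain_then_act(resp: ModelResponse) -> str:
    """Classify a response so the caller can retry the bad cases."""
    explained = len(resp.text.strip()) >= 20  # crude proxy for "explained itself"
    if resp.tool_call and explained:
        return "ok"               # explanation plus the actual call
    if resp.tool_call:
        return "no-explanation"   # called the tool without saying why
    if explained:
        return "no-call"          # said "I will do X" but never did X
    return "empty"

# The two failure modes described in the thread map to the retryable cases:
print(check_explain_then_act(
    ModelResponse("I'll search the docs for that error code first.", "search")))
# the "explained but never called" case discussed above:
print(check_explain_then_act(
    ModelResponse("I need to search the docs for this.", None)))
```

In a real agent loop you'd feed the "no-call" and "no-explanation" classifications back as a corrective message and re-request, rather than trusting the first response.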
