Comment by throwaway314155

throwaway314155 Jun 19, 2025 parent

If I understand your point correctly - the main bottleneck for tool-calling/MCP is the models themselves being relatively terrible at tool-calling anything but the tools they were finetuned to work with until recently. Even with the latest developments, any given MCP server has a variable chance of success just due to the nature of LLM's only learning the most common downstream tasks. Further, LLM's _still_ struggle when you give them too many tools to call. They're poor at assessing the correct tool to use when given tools with overlapping functionality or similar function name/args.

This is what people mean when they say that MCP should maybe wait for a better LLM before going all-in on this design.

refulgentis Jun 19, 2025

Not in my opinion, works fine in general, wrote 2500 lines of tests for me over about 30 min tonight.

To your point that this isn't trivial or universal, there's a sharp gradient that you wouldn't notice if you're just opining on it as opposed to coding against it -- ex. I've spent every waking minute since mid-December on MCP-like territory, and it still bugs me out how worse every model is than Claude at it. It sounds like you have similar experience, though, perhaps not as satisfied with Claude as I am.

throwaway314155 OP Jun 19, 2025

A fair point I suppose. I'm not entirely inexperienced with it, but it does sound like you have more experience with it than I do.

> you wouldn't notice if you're just opining on it as opposed to coding against it

Maybe i'm being sensitive but that is perhaps not the way I would have worded that as it reads a bit like an insult. Food for thought.

This item has no comments currently.