Comment by refulgentis

refulgentis Jun 19, 2025

I agree vehemently. I'm sort of stunned how...slow...things are in practice. I quit my job 2 years ago to do LLM client stuff and I still haven't made it to Google Calendar. It's handy as a user to have something to plug holes in the interim.

In the limit, I remember some old saw about how everyone had the same top 3 rows of apps on their iPhone homescreen, but the last row was all different. I bet IT will be managing, and dev teams will be making, their own bespoke MCP servers for years to come.

throwaway314155 Jun 19, 2025

If I understand your point correctly - the main bottleneck for tool-calling/MCP is that, until recently, the models themselves were relatively terrible at calling anything but the tools they were finetuned to work with. Even with the latest developments, any given MCP server has a variable chance of success, simply because LLMs only learn the most common downstream tasks. Further, LLMs _still_ struggle when you give them too many tools to call. They're poor at picking the correct tool when given tools with overlapping functionality or similar function names/args.

This is what people mean when they say that MCP should maybe wait for a better LLM before going all-in on this design.

refulgentis OP Jun 19, 2025

Not in my opinion. It works fine in general: it wrote 2500 lines of tests for me over about 30 minutes tonight.

To your point that this isn't trivial or universal, there's a sharp gradient that you wouldn't notice if you're just opining on it as opposed to coding against it -- e.g. I've spent every waking minute since mid-December on MCP-like territory, and it still bugs me out how much worse every other model is than Claude at it. It sounds like you have similar experience, though perhaps you're not as satisfied with Claude as I am.

throwaway314155 Jun 19, 2025

A fair point I suppose. I'm not entirely inexperienced with it, but it does sound like you have more experience with it than I do.

> you wouldn't notice if you're just opining on it as opposed to coding against it

Maybe I'm being sensitive, but that is perhaps not the way I would have worded that, as it reads a bit like an insult. Food for thought.
