Dumb question: in that case, wouldn't it not be an MCP server? It would be an LLM client with the ability to execute tool calls made by the LLM?
I don't get how MCP could create a wrapper for all possible LLM inference APIs or why it'd be desirable (that's an awful long leash for me to give out on my API key)
An MCP Server can be many things. It can be as simple as an echo server or as complex as a full-blown tool-calling agent and beyond. The MCP Client Sampling feature is an interesting thing that's designed to allow the primary agent, the MCP Host, to offer up some subset of the LLM models it has access to for the MCP Servers it connects with. That lets the MCP Server make LLM calls that are mediated (or not, YMMV) by the MCP Host. As I said above, the feature is very limited right now, but still interesting for some simpler use cases.

Why would you do this? So you don't have to configure every MCP Server you use with LLM credentials, and so the particulars of exactly which model gets used stay under your control. The MCP Server gets to worry about its business logic instead of how to talk to a specific LLM Provider.
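Concretely, the sampling flow is just a JSON-RPC request the server sends back up to the host. A rough sketch of the `sampling/createMessage` message shape, per the MCP spec (the prompt text and model hint here are made-up examples; check field names against your SDK's revision):

```python
# Sketch of an MCP "sampling/createMessage" request, as a server would
# send it to the host over JSON-RPC. Note the server never names a
# provider or supplies credentials: the host picks the concrete model,
# guided only by advisory preference hints.
sampling_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {
                "role": "user",
                "content": {"type": "text", "text": "Summarize this file."},
            }
        ],
        # Hints are advisory substrings; the host maps them onto whatever
        # models the user has actually allowed.
        "modelPreferences": {
            "hints": [{"name": "claude-3-5-sonnet"}],
            "intelligencePriority": 0.8,
        },
        "systemPrompt": "You are a concise summarizer.",
        "maxTokens": 256,
    },
}
```

The host can show this to the user for approval, rewrite the model choice, or reject it outright, which is exactly the mediation point that makes sampling attractive from a credentials standpoint.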
My main disappointment with sampling right now is the very limited scope. It'd be nice if it supported some universal tool-calling syntax or something along those lines. Otherwise a reasonably complicated MCP Server is still going to need a direct LLM connection.