cstrahan (Charles Strahan)
Karma: 1,095
cstrahan.com

Software Developer in Washington, DC

[ my public key: https://keybase.io/cstrahan; my proof: https://keybase.io/cstrahan/sigs/yV0wzDolQldJ9GSMuflLE9_PIggrsCf9jzIQPUHqzKE ]


  1. Do you take issue with companies stating that they (the company) built something, instead of stating that their employees built something? Should the architects and senior developers disclaim any credit, because the majority of tickets were completed by junior and mid-level developers?

    Do you take issue with a CNC machinist stating that they made something, rather than stating that they did the CAD and CAM work but that it was the CNC machine that made the part?

    Non-zero delegation doesn’t mean that the person(s) doing the delegating have put zero effort into making something, so I don’t think that delegation makes it dishonest to say that you made something. But perhaps you disagree. Or, maybe you think the use of AI means that the person using AI isn’t putting any constructive effort into what was made — but then I’d say that you’re likely way overestimating the ability of LLMs.

  2. Let me rephrase GP into (I hope) a more useful analogy. — actually, here’s the whole analogous exchange:

    “A rectangle is an equal-sided rectangle (i.e. “square”) though. That’s what the R stands for.”

    “No? Why would you think a rectangle is a square?”

    Just as not all rectangles are squares (squares are a specific subset of rectangles), not all datagram protocols are UDP (UDP is just one particular datagram protocol).

  3. You read it that way because that’s the sensible way to read it. Everyone suggesting you missed the plot is in turn making a rather large logical leap.
  4. > These issues would still exist.

    I just explained why these issues don't apply to the LLM writing and invoking code: code can apply successive transformations to the input without having to feed the intermediate results into the LLM's context. That code can read a file that would weigh in at 50,000 tokens, chain 100 functions together, and produce a line that would be 20 tokens, and the LLM will only see that final 20-token result. That really is only 20 tokens for the entire exchange -- the LLM never sees the 50,000 tokens from the file the program read, nor the tens of thousands of tokens of intermediate results from the successive transformations of the 100 functions.
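
    To make this concrete, here's a minimal sketch of the kind of code I mean -- the file name and the transformation functions are made up, and in practice the LLM writes whatever transformations the task calls for:

        from functools import reduce

        # Trivial stand-ins for the "100 functions" described above.
        def keep_error_lines(text):
            return [line for line in text.splitlines() if "ERROR" in line]

        def count_lines(lines):
            return f"{len(lines)} error lines"

        def pipeline(path, transforms):
            with open(path) as f:
                data = f.read()  # could be 50,000 tokens' worth of text; the LLM never sees it
            return reduce(lambda acc, fn: fn(acc), transforms, data)

        # Only this final, tiny string is handed back to the model.
        print(pipeline("huge_log.txt", [keep_error_lines, count_lines]))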

    With MCP, there's no way for the LLM to invoke one tool call that expresses "compose/pipeline these 100 tools, please, and just give me the final result" -- the LLM must make 100 individual tool calls, manually threading the results through each tool call, which represents 100 opportunities for the LLM to make a mistake.

    It sounds like you are disagreeing with what I am saying, but it doesn't look like you're giving any reason why you disagree, so I'm a bit confused.

  5. > Okay....now what function calls or libraries should the API use to write that code?

    Whichever popular library provides a client for the "thing" (locally running process, service, or whatever) you are trying to interact with via the LLM.

    I have had LLMs generate probably tens of thousands of lines of code, all without providing an MCP server. LLMs can do that. They are trained on code.

    What if there isn't a library available? Well, sure, then you could go implement an MCP server... or you could just write a library. It's practically the same effort.

    Not that I'm 100% against MCP servers. But I think there's a misconception spreading across the software community that an MCP server is always the obvious, optimal solution; all I'm suggesting is that people actually think about the problem, instead of outsourcing that responsibility to a "no one ever got fired for buying IBM" style of bandwagon appeal.

  6. What whoknowsidont is trying to say (IIUC): the models aren't trained on particular MCP use. Yes, the models "know" what MCP is. But the point is that they don't necessarily have MCP details baked in -- if they did, there would be no point in having MCP servers serve prompts / tool descriptions.

    Well, arguably descriptions could be beneficial for interfaces that let you interactively test MCP tools, but that's certainly not the main reason. The main reason is that the models need to be informed about what the MCP server provides, and how to use it (where "how to use it" in this context means "what is the schema and intent behind the specific inputs/outputs" -- tool calls are baked into the training, and the OpenAI docs give a good example: https://platform.openai.com/docs/guides/function-calling).
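
    To give a sense of what the model actually receives, here's roughly the shape of a tool definition in the OpenAI function-calling style (illustrative only, not tied to any particular MCP server):

        # Roughly what an MCP server's tool description becomes once the client
        # hands it to the model as a function-calling tool definition.
        get_current_weather_tool = {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather for a location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City and state, e.g. Boston, MA",
                        },
                    },
                    "required": ["location"],
                },
            },
        }

        # Every one of these definitions is serialized into the prompt, which is
        # why tool descriptions consume context-window tokens on every request.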

  7. > but those that I have used (GitHub, [...])

    > Most MCP severs don't sit in between the LLM and the API endpoints [...]

    Your first example certainly isn't an example of that: https://github.com/github/github-mcp-server

    I suppose someone could try to abuse MCP by stuffing information about REST API endpoints into the prompt/descriptions of a small MCP "skeleton" service, but I don't know of any that do. Can you provide examples?

    > they just teach them how to use the tools and then the LLM calls the APIs directly as any HTTP client would.

    I suspect you might have some deep misunderstandings about MCP.

  8. I just skimmed the README.

    I believe the point is to do something akin to "promise pipelining":

    https://capnproto.org/rpc.html

    http://erights.org/elib/distrib/pipeline.html

    When an MCP tool is used, all of the output is piped straight into the LLM's context. If another MCP tool is needed to aggregate/filter/transform/etc the previous output, the LLM has to try ("try" being the operative word -- LLMs are by their nature nondeterministic) to reproduce the needed bits as inputs to the next tool use. This increases latency dramatically and is an inefficient use of tokens.

    This "a1" project, if I'm reading it correctly, allows for pipelining multiple consecutive tool uses without the LLM/agent being in the loop, until the very end when the final results are handed off to the LLM.

    An alternative approach inspired by the same problems identified in MCP: https://blog.cloudflare.com/code-mode/

  9. I don't think there is any attempt at lock in here, it's simply that skills are superior to MCP.

    See this previous discussion on "Show HN: Playwright Skill for Claude Code – Less context than playwright-MCP (github.com/lackeyjb)": https://www.hackerneue.com/item?id=45642911

    MCP deficiencies are well known:

    https://www.anthropic.com/engineering/code-execution-with-mc...

    https://blog.cloudflare.com/code-mode/

  10. The LLM's output differentiates between text intended for the user to see and tool usage.

    You might be thinking "but I've never seen any sort of metadata in textual output from LLMs, so how does the client/agent know?"

    To which I will ask: when you loaded this page in your browser, did you see any HTML tags, CSS, etc? No. But that's only because your browser read the HTML and rendered the page, hiding the markup from you.

    Similarly, what the LLM generates looks quite different compared to what you'll see in typical, interactive usage.

    See for example: https://platform.openai.com/docs/guides/function-calling

    The LLM might generate something like this for text:

        {
          "content": [
            {
              "type": "text",
              "text": "Hello there!"
            }
          ],
          "role": "assistant",
          "stop_reason": "end_turn"
        }
    
    Or this for a tool call:

        {
          "content": [
            {
              "type": "tool_use",
              "id": "toolu_abc123",
              "name": "get_current_weather",
              "input": {
                "location": "Boston, MA"
              }
            }
          ],
          "role": "assistant",
          "stop_reason": "tool_use"
        }
    
    The schema is enforced much like end-user visible structured outputs work -- if you're not familiar, many services will let you constrain the output to validate against a given schema. See for example:

    https://simonwillison.net/2025/Feb/28/llm-schemas/

    https://platform.openai.com/docs/guides/structured-outputs
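
    If you want to see the idea locally, client-side validation is the simplest version of it -- here's a sketch using the jsonschema library (the hosted "structured output" features go further and constrain generation itself):

        # Validate that a (hypothetical) model response block matches the
        # expected tool-call schema; raises jsonschema.ValidationError if not.
        from jsonschema import validate  # pip install jsonschema

        tool_use_schema = {
            "type": "object",
            "properties": {
                "type": {"const": "tool_use"},
                "id": {"type": "string"},
                "name": {"type": "string"},
                "input": {"type": "object"},
            },
            "required": ["type", "id", "name", "input"],
        }

        response_block = {
            "type": "tool_use",
            "id": "toolu_abc123",
            "name": "get_current_weather",
            "input": {"location": "Boston, MA"},
        }

        validate(instance=response_block, schema=tool_use_schema)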

  11. You're not wrong, but I figured I'd point out the cons / alternatives:

    > They can encapsulate (API) credentials, keeping those out of reach of the model

    An alternative to MCP, which would still provide this: code (as suggested in https://www.anthropic.com/engineering/code-execution-with-mc... and https://blog.cloudflare.com/code-mode/).

    Put the creds in a file, or secret manager of some sort, and let the LLM write code to read and use the creds. The downside is that you'd need to review the code to make sure that it isn't printing (or otherwise moving) the credentials, but then again you should probably be reviewing what the LLM is doing anyway.
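
    A minimal sketch of that pattern, with a made-up secrets path and API endpoint:

        # The model writes code like this; the credential value itself never
        # needs to appear in the conversation. (Path and URL are made up.)
        import json
        import urllib.request

        with open("/run/secrets/example_api_token") as f:  # or a secrets-manager call
            token = f.read().strip()

        req = urllib.request.Request(
            "https://api.example.com/v1/widgets",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            print(json.load(resp)["count"])  # report only the result, never the token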

    > Contrary to APIs, they can change their interface whenever they want and with little consequences.

    The upside is as stated, but the downside is that you're always polluting the context window with MCP tool descriptions.

  12. You are correct.

    I think many here have no idea what exactly MCP is, and think it's some sort of magic sauce that transcends how LLMs usually work.

        "But Brawndo has what plants crave! It's got electrolytes!"
        "Okay... what are electrolytes? Do you know?"
        "Yeah. It's what they use to make Brawndo."
        "But why do they use them in Brawndo? What do they do?"
        "They're part of what plants crave."
        "But why do plants crave them?"
        "Because plants crave Brawndo, and Brawndo has electrolytes."

        ― Idiocracy
  13. This is very true. But why stop there?

    Imagine a future where we have an evolved version of MCP -- call it MCP++.

    In MCP++, instead of having to implement a finite list of specialized variants like CreateUserAndAddToGroup, imagine MCP++ has a way to feed the desired logic (create user, then add that user to $GROUP) directly to the endpoint. So there would be something like a POST /exec endpoint. And then the /exec endpoint can run the code (maybe it's WASM or something)...

    Wait a minute! We already have this. It's called programming.

    You could have the LLM write code, so that any pipelining (like your example), aggregation, filtering, or other transformation happens in that code. The LLM only needs to spend output tokens to write the code, and the only input tokens consumed are those of the final result.
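
    For your CreateUserAndAddToGroup example, that might look something like this (endpoints made up, using the requests library just for brevity):

        # The "pipelining" lives in ordinary code the model writes, not in a
        # specialized MCP tool. Endpoints are hypothetical.
        import requests

        BASE = "https://api.example.com"

        user = requests.post(f"{BASE}/users", json={"name": "Ada"}).json()
        requests.post(f"{BASE}/groups/admins/members", json={"user_id": user["id"]})
        print(f"created user {user['id']} and added them to 'admins'")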

    I definitely am not the first person to suggest this:

    https://www.anthropic.com/engineering/code-execution-with-mc...

    https://blog.cloudflare.com/code-mode/

    ... but I can say that, as soon as I read about MCP, my first thought was "why?"

    MCP is wasteful.

    If you want LLMs to interact with your software/service, write a library, let the scrapers scrape that code so that future LLM revisions have the library "baked in" (so you no longer need to spam the context with MCP tool descriptions), and let the LLM write code, which it already "knows" how to do.

    What if your library is too new, or has a revision, though?

    That's already a solved problem -- you do what you'd do in any other case where you want the LLM to write code for you: point it at the docs / codebase.

  14. You're misinterpreting OP.

    OP is saying that the models have not been trained on particular MCP use, which is why MCP servers serve up tool descriptions, which are fed to the LLM just like any other text -- that is, these descriptions consume tokens and take up precious context.

    Here's a representative example, taken from a real world need I had a week ago. I want to port a code base from one language to another (ReasonML to TypeScript, for various reasons). I figure the best way to go about this would be to topologically sort the files by their dependencies, so I can start with porting files with absolutely zero imports, then port files where the only dependencies are on files I've already ported, and so on. Let's suppose I want to use Claude Code to help with this, just to make the choice of agent concrete.

    How should I go about this?

    The overhead of the MCP approach would be analogous to trying to cram all of the relevant files into the context, and asking Claude to sort them. Even if the context window is sufficient, that doesn't matter because I don't want Claude to "try its best" to give me the topological sort straight from its nondeterministic LLM "head".

    So what did I do?

    I gave it enough information about how to consult build metadata files to derive the dependency graph, and then had it write a Python script. The LLM is already trained on a massive corpus of Python code, so there's no need to spoon feed it "here's such and such standard library function", or "here's the basic Python syntax", etc -- it already "knows" that. No MCP tool descriptions required.

    And then Claude Code spits out a script that, yes, I could have written myself, but it does it in maybe one minute total of my time. I can skim the script and make sure that it does exactly what it should be doing. Given that this is code, and not nondeterministic, wishy-washy LLM "reasoning", I know that the result is both deterministic and correct. The total token usage is tiny.
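
    For illustration, here's a minimal sketch of that kind of script -- the real one consulted the project's build metadata, so the "scan for `open Foo`" heuristic here is just a stand-in:

        import re
        from pathlib import Path
        from graphlib import TopologicalSorter  # stdlib, Python 3.9+

        SRC = Path("src")
        files = list(SRC.rglob("*.re"))

        def deps_of(path):
            # Stand-in heuristic: treat `open Foo` as a dependency on src/Foo.re.
            mods = re.findall(r"^open\s+(\w+)", path.read_text(), flags=re.MULTILINE)
            return {SRC / f"{m}.re" for m in mods if (SRC / f"{m}.re").exists()}

        graph = {f: deps_of(f) for f in files}

        # Dependencies come before dependents, i.e. the porting order I wanted.
        for f in TopologicalSorter(graph).static_order():
            print(f)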

    If you look at what Anthropic and CloudFlare have to say on the matter (see https://www.anthropic.com/engineering/code-execution-with-mc... and https://blog.cloudflare.com/code-mode/), it's basically what I've described, but without explicitly telling the LLM to write a script / reviewing that script.

    If you have the LLM write code to interface with the world, it can leverage its training on that kind of code, the code itself will do what code does (precisely what it was written to do), and the only tokens consumed will be those of the final result.

    MCP is incredibly wasteful and provides more opportunities for LLMs to make mistakes and/or get confused.

  15. > Is that just bad implementation? Where are the wasted tokens?

    How wouldn't it be wasteful?

    I'll try to summarize a couple sources:

    https://www.anthropic.com/engineering/code-execution-with-mc...

    https://blog.cloudflare.com/code-mode/

    Here's what Anthropic has to say about it:

        As MCP usage scales, there are two common patterns that can increase agent cost and latency:

        Tool definitions overload the context window;
        Intermediate tool results consume additional tokens.
        
        [...]
        
        Tool descriptions occupy more context window space, increasing response time and costs. In cases where agents are connected to thousands of tools, they’ll need to process hundreds of thousands of tokens before reading a request.
        
        [...]
        
        Most MCP clients allow models to directly call MCP tools. For example, you might ask your agent: "Download my meeting transcript from Google Drive and attach it to the Salesforce lead."
        
        The model will make calls like:
        
          TOOL CALL: gdrive.getDocument(documentId: "abc123")
                  → returns "Discussed Q4 goals...\n[full transcript text]"
                     (loaded into model context)
          
          TOOL CALL: salesforce.updateRecord(
             objectType: "SalesMeeting",
             recordId: "00Q5f000001abcXYZ",
               data: { "Notes": "Discussed Q4 goals...\n[full transcript text written out]" }
            )
            (model needs to write entire transcript into context again)
        
        Every intermediate result must pass through the model. In this example, the full call transcript flows through twice. For a 2-hour sales meeting, that could mean processing an additional 50,000 tokens. Even larger documents may exceed context window limits, breaking the workflow.
        
        With large documents or complex data structures, models may be more likely to make mistakes when copying data between tool calls.
    
    
    Now, if you were to instead have the LLM write code, that code can perform whatever filtering/aggregation/transformation it needs without round-tripping between the LLM and tool(s), and the only tokens consumed are those of the final result. What happens with MCP? All of the text of each MCP call is flooded into the context, only for the LLM to have to make sense of what it just read and then either regurgitate it into a file to post-process (very likely with differences/"hallucinations" slipped in), or, in the usual case (I'm personifying the LLM here for rhetorical purposes), simply try to reason about what it read to give you the filtered/aggregated/transformed/etc result you're looking for -- again, very likely with mistakes made.
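
    For contrast, here's what the code-writing version of that same Google Drive → Salesforce example looks like -- the two client functions below are hypothetical stand-ins, not a real SDK, but the shape of the data flow is the point:

        # Stand-ins for real Google Drive / Salesforce clients.
        def gdrive_get_document(document_id):
            return "Discussed Q4 goals...\n[full 50,000-token transcript]"

        def salesforce_update_record(object_type, record_id, data):
            return f"updated {object_type}/{record_id}"

        # The transcript stays inside this process; it never enters the context.
        transcript = gdrive_get_document("abc123")
        status = salesforce_update_record(
            "SalesMeeting", "00Q5f000001abcXYZ", {"Notes": transcript}
        )
        print(status)  # the only line that needs to re-enter the LLM's context
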
  16. (Not OP)

    This is pretty well established. See for example:

    https://www.anthropic.com/engineering/code-execution-with-mc...

    https://blog.cloudflare.com/code-mode/

    Code (including shell scripting) allows the LLM to manipulate the results programmatically, which allows for filtering, aggregation and other logic to occur without multiple round trips between the agent and tool(s). This results in substantially less token usage, which means less compute waste, less cost, and less confusion/"hallucination" on the LLM's part.

    If one comes to the same conclusion that many others have (including CloudFlare) that code should be the means by which LLMs interface with the world, then why not skip writing an MCP server and instead just write a command-line program and/or library (as well as any public API necessary)?

  17. While the comparison doc mentions it, I don't see anyone here talking about Wipr 2. As the comparison doc states, if all you need is configuration-free ad blocking on macOS and iOS, Wipr 2 definitely gets my vote.
  18. It sure sounds like the problem is subsidizing the free tier by making the paid tiers excessively expensive. That in turn drives people to squeeze as much out of the free tier as they can, complain online, and jump ship to either a different service or self hosting.

    So... maybe don't do that?

  19. There may be some herbal supplements that impact GLP-1 release to some extent, but what is being talked about here are synthetic GLP-1 receptor agonists.
  20. I think OP is talking about the compression and bit rate, not the placement of the mic.

    When the mic is turned on, many headsets go from sounding good enough to sounding absolutely horrible. Something about switching from A2DP to HFP, and sharing the bandwidth between the incoming audio and outgoing audio.

    AirPods are impacted much, much less, largely I think because the AAC-ELD codec is decent, and Apple OSes switch the audio from stereo to mono when the mic is on (which seems like a no-brainer IMO, but I guess not all operating systems do this).

  21. > If one needs an integer or a float, the parser converts it.

    Which parser? That’s the problem: if you’re using JSON as a data interchange format, you’ll need to carefully control both the serializers and deserializers, and whatever libraries you use, they will need to (at least internally) hold onto the number in a lossless way — I am not aware of any libraries that do this. They all parse the number as an f64 before any deserializers run. If your input JSON contains a u128, then you’ll have a loss of precision when your type is deserialized.

    If you can set up (de)serialization to work the way you need it, then there’s no problem. But if you share your JSON serialized data with other parties, then you/they may be in for a bit of a surprise.

    You might find it a worthwhile exercise to try parsing JSON containing an arbitrary unsigned 128-bit integer in your language of choice.
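
    For instance, a quick way to see the loss that a double-backed parser introduces (Python here is only playing the role of a calculator):

        # A 128-bit-sized value round-tripped through a 64-bit float, the way an
        # f64-backed JSON parser would store it:
        big = 2**127 + 1               # arbitrary u128-sized value
        as_double = float(big)         # what the parser actually keeps
        print(int(as_double) == big)   # False -- the low bits are gone
        print(int(as_double) - big)    # shows exactly how far off it is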

  22. JSON numbers are not JavaScript Numbers.

    While the grammar is specified (that’s what JSON is, after all), the runtime representation is unspecified. A conformant JSON parser can parse “1” as 1.0. They can be backed by doubles, or singles, or arbitrary precision.

  23. I think a more charitable interpretation of TFA would be: "I Have Come Up With A Recipe for Solving PyTorch's Cross-Platform Nightmare"

    That is: there's nothing stopping the author from building on the approach he shares to also include Windows/FreeBSD/NetBSD/whatever.

    It's his project (FileChat), and I would guess he uses Linux. It's natural that he'd solve this problem for the platforms he uses, and for which wheels are readily available.

  24. > The alternative is to import packages directly from the (git) repository.

    That sounds great in theory. In practice, NPM is very, very buggy, and some of those bugs impact pulling deps from git repos. See my issue here: https://github.com/npm/cli/issues/8440

    Here's the history behind that:

    Projects with build steps were silently broken as late as 2020: https://github.com/npm/cli/issues/1865

    Somehow no one thought to test this until 2020, and the entire NPM user base either didn't use the feature, or couldn't be arsed to raise the issue until 2020.

    The problem gets kinda sorta fixed in late 2020: https://github.com/npm/pacote/issues/53

    I say kinda sorta fixed, because somehow they only fixed (part of) the problem when installing packages from git non-globally -- `npm install -g whatever` is still completely broken. Again, somehow no one thought to test this, I guess. The issue I opened, which I mentioned at the very beginning of this comment, addresses this bug.

    Now, I say "part of the problem" was fixed because the npm docs blatantly lie to you about how prepack scripts work, which requires a workaround (which, again, only helps when not installing globally -- that's still completely broken); from https://docs.npmjs.com/cli/v8/using-npm/scripts:

        prepack
        
            - Runs BEFORE a tarball is packed (on "npm pack", "npm publish", and when installing a git dependencies).
    
    Yeah, no. That's a lie. The prepack script (which would normally be used for triggering a build, e.g. TypeScript compilation) does not run for dependencies pulled directly from git.

    Speaking of TypeScript, the TypeScript compiler developers ran into this very problem and adopted a workaround: invoke a script from the npm prepare script, which does some janky checks to guess whether execution is occurring from a source tree fetched from git, and if so, explicitly invokes the prepack script, which then kicks off the compiler and so on. This is the workaround they use today:

    https://github.com/cspotcode/workaround-broken-npm-prepack-b...

    ... and while I'm mentioning bugs, even that has a nasty bug: https://github.com/cspotcode/workaround-broken-npm-prepack-b...

    Yes, if the workaround calls `npm run prepack` and the prepack script fails for some reason (e.g. a compiler error), the exit code is not propagated, so `npm install` will silently install the respective git dependency in a broken state.

    How no one looks at this and comes to the conclusion that NPM is in need of better stewardship, or ought to be entirely supplanted by a competing package manager, I dunno.

  25. Another idea: an indication of how many background jobs are currently running. With many terminal tabs open, I can forget if I already have $EDITOR (or something else) running, so the number of jobs can be a nice cue.

        # Emit something like " (2j)" when there are backgrounded jobs, or nothing otherwise.
        _jobscount() {
            local jobs=($(jobs -p))
            local count=${#jobs[@]}
            (($count)) && echo -n " (${count}j)"
        }
    
    And then put that in your prompt.
  26. > This is defaulted to toggling on.

    You actually meant to say “this is the option that is given focus when the user is prompted to make a decision of whether to share data or not”, right?

    Because unless they changed the UI again, that’s what happens: you get prompted to make a decision, with the “enable” option given focus. Which means that this is still literally opt-in. It’s an icky, dark pattern (IMO) to give the “enable” option focus when prompted, but that doesn’t make it any less opt-in.

  27. "Synchronized demand is the moment a large cohort of clients acts almost together."

    Imagine everyone in a particular timezone browsing Amazon as they sit down for their 9 to 5; or an outage occurring, and a number of automated systems (re)trying requests just as the service comes back up. These clients are all "acting almost together".

    "In a service with capacity mu requests per second and background load lambda_0, the usable headroom is H = mu - lambda_0 > 0"

    Subtract the typical, baseline load (lambda_0) from the max capacity (mu), and that gives you how much headroom (H) you have.
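
    To make that concrete with made-up numbers: if the service can sustain μ = 1,000 requests per second and the baseline load is λ₀ = 700 requests per second, then H = 300 requests per second -- a synchronized burst bigger than that pushes the service past capacity.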

    The signal processing definition of headroom: the "space" between the normal operating level of a signal and the point at which the system can no longer handle it without distortion or clipping.

    So headroom here can be thought of as "wiggle room", if that is a more intuitive term to you.

  28. > Also, that gitignore doesn't even work.

    It's not terribly far off. This works:

      diff --git a/.gitignore b/.gitignore.new
      index 7dc7aea..fc9ebfe 100644
      --- a/.gitignore
      +++ b/.gitignore.new
      @@ -3,17 +3,21 @@
       !.gitignore
       
       # whitelist `src` directories and their children, regardless of place
      +!src
       !src/**/
       !src/**/*.rs
       !Cargo.{toml,lock}
       
       # whitelist root `pysrc` directory
      +!/pysrc
       !/pysrc/*.py
       !pyproject.toml
       !poetry.lock
       
      +!/cmd
       !/cmd/*.go
       !main.go
       !go.{mod,sum}
       
      +!/docs
       !/docs/*.md
    
    This is how I've been managing my dotfiles for over a decade (https://github.com/cstrahan/dotfiles/blob/master/.gitignore).

    This is the email thread I started back on 2016/03/03 to pin down the gitignore behavior that we have today (there was a regression between git versions 2.6.0 and 2.7.0):

    https://lore.kernel.org/git/1457057516.1962831.539160698.3C8...

    (For posterity: the subject line was "Change in .gitignore handling: intended or bug?")

  29. > if anything, meds for high blood pressure come with negative effects on that

    It’s really a mixed bag.

    Recall that Sildenafil (aka brand name Viagra) was originally developed to treat high blood pressure. Turns out that while it does lower blood pressure, it’s really good for improving erections.

  30. Given that a lethal dose of fentanyl is about two milligrams (similar to 5-7 grains of table salt in volume), that still feels pretty risky.
