Comment by peacebeard

peacebeard Jun 18, 2025

Very common to see in comments some people saying “it can’t do that” and others saying “here is how I make it work.” Maybe there is a knack to it, sure, but I’m inclined to say the difference between the problems people are trying to use it on may explain a lot of the difference as well. People are not usually being too specific about what they were trying to do. The same goes for a lot of programming discussion of course.

alganet Jun 18, 2025

> People are not usually being too specific about what they were trying to do. The same goes for a lot of programming discussion of course.

In programming, I already have a very good tool to follow specific steps: _the programming language_. It is designed to run algorithms. If I need to be specific, that's the tool to use. It does exactly what I ask it to do. When it fails, it's my fault.

Some humans require algorithm-like instructions too, like following a recipe when cooking. However, those instructions can be very vague and a lot of humans can still follow them.

LLMs sit in this weird place where we don't have a clue when we can be vague and when we can't. Sometimes you can be vague, sometimes you can't. Sometimes high-level steps are enough, sometimes you need fine-grained instructions. It's basically trial and error.

Can you really blame someone for not being specific enough in a system that only provides you with a text box that offers anthropomorphic conversation? I'd say no, you can't.

If you want to talk about how specifically you need to prompt an LLM, there must be a well-defined threshold. The other option is "whatever you can expect from a human".

Most discussions seem to juggle between those two. LLMs are praised when they accept vague instructions, but the user is blamed when they fail. Very convenient.

peacebeard OP Jun 18, 2025

I am not saying that people were not specific in their instructions to the LLM, but rather that in the discussion they are not sharing specific details of their success stories or failures. We are left seeing lots of people saying "it worked for me" and "it didn't work for me" without enough information to assess what was different in those cases. What I'm contending is that the essential differences in the challenges they are facing may be a primary factor, while these discussions tend to focus on the capabilities of the LLM and the user.

alganet Jun 18, 2025

> they are not sharing specific details of their success stories or failures

Can you blame them for that?

For other products, do you think people contact customer support with an abundance of information?

Now, consider what these LLM products promise to deliver. Text box, answer. Is there any indication that different challenges might yield differences in the quality of the outcome? Nope. Magic genie interface, it either works or it doesn't.

heyitsguay Jun 18, 2025

I've noticed this a lot, too, in HN LLM discourse.

(Context: Working in applied AI R&D for 10 years, daily user of Claude for boilerplate coding stuff and as an HTML coding assistant)

Lots of "with some tweaks I got it to work" or "we're using an agent at my company", but rarely details about what's working or why, or what these production-grade agents are doing.

Aeolun Jun 18, 2025

I ask it to build a 3D voxel engine in Rust, and it just goes off and does it. Same for a vox file parser.

Sure, it takes some creative prompting, and a lot of turns to get it to settle on the proper coordinate system for the whole thing, but it goes ahead and does it.

This has taken me two days so far. Unfortunately, the scope of the thing is now so large that the quality is rapidly starting to degrade.

SchemaLoad Jun 18, 2025

Building something from scratch where there are plenty of public examples on GitHub seems to be the easiest case. Put these agents on a real existing codebase and ask them to fix a bug and they become useless.

twosdai Jun 19, 2025

I think this would vary a lot between "real" codebases. I have had a lot of success when using somewhat stricter frameworks, with typed interfaces, well-defined unit tests as a requirement, and modules which encapsulate a lot of logic.

Basically like Java Spring Boot or NestJS type projects.

Aeolun Jun 19, 2025

I like that skeptical people always first ask for examples, then when someone gives them, they immediately switch to “well, sure, but that one is easy!”
