In programming, I already have a very good tool for following specific steps: _the programming language_. It is designed to run algorithms. If I need to be specific, that's the tool to use. It does exactly what I ask it to do. When it fails, it's my fault.
Humans sometimes need algorithm-like instructions too, as when cooking from a recipe. However, those instructions can be very vague, and a lot of people can still follow them.
LLMs occupy a weird middle ground where we have no clue when we can be vague and when we can't. Sometimes you can be vague, sometimes you can't. Sometimes high-level steps are enough, sometimes you need fine-grained instructions. It's basically trial and error.
Can you really blame someone for not being specific enough with a system whose only interface is a text box offering anthropomorphic conversation? I'd say no, you can't.
If you want to talk about how specific your prompts to an LLM need to be, there must be a well-defined threshold. The only other option is "whatever you can expect from a human".
Most discussions seem to alternate between those two. LLMs are praised when they accept vague instructions, but when they fail, the user is blamed. Very convenient.
Can you blame them for that?
With other products, do you think people contact customer support with an abundance of information?
Now, consider what these LLM products promise to deliver. Text box, answer. Is there any indication that different challenges might yield differences in the quality of the outcome? Nope. It's a magic-genie interface: it either works or it doesn't.
(Context: I've worked in applied AI R&D for 10 years and use Claude daily for boilerplate coding and as an HTML coding assistant.)
There are lots of "with some tweaks I got it to work" or "we're using an agent at my company" comments, but rarely any details about what's working or why, or what these production-grade agents are actually doing.
Sure, it takes some creative prompting and a lot of turns to get it to settle on the proper coordinate system for the whole thing, but it goes ahead and does it.
This has taken me two days so far. Unfortunately, the scope of the thing is now so large that the quality rapidly starts to degrade.
Basically, think Java Spring Boot or NestJS-type projects.