Rinse and repeat for many "one-off" tasks.
It's not going away, you need to learn how to use it. shrugs shoulders
I work as the non-software kind of engineer at an industrial plant, and a trend is starting to emerge of people who just blindly trust the output of AI chat sessions without understanding what the chatbot is echoing at them, which wastes their time and, in some cases, mine.
This is not new; in the past I have dealt with engineers who use (abuse) statistics/regression tools etc. without understanding what the output was telling them, but it is getting worse now.
It is not uncommon to hear something like: "Oh I investigated that problem and this particular issue we experienced was because of reasons x, y and z."
Then, when you push back because what they've said sounds highly unlikely, it boils down to: "I don't know, that is what the AI told me."
Then, if they are sufficiently optimistic, they'll go back and prompt it with "please supply evidence for your conclusion" or something similar, and it will supply paragraphs of plausible-sounding text, but when you dig into what it is saying there are inconsistencies or made-up citations. I've seen it say things that were straight-up incorrect and went against the laws of thermodynamics, for example.
It has become the new "I threw the kitchen sink into a multivariate regression and X emerged as significant, therefore we should address X."
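To make the kitchen-sink point concrete, here is a rough sketch (a toy example of my own using numpy and statsmodels, not anything from real plant data): regress pure noise on twenty noise predictors and, by chance alone, something will often come out "significant."

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n, k = 100, 20
    X = rng.normal(size=(n, k))   # 20 predictors that are pure noise
    y = rng.normal(size=n)        # a response unrelated to any of them

    # Ordinary least squares with all 20 predictors thrown in at once
    model = sm.OLS(y, sm.add_constant(X)).fit()
    spurious = (model.pvalues[1:] < 0.05).sum()
    print(f"{spurious} of {k} noise predictors look 'significant' at p < 0.05")

Run it with a few different seeds and, more often than not, at least one noise predictor crosses the threshold, which is exactly what a 5% false-positive rate across twenty variables promises.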
I'm not a complete skeptic; I think AI has some value. For example, if you use it as a more powerful search engine by asking something like "What are some suggested techniques for investigating X?" or "What are the limitations of Method Y?", it can point you to the right place and assist you with research; it might find papers from other fields or similar. But it is not something you should rely on to do all of the research for you.
The lesson to learn is that these are "large language models." That means they can regurgitate what someone else has done before textually, but not actually create something novel. So it's fine if someone on the internet has posted or talked about a quick UI in whatever particular toolkit you're using to analyze data. But it'll throw out BS if you ask for something brand new. I suspect a lot of AI users are web developers who write a lot of repetitive rote boilerplate, and that's the kind of thing these LLMs really thrive on.
You get the AI to generate code that lets you spot-check individual data points :-)
Most of my work these days is in fact that kind of code. I'm working on something research-y that requires a lot of visualization, and at this point I've actually produced more throwaway code than code in the project.
Here's an example: I had ChatGPT generate some relatively straightforward but cumbersome geometric code. Saved me 30 - 60 minutes right there, but to be sure, I had it generate tests, which all passed. Another 30 minutes saved.
I reviewed the code and the tests and felt it needed more edge cases, which I added manually. However, these started failing and it was really cumbersome to make sense of a bunch of coordinates in arrays.
So I had it generate code to visualize my test cases! That instantly showed me that some assertions in my manually added edge cases were incorrect, which became a quick fix.
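For anyone curious what that throwaway visualization amounts to, here's a minimal sketch in the same spirit (my reconstruction with made-up fixtures, not the code that was actually generated): overlay the expected and actual polygons for each test case so a bad assertion is obvious at a glance.

    import matplotlib.pyplot as plt

    # Hypothetical test fixtures: (name, expected polygon, actual polygon)
    cases = [
        ("square",   [(0, 0), (1, 0), (1, 1), (0, 1)], [(0, 0), (1, 0), (1, 1), (0, 1)]),
        ("triangle", [(0, 0), (2, 0), (1, 1)],         [(0, 0), (2, 0), (1, 2)]),  # bad assertion
    ]

    fig, axes = plt.subplots(1, len(cases), figsize=(4 * len(cases), 4))
    for ax, (name, expected, actual) in zip(axes, cases):
        for points, style, label in [(expected, "g-o", "expected"), (actual, "r--x", "actual")]:
            xs, ys = zip(*(points + [points[0]]))  # repeat the first vertex to close the polygon
            ax.plot(xs, ys, style, label=label)
        ax.set_title(name)
        ax.set_aspect("equal")
        ax.legend()
    plt.show()

A plot like this makes a transposed coordinate or a wrong vertex jump out in a way a wall of array literals never will.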
The answer to "how do you trust AI" is human in the loop... AND MOAR AI!!! ;-)
There’s so much evidence out there of people getting real value from the tools.
Some questions you can ask yourself are “why doesn’t it work for me?” and “what can I do differently?”.
Be curious, not dogmatic. Ignore the hype, find people doing real work.
You know where this is going. I asked Claude if audio plugins were well represented in its training data, it said yes, off I went. I can’t review the code because I lack the expertise. It’s all C++ with a lot of math and the only math I’ve needed since college is addition and calculating percentages. However, I can have intelligent discussions about design and architecture and music UX. That’s been enough to get me a functional plugin that already does more in some respects than the original. I am (we are?) making it steadily more performant. It has only crashed twice and each time I just pasted the dump into Claude and it fixed the root cause.
Long story short: if you can verify the outcome, do you need to review the code? It helps that no one dies or gets underpaid if my audio plugin crashes. But still, you can’t tell me this isn’t remarkable. I think it’s clear there will be a massive proliferation of niche software.
In other words you can’t vibe code in an environment where evaluating “does this code work” is an existential question. This is the case where 7k LOC/day becomes terrifying.
Until we get much better at automatically proving correctness of programs we will need review.
This is the game changer for me: I don’t have to evaluate tens or hundreds of market options that fit my problem. I tell the machine to solve it, and if it works, then I’m happy. If it doesn’t I throw it away. All in a few minutes and for a few cents. Code is going the way of the disposable diaper, and, if you ever washed a cloth diaper you will know, that’s a good thing.
What happens when it seems to work, and you walk away happy, but discover three months later that your circular components don't line up because the LLM-written CAD software used an over-rounded PI = 3.14? I don't work in industrial design, but I faced a somewhat similar issue where an LLM-written component looked fine to everyone until final integration forced us to rewrite it almost entirely.
The original code "looks" fine, and it even works pretty well, but an LLM cannot avoid critical oversights along the way, and is fundamentally designed to make its mistakes look as plausibly correct as possible. This makes correcting the problems down the line much more annoying (unless you can afford to live with the bugs and keep slapping on more band-aids, I guess).
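Going back to the PI = 3.14 scenario above, here's a toy calculation for a sense of scale (hypothetical numbers, not the actual CAD case): lay out bolt holes around a 500 mm circle by stepping a fixed arc length computed with an over-rounded pi, and the error accumulated over the full circle comes to roughly 0.8 mm, easily enough to keep precision parts from lining up.

    import math

    diameter_mm = 500
    holes = 24
    sloppy_pi = 3.14

    true_pitch = math.pi * diameter_mm / holes      # arc length between holes, done right
    sloppy_pitch = sloppy_pi * diameter_mm / holes  # same calculation with pi = 3.14

    drift_mm = (true_pitch - sloppy_pitch) * holes  # error accumulated over the full circle
    print(f"pitch error per hole: {true_pitch - sloppy_pitch:.3f} mm")
    print(f"total drift after {holes} holes: {drift_mm:.2f} mm")

Each individual hole is only off by a few hundredths of a millimetre, which is exactly why it "seems to work" until the parts have to mate.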
At some point you might take over and ask it for the specific refactors you'd do yourself but are too lazy to. Or even toss it away entirely and start fresh with a better understanding, either yourself or again with an agent.
I think the throwaway part is important here and people are missing it, particularly for non-programmers.
There's a lot of roles in the business world that would make great use of ephemeral little apps like this to do a specific task, then throw it away. Usually just running locally on someone's machine, or at most shared with a couple other folks in your department.
Code doesn't have to be good, hell it doesn't even have to be secure, and certainly doesn't need to look pretty. It just needs to work.
There's not enough engineering staff or time to turn every manager's pet excel sheet project into a temporary app, so LLMs make perfect sense here.
I'd go as far as to say more effort should be put into ephemeral apps as a use case for LLMs than into trying to use them in areas where a more permanent, high-quality solution is needed.
Improve them for non-developers.
And then people create non-throwaway things with it and your job, performance report, bonus, and healthcare are tied to being compared to those people who just do what management says without arguing about the correct application of the tool.
If you keep your job, it's now tied to maintaining the garbage those coworkers checked in.
If you don't know how to analyze data, and flat out refuse to invest in learning the skill, then I guess that could be really useful. Those users are likely the ones most enthusiastic about AI. But are those users close to as productive as someone who learns a mature tool? Not even close.
Lots of people appreciate an LLM to generate boiler plate code and establish frameworks for their data structures. But that's code that probably shouldn't be there in the first place. Vibe coding a game can be done impressively quick, but have you tried using a game construction kit? That's much faster still.
It's infinitely worse when your PM / manager vibe-codes some disgusting garbage, sees that it kind of looks like a real thing that solves about half of the requirements (badly) and demands engineers ship that and "fix the few remaining bugs later".
There's a shit-ton of bad and inefficient code on the internet. Lots of it. And it was used to train these LLMs as much as the good code.
In other words, the LLMs are great if you're OK with mediocrity at best. Mediocrity is occasionally good enough, but it can spell death for a company when key parts of it are mediocre.
I'm afraid a lot of the executives who fantasize about replacing humans with AI are going to have to learn this the hard way.
And it's tricky, because I'm trying not to appeal to emotion despite being fascinated with how this tool has enabled me to do things in a short amount of time that would have taken me weeks of grinding to get to, and with how it improves my communication with stakeholders. That feels world-changing. Specifically, my world and the day-to-day role I play when it comes to getting things done.
I think it is fine that it fell short of your expectations. It often does for me as well, but when it gets me 80% of the way there in less than a day's work, my mind is blown. It's an imperfect tool and, I'm sorry for saying this, but so are we. Treat its imperfections the same way you would a junior developer's: feedback, reframing, restrictions, and iteration.
Well… That's no longer true, is it?
My partner (IT analyst) works for a company owned by a multinational big corporation, and she got told during a meeting with her manager that use of AI is going to become mandatory next year. That's going to be a thing across the board.
And have you called a large company for any reason lately? Could be your telco provider, your bank, the public transport company, whatever. You call them, because online contact means haggling with an AI chatbot until it finally gives up and shunts you over to an actual person who can help, and contact forms and e-mail have been killed off. Calling is not quite as bad, but step one nowadays is "please describe what you're calling for," where some LLM will try to parse that, fail miserably, and then shunt you to an actual person.
AI is already unavoidable.
My multinational big corporation employer has reporting about how much each employee uses AI, with a naughty list of employees who aren't meeting their quota of AI usage.
The fact that companies have to force you to use it with quotas and threats is damning.
“Why don’t you just make the minimum 37 pieces of flAIr?”
It's mostly a sign leadership has lost reasoning capability if it's mandatory.
But no, reporting isn't necessarily the problem. There are plenty of places that use reporting to drive a conversation on what's broken, and why it's broken for their workflow, and then use that to drive improvement.
It's only a problem if the leadership stance is "Haha! We found underpants gnome step 2! Make underpants number go up, and we are geniuses". Sadly not as rare as one would hope, but still stupid.
All of this predates LLMs (what “AI” means today) becoming a useful product. All of this happened already with previous generations of “AI”.
It was just even shittier than the version we have today.
This is what I always think of when I imagine how AI will change the world and daily life. Automation doesn't have to be better (for the customer, for the person using it, for society) in order to push out the alternatives. If the automation is cheap enough, it can be worse for everyone and still change everything. Those are the niches I'm most certain are here to stay, because sometimes it hardly matters if it's any good.
If you're lucky. I've had LLMs that just repeatedly hang up on me when they obviously hit a dead end.
AI's not exactly a step down from that.
I'd argue that's not true. It's more of a stated goal. The actual goal is to achieve the desired outcome in a way that has manageable, understood side effects, and that can be maintained and built upon over time by all capable team members.
The difference between what business folks see as the "output" of software developers (code) and what (good) software developers actually deliver over time is significant. AI can definitely do the former. The latter is less clear. This is one of the fundamental disconnects in discussions about AI in software development.
I'm going to say this next thing as someone with a lot of negative bias about corporations. I was laid off from Twitter when Elon bought the company, and from a second company that was hemorrhaging users.
Our job isn't to write code, it's to make the machine do the thing. All the effort for clean, manageable, etc is purely in the interest of the programmer but at the end of the day, launching the feature that pulls in money is the point.
Maybe I'm not understanding your point, but this is the kind of thing that happens in software teams all the time and is one of those "that's why they call it work" realities of the job.
If something "seems right/passed review/fell apart," then that's the reviewer's fault, right? Which happens, all the time! Reviewers tend to fall back on tropes and "are there tests? ok, great" and whatever their hobbyhorses tend to be, while ignoring other concerns. It's ok because "at least it's getting reviewed" and the sausage gets made.
If AI slashed the amount of time to get a solution past review, it buys you time to retroactively fix too, and a good attitude when you tell it that PR 1234 is why we're in this mess.
If everyone on your team is doing that, it's not long before huge chunks of your codebase are conceptually like stuff that was written a long time ago by people who left the company. Except those people may have actually known what they were doing. The AI chatbots are generating stuff that seems to plausibly work well enough based on however they were prompted.
There are intangible parts of software development that are difficult to measure but incredibly valuable beyond the code itself.
> Our job isn't to write code, it's to make the machine do the thing. All the effort for clean, manageable, etc is purely in the interest of the programmer but at the end of the day, launching the feature that pulls in money is the point.
This could be the vibe coder mantra. And it's true on day one. Once you've got reasonably complex software being maintained by one or more teams of developers who all need to be able to fix bugs and add features without breaking things, it's not quite as simple as "make the machine do the thing."
I mean this in sincerity, and not at all snarky, but - have you considered that you haven't used the tools correctly or effectively? I find that I can get what I need from chatbots (and refuse to call them AI until we have general AI just to be contrary) if I spend a couple of minutes considering constraints and being careful with my prompt language.
When I've come across people in my real life who say they get no value from chatbots, it's because they're asking poorly formed questions, or haven't thought through the problem entirely. Working with chatbots is like working with a very bright lab puppy. They're willing to do whatever you want, but they'll definitely piss on the floor unless you tell them not to.
Or am I entirely off base with your experience?
I prefer to use an LLM as a sock puppet to filter out implausible options in my problem space and to help me recall how to do boilerplate things. Like you, I think, I also tend to write multi-paragraph prompts, repeating myself and calling back to every aspect to continuously home in on the true subject I am interested in.
I don't trust LLMs enough to operate on my behalf agentically yet. And LLMs are uncreative and hallucinatory as heck whenever they stray into novel territory, which makes them a dangerous tool.
The problem is that this comes off just as tone-deaf as "you're holding it wrong." In my experience, when people promote AI, it's sold as just having a regular conversation and then the AI does the thing. And when that doesn't work, the promoter goes into system prompts, MCP, agent files, etc., and the entire workflows that are required to get it to do the correct thing. It ends up feeling like you're being lied to, even if there's some benefit out there.
There's also the fact that not all programming workflows are the same. I've found some areas where AI works well, but for a lot of my work it does not. Anything that wouldn't have shown up in a simple Google search back before it was enshittified is usually pretty spotty.
Then there’s people like me, who you’d probably term as an old soul, who looks at all that and says, “I have to change my workflow, my environment, and babysit it? It is faster to simply just do the work.” My relationship with tech is I like using as little as possible, and what I use needs to be predictable and do something for me. AI doesn’t always work for me.
This is almost the complete opposite of my experience. I hear expressions of improvement and optimism for the future, but almost all of the discussion from people actively and productively using AI is about identifying its limits and seeing what benefits you can find within those limits.
They are not useless and they are also not a panacea. It feels like a lot of people consider those the only available options.
It can't reason from first principles and there isn't training data for a lot of state-of-the-art computer science and code implementations. Nothing you can prompt will make it produce non-naive output because it doesn't have that capability.
AI works for a lot of things because, if we are honest, AI generated slop is replacing human generated slop. But not all software is slop and there are software domains where slop is not even an option.
I think I have a good idea how these things work. I have run local LLMs for a couple of years on a pair of video cards here, trying out many open weight models. I have watched the 3blue1brown ML course. I have done several LinkedIn Learning courses (which weren't that helpful, just mandatory). I understand about prompting precisely and personas (though I am not sold personas are a good idea). I understand LLMs do not "know" anything, they just generate the next most likely token. I understand LLMs are not a database with accurate retrieval. I understand "reasoning" is not actual thinking just manipulating tokens to steer a conversation in vector space. I understand LLMs are better for some tasks (summarisation, sentiment analysis, etc) than others (retrieval, math, etc). I understand they can only predict what's in their training data. I feel I have a pretty good understanding of how to get results from LLMs (or at least the ways people say you can get results).
I have had some small success with LLMs. They are reasonably good at generating sub-100 line test code when given a precise prompt, probably because that is in training data scraped from StackOverflow. I did a certification earlier this year and threw ~1000 lines of Markdown notes into Gemini and had it quiz me which was very useful revision, it only got one question wrong of the couple of hundred I had it ask me.
I'll give a specific example of a recent failure. My job is mostly troubleshooting and reading code, all of which is public open source (so accessible via LLM search tooling). I was trying to understand something where I didn't know the answer, and this was difficult code to me so I was really not confident at all in my understanding. I wrote up my thoughts with references, the normal person I ask was busy so I asked Gemini Pro. It confidently told me "yep you got it!".
I asked someone else who saw a (now obvious) flaw in my reasoning. At some point I'd switched from a hash algorithm which generates Thing A, to a hash algorithm which generates Thing B. The error was clearly visible, one of my references had "Thing B" in the commit message title, which was in my notes with the public URL, when my whole argument was about "Thing A".
This wasn't even a technical or code error; it was a text analysis and pattern matching error, which I didn't see because I was so focused on algorithms. Even Gemini, apparently the best LLM in the world and the one causing a "code red" at OpenAI, did not pick this up, when text analysis is supposed to be one of its core functionalities.
I also have a lot of LLM-generated summarisation forced on me at work, and it's often so bad I now don't even read it. I've seen it generate text which makes no logical sense, and/or which uses a great many words without really saying anything at all.
I have tried LLM-based products where someone else is supposed to have done all the prompt crafting and added the RAG embeddings, so I can just behave like a naive user asking questions. Even when I ask these things questions which I know are covered in the RAG, they cannot retrieve an accurate answer ~80% of the time. I have read papers which support the idea that most RAG falls apart after roughly 40k words, and our document set is much larger than that.
Generally I find LLMs are at the point where to evaluate the LLM response I need to either know the answer beforehand so it was pointless to ask, or I need to do all the work myself to verify the answer which doesn't improve my productivity at all.
About the only thing I find consistently useful about LLMs is writing my question down and not actually asking it, which is a form of Rubber Duck Debugging (https://en.wikipedia.org/wiki/Rubber_duck_debugging) which I have already practiced for many years because it's so helpful.
Meanwhile trillions of dollars of VC-backed marketing assures me that these things are a huge productivity increaser and will usher in 25% unemployment because they are so good at doing every task even very smart people can do. I just don't see it.
If you have any suggestions for me I will be very willing to look into them and try them.
I find people mostly prefer what they are used to, and if your preference was so superior then how could so many people build fantastic software using the method you don't like?
AI isn't like that. AI is a bunch of people telling me this product can do wonderful things that will change society and replace workers, yet almost every time I use it, it falls far short of that promise. AI is certainly not reliable enough for me to jeopardize the quality of my work by using it heavily.