Comment by eitally - Hacker Neue

eitally Jun 19, 2025 parent

It's going to be very interesting to see how things evolve in enterprise IT, especially but not exclusively in regulated industries. As more SaaS services are at least partly vibe coded, how are CIOs going to understand and mitigate risk? As more internal developers are using LLM-powered coding interfaces and become less clear on exactly how their resulting code works, how will that codebase be maintained and incrementally updated with new features, especially in solo dev teams (which is common)?

I easily see a huge future for agentic assistance in the enterprise, but I struggle mightily to see how many IT leaders would accept the output code of something like a menugen app as production-viable.

Additionally, if you're licensing code from external vendors who've built their own products at least partly through LLM-driven superpowers, how do you have faith that they know how things work and won't inadvertently break something they don't know how to fix? This goes for niche tools (like Clerk, or Polar.sh or similar) as much as for big heavy things (like a CRM or ERP).

I was on the CEO track about ten years ago and left it for a new career in big tech, and I don't envy the folks currently trying to figure out the future of safe, secure IT in the enterprise.

charlie0 Jun 19, 2025

It will succeed due to the same reason other sloppy strategies succeed, it has large short term gains and moves risk into the nebulous future. Management LOVES these types of things.

r2b2 Jun 19, 2025

I've found that as LLMs improve, some of their bugs become increasingly slippery - I think of it as the uncanny valley of code.

Put another way, when I cause bugs, they are often glaring (more typos, fewer logic mistakes). Plus, as the author it's often straightforward to debug since you already have a deep sense for how the code works - you lived through it.

So far, using LLMs has downgraded my productivity. The bugs LLMs introduce are often subtle logical errors, yet "working" code. These errors are especially hard to debug when you didn't write the code yourself — now you have to learn the code as if you wrote it anyway.

I also find it more stressful deploying LLM code. I know in my bones how carefully I write code, due to a decade of roughly "one non critical bug per 10k lines" that keeps me asleep at night. The quality of LLM code can be quite chaotic.

That said, I'm not holding my breath. I expect this to all flip someday, with an LLM becoming a better and more stable coder than I am, so I guess I will keep working with them to make sure I'm proficient when that day comes.

thegeomaster Jun 19, 2025

I have been using LLMs for coding a lot during the past year, and I've been writing down my observations by task. I have a lot of tasks where my first entry is thoroughly impressed by how e.g. Claude helped me with a task, and then the second entry is a few days after when I'm thoroughly irritated by chasing down subtle and just _strange_ bugs it introduced along the way. As a rule, these are incredibly hard to find and tedious to debug, because they lurk in the weirdest places, and the root cause is usually some weird confabulation that a human brain would never concoct.

throw234234234 5 days ago

Saw a recent talk where someone described AI as making errors, but not errors that a human would naturally make and are usually "plausible but wrong" answers. i.e. the errors that these AI's make are of a different nature than what a human would do. This is the danger - that reviews now are harder; I can't trust it as much as a person coding at present. The agent tools are a little better (Claude Code, Aider, etc) in that they can at least take build and test output but even then I've noticed it does things that are wrong but are "plausible and build fine".

I've noticed it in my day-to-day: an AI PR review is different than if I get the PR from a co-worker with different kinds of problems. Unfortunately the AI issues seem to be more of the subtle kind - the things if I'm not diligent could sneak into production code. It means reviews are more important, and I can't rely on previous experience of a co-worker and the typical quality of their PR's - every new PR is a different worker effectively.

DanHulton Jun 19, 2025

I'm curious where that expectation of the flip comes from? Your experience (and mine, frankly) would seem to indicate the opposite, so from whence comes this certainty that one day it'll change entirely and become reliable instead?

I ask (and I'll keep asking) because it really seems like the prevailing narrative is that these tools have improved substantially in a short period of time, and that is seemingly enough justification to claim that they will continue to improve until perfection because...? waves hands vaguely

Nobody ever seems to have any good justification for how we're going to overcome the fundamental issues with this tech, just a belief that comes from SOMEWHERE that it'll happen anyway, and I'm very curious to drill down into that belief and see if it comes from somewhere concrete or it's just something that gets said enough that it "becomes true", regardless of reality.

dapperdrake Jun 19, 2025

Just like when all regulated industries started only using decision trees and ordinary least-squares regression instead of any other models.

gosub100 Jun 19, 2025

> how many IT leaders would accept the output code of something like a menugen app as production-viable.

probably all of the ones at microsoft

This item has no comments currently.