You want proof for critical/supportive criticism? Then, almost in the same sentence, you make an insane claim without backing it up with any evidence.
Nearly every critical reply to my comment bases that criticism on the lack of examples and details I included for my claim, which is the very thing I am suggesting we do (i.e. they are, ironically, agreeing with me?). I'm sorry; I thought that intentional bit of irony would help make the point rather than derail the request.
Here are a few projects that I made these past few months that wouldn't have been possible without LLMs:
* https://github.com/skorokithakis/dracula - A simple blood test viewer.
* https://www.askhuxley.com - A general helper/secretary/agent.
* https://www.writelucid.cc - A business document/spec writing tool I'm working on: it asks you questions one at a time, writes a document, then critiques the idea to help you strengthen it.
* A rotary phone that's a USB headset and closes your meeting when you hang up the phone, complete with the rotary dial actually typing in numbers.
* Made some long-overdue updates on my pastebin, https://www.pastery.net, to improve general functionality.
* https://github.com/skorokithakis/support-email-bot - A customer support bot to answer general questions about my projects to save me time on the easy stuff, works great.
* https://github.com/skorokithakis/justone - A static HTML page for the board game Just One, so you can play with your friends when you're physically together, without needing to bring the game along.
* https://github.com/skorokithakis/dox - A thing to run Dockerized CLI programs as if they weren't Dockerized.
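(For the curious, the dox trick is basically a thin wrapper over `docker run`. A rough Python sketch of the idea as I understand it from the blurb, not dox's actual code; the image name and mount point are made up:)

```python
import os
import subprocess

def build_cmd(image: str, args: list[str]) -> list[str]:
    """Build a `docker run` invocation that mounts the current directory,
    so the containerized CLI behaves like a locally installed one.
    (A sketch of the idea, not dox's real implementation.)"""
    return [
        "docker", "run", "--rm", "-i",
        "-v", f"{os.getcwd()}:/work",  # expose the host cwd to the container
        "-w", "/work",                 # run the tool from that directory
        image,
        *args,
    ]

def run_dockerized(image: str, args: list[str]) -> int:
    """Run the containerized CLI and return its exit code."""
    return subprocess.call(build_cmd(image, args))
```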
I'm probably forgetting a lot more, but I honestly wouldn't have bothered to start any of the above if not for LLMs, as I'm too old to code but not too old to make stuff.
EDIT: dang can we please get a bit better Markdown support? At least being able to make lists would be good!
LLMs are a great rubber duck, plus they can write the document for you at the end.
Although I was just commenting on the irony of the parent comment.
HN has no Markdown support at all right now. It's just this: https://news.ycombinator.com/formatdoc
Great use case for an LLM to make these changes as HN is open source. It’ll also tell us if LLMs can go beyond JS slop.
1 is not infinitely greater than 0.
Or otherwise, can you share what you think the ratio is?
> I am still surprised at things it cannot do, for example Claude code could not seem to stitch together three screens in an iOS app using the latest SwiftUI (I am not an iOS dev).
You made a critical comment yet didn't follow your own rules lol.
> it's so helpful for meaningful conversation!
How so?
FWIW - I too have used LLMs for both coding and personal prompting. I think the general conclusion is that when it works, it works well, but when it fails it can fail miserably and be disastrous. I've come to this conclusion from reading people's complaints here and through my own experience.
Here's the problem:
- It's not valuable for me to print out my whole prompt sequence (and context for that matter) in a message board. The effort is boundless and the return is minimal.
- LLMs should just work(TM). The fact that they can fail so spectacularly is a glaring issue. These aren't just bugs, they are foundational because LLMs by their nature are probabilistic and not deterministic. Which means providing specific defect criteria has limited value.
Sure. Another article was posted today[1] on the subject. An example claim:
> If we asked the AI to solve a task that was already partially solved, it would just replicate code all over the project. We’d end up with three different card components. Yes, this is where reviews are important, but it’s very tiring to tell the AI for the nth time that we already have a Text component with defined sizes and colors. Adding this information to the guidelines didn’t work BTW.
This is helpful framing, and I have also noticed this pattern. I have seen two approaches help. One: I break up UI / backend tasks, and at the end of UI tasks, sometimes before I even look at the code, I say: "Have you reviewed your code against the existing components library <link to doc>?" and sometimes "Have you reviewed the written code compared to existing patterns, and can you identify opportunities for abstraction?" (I use plan mode for the latter and review what it says.) The other approach, which I have seen others try but have not tried myself, is to do this automatically with a sub-agent or hook. That makes sense to me at a high level, given I am manually doing the same thing now.
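FWIW, the automated check doesn't need to be smart; even a dumb script fired from a post-edit hook can catch the most blatant duplication before the agent piles up a third card component. A minimal sketch (the `*.tsx` glob and the definition regex are assumptions about a React-style codebase, nothing from the article):

```python
import re
from collections import defaultdict
from pathlib import Path

# Naive duplicate-component finder: flags CapitalizedName definitions
# that appear in more than one file. A post-edit hook could run this
# and feed the output back to the agent as a reminder.
DEF_RE = re.compile(r"(?:function|const|class)\s+([A-Z][A-Za-z0-9]*)")

def find_duplicates(root: str) -> dict[str, list[str]]:
    seen = defaultdict(set)
    for path in Path(root).rglob("*.tsx"):
        for name in DEF_RE.findall(path.read_text()):
            seen[name].add(str(path))
    # keep only names defined in more than one file
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}
```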
Roughly speaking, that is how I think through my work, and when I get to the point of actually writing the code, having most of the plan (context) in my head, I simply copy that context to the LLM and then go do something else. I only do this if I believe the LLM can do it effectively, so there are some tasks I do not ask for help on at all (IMHO this is important).
I also have it help with scripts, especially scripts that munge and summarize data. I know SQL very well, but I still find it a bit faster to prompt the LLM if it has the schema on hand.
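The munge-and-summarize scripts are usually something in this shape; a hypothetical example with an invented `events` table (nothing here is from an actual project):

```python
import sqlite3

def summarize(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Aggregate rows with SQL and return (kind, count) pairs,
    most frequent first. The `events` schema is invented."""
    return conn.execute(
        """
        SELECT kind, COUNT(*) AS n
        FROM events
        GROUP BY kind
        ORDER BY n DESC
        """
    ).fetchall()

if __name__ == "__main__":
    # throwaway in-memory database standing in for real data
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (kind TEXT, ts TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("click", "t1"), ("click", "t2"), ("view", "t3")],
    )
    for kind, n in summarize(conn):
        print(kind, n)
```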
Do you find ^ helpful? I.e. does that match how you prompt, and if not, in what ways does it differ? If it does, in what ways do you get different results, and at what step?
This was in stark contrast to my experience with TypeScript/NextJS, Python, and C#. Most of the time output quality for these was at least usefully good. Occasionally you’d get stuck in a tarpit of bullshit/hallucination around anything very new that hadn’t been in the training dataset for the model release you were using.
My take: there simply isn’t the community, thought leadership, and sheer volume of content around Swift that there is around these other languages. This means both lower quantity and lower quality of training data for Swift as compared to these other languages.
And that, unfortunately, plays negatively into the quality of LLM output for app development in Swift.
(Anyone who knows better, feel free to shoot me down.)
Another issue is that Apple developer docs are largely sequestered behind JavaScript that makes them hard for scrapers to parse.
At least, those are the two explanations I’ve seen that seem plausible.
> One is that Swift has changed massively since it came out and huge swathes of examples and articles and such online, that LLMs are trained on, are out of date and thus pollute the training set.
100% jibes with my experience. The number of times it would generate code using a deprecated API, or some older mechanism, or mix an older idiom with a newer one... well, it was constant, really.
And a lot of Googling when I was fixing everything up manually drew me toward this same conclusion: that high quality, up to date information on Swift was in relatively short supply compared to other languages. Couple that with a lower volume of content across all Swift versions and you end up with far from great training data leading to far from great outputs.
> Apple developer docs are largely sequestered behind JavaScript that makes them hard for scrapers to parse.
Yeah, and honestly - even if there's a solution here - the documentation isn't that great either. Certainly not compared with .NET, Ruby, Python, TypeScript, etc.
If I were a vibe coder I'd certainly avoid Swift like the plague.
(Btw, this isn't a knock on Swift itself: as a language I didn't mind it, although I did notice when debugging that the Objective C underpinnings of many APIs are often on display.)
For a bunch of reasons I want to avoid the standard React, TypeScript, and Node stack, but the sheer velocity it might enable on the LLM side might make it worth it.
Are you saying that your experience with Go has been bad? I would think Go would be as good as any other language (if not better). The language itself is simple, the Go team is very methodical about adding new features so it changes fairly slowly, it has excellent built in CLI based tooling that doesn't require third party packages or applications, and there are plenty of large open source Go codebases to train on. Seems like the perfect language for agentic tools.
Is it number of lines? Tickets closed? PRs opened or merged? Number of happy customers?
Or does he now just get to work for 2 hours and enjoy the remaining 6 hours doing meaningful things apart from staring at a screen?
However, I don’t have lottery millions; I do have a job, and I would like to be able to do it better.
Is that helpful?
Have you heard of that study that shows AI actually makes developers less productive, but they think it makes them more productive??
EDIT: sorry all, I was being sarcastic in the above, which isn't ideal. Just annoyed because that "study" was catnip to people who already hated AI, and they (over-) cite it constantly as "evidence" supporting their preexisting bias against AI.
Have you looked into that study? There's a lot wrong with it, and it's been discussed ad nauseam.
Also, what a great catch 22, where we can't trust our own experiences! In fact, I just did a study and my findings are that everyone would be happier if they each sent me $100. What's crazy is that those who thought it wouldn't make them happier, did in fact end up happier, so ignore those naysayers!
Also, please stop posting flamebait to HN generally. It's not what this site is for, and destroys what it is for.
Hey, so if I DO see it, can I stop it from happening?
Of the two of you, I know which one I'd bet on being "right". (Hint: it's the one talking about their own experience, not the one projecting theirs onto someone else.)
We birthed a level of cognition out of silicon that nobody would have imagined even just four years ago. Sorry, but some brogrammers being worried about making ends meet is making me laugh - it's all the same people who have been automating everyone else's jobs for the past two decades (and getting paid extremely fat salaries for it), and you're telling me now we're all supposed to be worried because it's going to affect our salaries?
Come on. You think everyone who's "vibe coding" doesn't understand the pointlessness of 90% of codemonkey work? Hell, most smart engineers understood that pointlessness years ago. Most coders work on boring CRUD apps and REST APIs to make revenue go up 0.02%. And those that aren't, are probably working on ads.
It's a fraction of a fraction that is at all working on interesting things.
Personally, yeah, I saw it coming, and instead of "accepting fate" I created an AI research lab. And I diversified the hell out of my skillset as well - started working way out of my comfort zone. If you want to keep up with changing times, start challenging yourself.
Most of the anti-AI comments I see on HN are NOT a version of "the problem with AI is that it's so good it's going to replace me!"
What "discussion" do you want to have? Another round of "LLMs are terrible at embedded hardware programming ergo they're useless"? Maybe with a dash of "LLMs don't write bug-free software [but I do]" to close it off?
The discussions that are at all advancing the state of the art are happening on forums that accept reality as a matter of fact, without people constantly trying to pretend things because they're worried they'll lose their job if they don't.
Have you tried the new Xcode extension? That tool is surprisingly good in my limited use; one of the few times Xcode has impressed me in my two years of use. I've read some anecdotes that Claude in the Xcode tool is more accurate than standard Claude Code for Swift. I haven't noticed that myself, but I've only used the Xcode tool twice so far.
Can't show prompts and actual, real work, because, well, it's confidential, and I'd like to get a paycheck instead of a court summons sometime in the next two weeks.
Generally, 'I can't show you the details of my work' isn't a barrier in communicating about tech, because you can generalize and strip out the proprietary bits, but because LLM behavior is incredibly idiosyncratic, by the time you do that, you're no longer accurately communicating the problem that you're having.
> What is the idiom for testing the launch screen on the simulator like.. I don't see anything? How do I know if its there.
i.e. in iOS / Swift, I don't even know if I'm using the right terms for the code I am trying to interrogate, or in some cases even what the thing is!
But for stuff like TCA (Swift Composable Architecture), I basically created a TCA.md file, pasted in a bunch of docs and examples, and would reference that.
But for the most part, it was one-shotting SwiftUI screens that were nicer than what I had in my mind.
https://www.theverge.com/ai-artificial-intelligence/787524/a...
Yeah, maybe it is garbage. But it is still another milestone, if it can do this, then it probably does ok with the smaller things.
This keeps incrementing from "garbage" to "wow this is amazing" at each new level. We're already forgetting that this was unbelievable magic a couple years ago.
That's... not super surprising? SwiftUI changes pretty dang often, and the knowledge cutoff doesn't progress fast enough to cover every use-case.
I use Claude to write GTK interfaces, which is a UI library with a much slower update cadence. LLMs seem to have a pretty easy time working with bog-standard libraries that don't make giant idiomatic changes.
What are the specific tasks + prompts giving you a 3x increase in output, and conversely, what tasks don't work at all?
After an admittedly cursory scan of your blog and the repos in your GH account I don't find anything in this direction.
- "Rails / sidekiq: <x file> uses sidekiq batches. <y file> does it. Refactor it to use the pattern in <x file>. Match the spec in <z file>, then run rspec and rubocop"
- "Typescript / react. <x file>. Why is TypeScript compilation a bottleneck in this file? Use the debugger to provide definitive evidence. Cast the type to any, run the script, and time it; write a script to measure timing if needed. Iteratively work from type `any` to a real type and measure timing at each step. Summarize results"
- "I redefine <FormComponent> in five places. Find them all. Identify the shared patterns. Make a new component in <x location>. Refactor each to use the new component. Run yarn lint and fix any ts issues when done"
- "<file y>: more idiomatic" (it knows my preferences)
Side projects and such I have no idea, and (as you noted) I do those quite infrequently anyway! Actually, come to think of it, outside of the toy iOS work I did last week, I've not actually worked on my side projects since getting into Claude Code / Cursor agents.

For work stuff, I guess other metrics I'd be interested in are total messages sent per task. I do sometimes look at $ per task, but for me that's so wildly in my favor I don't think it's worth tracking. I don't count the things I'm doing now that I would have avoided or never finished in the past. For those, of course, to me personally they are worth much more psychologically than 3x, but who knows if it's an actual boost. E.g. I took a partially scripted task the other day and fully automated it, and also had it output to the CLI in a kind of dorky sci-fi way because it makes it fun to run. It didn't take long: 30 minutes? But I certainly didn't _gain_ time doing that, just a little more satisfaction.

TBH I'm surprised 3x is so controversial; I thought it was a really cool and far more practical assessment than some of the 10x claims I'm seeing.
I can only list my open source outputs concretely, for obvious reasons, but https://github.com/rubberduckmaths/reddit_terraforming_mars_... was a near one-shot. It's a Reddit bot that posts card text to the Terraforming Mars subreddit when asked, which is helpful for context in discussions of that board game. It's appreciated and used a lot by the community there. There's a similar project I used AI for, to scrape card text, that was also a near one-shot. I'd say for these two hobby projects 50x productivity is a reasonable statement.

I wrote Reddit bots ~10 years ago without coding assistance - https://github.com/AReallyGoodName/xwingminibot - so I get to make a reasonable direct comparison between two very similar projects, and I think it's totally fair to say 50x for this example. The Reddit API even changed completely in that time, so no one can really say "you used past experience to move faster; it's not the AI giving a 50x boost", because I really didn't. My memory is not that good; what I do remember is an entire weekend previously vs <30 mins total now to one-shot some pretty cool projects.
As for the negatives, they are never serious. A couple of good examples:
"Please correct all lint errors in this project", only to have @lintignore added to all files. Lol! Obviously I just specified the prompt more clearly, and it's not like it's hard to catch these things and not ship them to prod. It was funny to everyone I showed it to, and no big deal.
Another similar case: "please make the logging of this file less verbose, especially around the tight loop on line X". Instead of changing the log level or removing some of the log statements, the AI redirected stdout at the initialization of the command line program (which would completely break it, of course). Again, hilarious but no big deal. Not even much of a waste of time, since you just change the prompt and run again, and honestly a few silly diversions like this now and then are kind of fun. The point is, the "OMG, AI sometimes gets it wrong" comments aren't serious objections. I have version control; I review code. No big deal.
I too eye-roll massively at some of the criticisms at this point. It's like people are stretching to claim that everyone who's using a coding assistant is a newb who's throwing everything into prod and deleting databases, etc. That's just not reality.