taormina
Joined 683 karma
https://taormina.io

  1. I can’t get through a full conversation without obviously false claims. They will insist that you are correct, and that your correction is completely correct, even when the correction is also wrong.
  2. Tell Uber you said hi.
  3. As a bit of a shameless plug, we're actively building a mobile and desktop game that avoids these patterns.

    You can learn more about Danger World at https://danger.world

  4. See, humans respond very differently when that happens. Failing to do what humans do when they don’t understand or don’t know something is frequently what trips LLMs up on the TT.
  5. I haven't tried that, it might be worth a shot.
  6. Nah, it's definitely not that. I've stated it explicitly several times over in that file, and it still insists on adding so many comments that it's absurd.
  7. Taxi-by-app existed pre-Uber; the innovation was making the taxi actually show up. Austin had the apps, and I would order a taxi, but the drivers would get distracted on the way to my house and pick up other fares, so I couldn't get where I wanted to go. They had every chance to avoid being outcompeted by Uber, but they couldn't stop being taxis. And here we are.
  8. In my experience, this has all been just as straightforward as implementing the MAB.
  9. I'm still working on Danger World (https://danger.world), my casual 2D narrative adventure with turn-based RPG elements. It's built in Flame, on top of Flutter, for iOS, Android, Windows and macOS.

    We're getting close! It's just a matter of polishing and polishing and polishing, but I'm really excited about how close we are to launch.

  10. IANAL, but how is this considered legal? I could see Facebook being allowed to claw back up to the amount of the severance, but $50,000 per statement in perpetuity seems wrong.
  11. Yep! What startup has the goal of making less than $10 million in annual revenue? That sentence was absolutely a deal breaker for the CEO and CTO of our last company.

    And since when has Docker Desktop "just worked"?

  12. This shouldn't be flagged...
  13. Are they running out of funds to drown out the protesters with their own marketing?
  14. The idea has been tried before and it failed because people don’t actually want this product at the scale the inventors thought. Amazon has never stopped doing this. Adding an element of indeterminism to the mix doesn’t make this a better product. Imagine what the LLM is going to hallucinate with your credit card attached.
  15. I mean, this really isn't a large codebase; judged by my prior jobs/projects, it's a small-to-medium-sized one. 9000 lines of code?

    When I give them the same task I tried to give them the day before, and the output gets noticeably worse than their last model version, is that better? When the day by day performance feels like it's degrading?

    They are definitely not as good as I would like them to be, but that's to be expected when they're being hyped up by professionals begging for money.

  16. Grok hasn't gotten better. OpenAI hasn't gotten better. Claude Code with Opus and Sonnet, I swear, is getting actively worse. Maybe you only use them for toy projects, but trying to get them to do real work in my real codebase is an exercise in frustration. Yes, I've done meaningful prompting work, and I've set up all the CLAUDE.md files, and then it proceeds to completely ignore everything I said and all of the context I gave, and just craps out something completely useless. It has accomplished a small amount of meaningful work, exactly enough that I think I'm neutral instead of negative in terms of work:time compared to having just done it all myself.

    I get to tell myself that it's worth it because at least I'm "keeping up with the industry", but I honestly just don't get the hype train one bit. Maybe I'm too senior? Maybe the frameworks I use, despite being completely open source and available as training data for every model on the planet, are too esoteric?

    And then the top post on the front page today is telling me that my problem is that I'm bothering to supervise, and that I should be writing an agent framework so it can spew out the crap in record time... But I need to know what is absolute garbage and what needs to be reverted. I will admit that my usual pattern has been to try to prompt it into better test coverage, specific feature additions, etc. on nights and weekends, and then focus my daytime working hours on reviewing what was produced. About half the time I review it and have to heavily clean it up to make it usable, but more often than not, I revert the whole thing and just start on it myself from scratch. I don't see how this counts as "better".

  17. Just anecdata, but they keep releasing new versions and they keep not being better. What would you call this if not plateauing? Worsening?
  18. Why do people so desperately want to see AI succeed? The financial investment explains it for some.
  19. There aren’t enough GPUs for average gamers to buy anything vaguely recent and they would love to be able to. Making the best GPUs on the planet is still huge and the market is quite large. Scalping might finally die at this rate, but NVDA wasn’t making any of the scalping money anyway so who cares? Data centers and gamers still need every GPU NVDA can make.
  20. They get paid the more vibe coding occurs on their platform, so of course they have a two-pizza team dedicated to milking the latest trend.
  21. Congratulations on believing the marketing. He has about 2.46 trillion reasons to make this claim. In other news, water is wet and the sky is blue.
  22. > LLMs are eliminating the need to have a vast array of positions on payrolls. From copywriters to customer support, and even creative activities such as illustration and even authoring books, today's LLMs are already more than good enough to justify replacing people with the output of any commercial chatbot service.

    I'd love a source for these claims. Many companies are claiming that they are able to lay off folks because of AI, but in fact AI is just a scapegoat: the real cause is reckless overhiring fueled by free money in the market over the last 5-10 years, and investors now demand to see a real business plan and ROI. "We can eliminate this headcount due to the efficiency of our AI" is just a fancy way to make the stock price go up while cleaning up the useless folks.

    People have ideas. There are substantially more ideas than people who can implement them. As with most technology, the reasonable expectation is that people will want more done by the now tool-powered humans, not less.

  23. I've used a wide variety of the "best" models, and I've mostly settled on Opus 4 and Sonnet 4 with Claude Code, but they don't ever actually get better. Grok 3-4 and GPT4 were worse, but like, at a certain point you don't get brownie points for not tripping over how low the bar is set.
  24. GitHub Pages STILL doesn't have any sort of built-in analytics. I shouldn't need GA or something else to track basic website metrics when you absolutely know that MS and GH have been tracking these things the whole time. People have had issues open asking for this for literal years.
  25. Opus 4.1 does it too:

    How many b's in the word blueberry?

    There are 3 b's in the word "blueberry". The word is spelled: b-l-u-e-b-e-r-r-y The b's appear in positions 1, 5, and 6.

  26. My project is basically the same size as when I started using it.
  27. Just more anecdata, but I entirely agree. I can't say that I'm happy with Sonnet's output at any point, really, but it still occasionally works, whereas Opus has been a dumpster fire every single time.
  28. Alright, well, Opus 4.1 seems exactly as useless as Opus 4 was, but it's probably eating my tokens faster. Wish they let you tell somehow.

    At least Sonnet 4 is still usable, but I'll be honest, it's been producing worse and worse slop all day.

    I've basically wasted the morning on Claude Code when I should've just been doing it all myself.

  29. Gas costs money. The car costs money. You can only do the same hike that's an hour away so many times, before you're traveling to go to new places, and hotels cost money at that point. Pickleball courts cost money. The pickleball equipment costs money. People do go to the library, and then they go home and don't interact with other people.
  30. I agree with your broader point, but for the example you give, HEB absolutely lets you shop by recipe on their website.
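For anyone checking the Opus 4.1 answer in item 25: "blueberry" has two b's, at positions 1 and 5 (position 6 is an e). A minimal Python check, where `letter_positions` is just a throwaway helper name for illustration, not anything from a library:

```python
def letter_positions(word: str, letter: str) -> list[int]:
    """Return the 1-based positions at which `letter` appears in `word`."""
    return [i for i, ch in enumerate(word, start=1) if ch == letter]


positions = letter_positions("blueberry", "b")
print(len(positions), positions)  # prints: 2 [1, 5]
print("blueberry".count("b"))     # prints: 2
```

Counting positions instead of just calling `str.count` also exposes the made-up "position 6" in the model's answer.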

