qafy
Joined 116 karma

  1. Great! I now see that further down on the website; I had not seen it before posting this comment. I think this would be valuable to demonstrate / communicate in the billing platform demo, which is the first thing you see and is what captured all of my attention (I never even scrolled down).

    Edit: I just re-ran the demo and it seemed way faster this time??? The first time it said GOAL: PRESS_ENTER... (agent proceeds to think about it for 5-8 seconds), which seemed hilarious to me.

  2. Got it, I only looked at the website, not the YouTube video you posted above, my apologies. On the website, neither the billing platform demo nor the screenshots in the section below convey this value prop very well. Both sections show what appear to be trivial flows, without explaining any of the underlying complexities.

    I suppose if you are hitting your target demographic dead-on with your marketing efforts, the value prop should be completely obvious to them, but you could still be more explicit about your differentiation.

  3. replying to myself here... I would be interested to see a more hybrid approach where an AI could step in to help retry / get past failures, or as a way of re-recording automation steps for a flow when something changes, but having AI in the loop for every action all the time feels wasteful at best.
  4. I can totally see the value of agent-driven flows for automating flows that are highly dynamic, poorly specified, error-prone, zero-shot environments, etc., but that doesn't seem to be at all what you are demonstrating here. Maybe your demos could show something more "challenging" to automate?

    As someone who has spent a LOT of my career working on browser automation and testing, speed and cost were always key. Even with existing programmatic tools like Selenium, Playwright, Cypress, etc., speed and headful hosting costs were already big issues. This seems orders of magnitude slower and more expensive. Curious how you pitch this to potential customers.

  5. this is actually hilarious because now they can't call it a fluke or an act of god
  6. I want to believe this, and I think I still do believe this... What makes me waver in my position was an interview I gave to an engineer who had previously worked on pedestrian safety simulations at Waymo and had quit over ethical concerns. He wouldn't go into details obviously, but it did make me think... This was in ~2019 or 2020 though when they were still early in their development compared to now.
  7. What is the output format here? An iframe? An SDK I can integrate into my webapp? A whitelabeled URL? A non-whitelabeled URL? Where is your documentation?
  8. What model(s) / providers are you using? Are you training on the data that the agent gets access to? Seems like there are some data governance and privacy red flags for anything involving remotely sensitive data...
  9. Unfortunately, the techniques you are trying in order to get access to a dormant GitHub account are EXACTLY the same ones that GitHub gets spammed with every day by bad actors attempting supply chain attacks. In GitHub's eyes at least, you don't have anything that proves your identity any more than any rando on the internet. Everything you have presented here may be convincing enough to me, but probably not to GitHub's opsec policies.
  10. WSJ is paywalled and also actively blocks archive.org crawls / snapshots, so just FYI 99% of people here can't read this article.
  11. I have tried just about every third party CLI / TUI and I personally like Opencode the most. It has the best UX, and the fact that it natively integrates LSP for the agent to interact with is excellent. It is limited to models available via API, so for example it couldn't use Codex at launch.
  12. This article does a good job of comparing functionality between Codex and Claude, but I see very little discussion here or elsewhere about the actual UX of the CLI tools. Codex is absolute garbage when it comes to the look, feel, and overall polish of the CLI experience (no syntax highlighting, no proper diff displays, no vim mode, poor visual differentiation of user vs agent messages, etc). Claude is a tiny bit better. However, both fall flat on their faces compared to some open source agentic TUIs like Opencode, Crush, etc.
  13. Yeah I am curious what the actual resolution of these videos will be. The launch videos on this link will only play in like 360p for me.
  14. Optimization hinders evolution. - Alan Perlis

    Write that garbage code as long as it works. PMF doesn't give a shit about your code quality.

  15. Sounds like poor prompt engineering. Devin and Claude can both do better work than many interns I have mentored in my career, and faster too. We likely have many, many years until it is even close to replacing an experienced developer, but we are already at the point where it IS replacing junior engineers.

    Whether you agree or not, the market has spoken. New grad hiring is WAY down. Fresh CS grads are having a hell of a time finding work compared to 2 years ago.

  16. press escape
  17. 2.5 is not the version number, it's the generation of the underlying model architecture. Think of it like the trim level on a Mazda 3 hatchback. Mazda already has the Mazda 3 Sport in their lineup, then later they release the Mazda 3 Turbo, which is much faster. When they release this new version of the vehicle it's not called the Mazda 4... that would be an entirely different vehicle based on a new platform and powertrain etc (if it existed). The new vehicle is just a new trim level / visual refresh of the existing Mazda 3.

    That's why Google names it like this, but I agree it's dumb. Semver would be easier.

  18. 2.5 isn't the version number, it's the model generation. It would only be updated when the underlying model architecture, training, etc are updated. This release is, as the name implies, the same model but likely with hardware optimizations, system prompt, and fine-tuning tweaks applied.
  19. > "We do not provide technology to facilitate mass surveillance of civilians," he continued. "We have applied this principle in every country around the world, and we have insisted on it repeatedly for more than two decades."

    You sure about that? Azure Government is used almost exclusively by the US Department of Defense...

  20. try the `opencode` cli and see how much nicer it is to use. the visual differentiation between user and agent messages is essential, plus formatted output with colors preserved from console commands, LSP support, parallel tool calls, syntax highlighting for code snippets, proper display of all diffs, etc.
  21. I feel like the inherent bootstrap cost of hardware startups is usually reflected in fundraising amounts.
  22. This scratches the itch I've had since playing with Tonka trucks in the back yard when I was 5... Are you hiring?
  23. Ok, but why?
  24. Yeah this should have triggered some serious KYC flags at the carrier(s)...
  25. What sane cellular carrier would issue tens of thousands of sims to a party like this? There do appear to be a few different colors / designs of sims in the photos but still there has to be some shady back-end dealings with cellular carriers for this to even be plausible.
  26. Finally... the official Codex CLI is hot garbage.
  27. This could be solved at the OS level. Just crashing and closing the app would lead the user to simply re-open it and try again. However, if iOS detects this type of crash, it could sternly alert the user that the application they are using is likely compromised. It could also transmit analytics for these specific types of crashes to Apple, who would then have near-real-time insight into newly compromised apps. I don't think the idea here is "crash silently and let the user reopen the app as many times as they want"; I think it's "crash very, very loudly".

    > If you need more you need to get even luckier.

    This is a good point. I'm not an expert, but I'm guessing one is rarely enough, which would exponentially decrease your chances of success by brute force, e.g. 2 tags would be 1/256, etc.

  28. XcodeGhost was an attack against app developers. It did not exploit the iPhone or iOS in any way; it exploited humans who build iOS apps. Memory corruption and zero-day / zero-click exploits on devices are a very different thing.
  29. I have used only esims since the iPhone 11 was released.

    Pros:

    - Super easy to get esims while traveling. e.g. in Mexico I downloaded an app while still in the airport, paid $5 with Apple Pay, and instantly activated a 1 month esim.

    - You can have multiple esims. With physical sims you are limited to the physical number of sim slots on your phone, usually 1 or at most 2. With esim there is no such restriction.

    - More secure. esims can't be cloned (e.g. sim swapping attack) or simply removed from a stolen phone like physical sims.

    Cons:

    - If you get a new phone, you can't just pop your physical sim in. You need to go through your provider to transfer, which requires calling them and verifying your identity.

    I actually don't see this as a con really; I see it as a security benefit. Since I only get a new phone every 3-4 years, the 20 min on the phone it takes to transfer is not a significant burden.
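
A quick check of the brute-force arithmetic from comment 27, as a minimal sketch. It assumes each tag is an independent, uniformly random 4-bit value; the 4-bit width is only inferred from the 1/256 figure quoted for two tags and is not stated in the thread. `TAG_BITS` and `guess_probability` are made-up names for illustration.

```python
from fractions import Fraction

# Assumption: 4-bit tags, inferred from "2 tags would be 1/256" (1/16 per tag).
TAG_BITS = 4


def guess_probability(num_tags: int, tag_bits: int = TAG_BITS) -> Fraction:
    """Chance of guessing num_tags independent, uniformly random tags in one attempt."""
    per_tag = Fraction(1, 2 ** tag_bits)
    return per_tag ** num_tags


if __name__ == "__main__":
    # Print the single-attempt success chance for 1, 2 and 3 required tags.
    for n in (1, 2, 3):
        print(f"{n} tag(s): {guess_probability(n)}")
```

Under those assumptions one tag is 1/16, two tags are 1/256, and three are 1/4096, so each extra tag an attacker must guess cuts the odds by another factor of 16.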
