Game_Ender
Joined 1,611 karma

  1. Why should he put effort into measuring a tool that the author has not? The point is that there are so many of these tools that an objective measure, one their creators could use to compare against each other, would be better.

    So a better question to ask is: do you have any ideas for an objective way to measure the performance of agentic coding tools? Then we could truly determine what improves performance and what does not.

    I would hope that internal to OpenAI and Anthropic they use something similar to the harness/test cases they use for training their full models to determine if changes to Claude Code result in better performance.

  2. Can you link some? I can only find the hip exoskeletons.
  3. Speed and simplicity. Now I can fetch one binary on a system and in seconds fetch everything needed to run a Python tool or work on a code base.

    I can do all that without even having to worry about virtual envs or Python versions.

  4. You are looking for Codex CLI [0].

    0 - https://github.com/openai/codex

  5. I think the implicit take is that if your company hits AGI your equity package will do something like 10x-100x even if the company is already big. The only other way to do that is join a startup early enough to ride its growth wave.

    Another way to say it is that people think it’s much more likely for a decent LLM startup to grow really strongly for its first several years and then plateau than for their current established player to hit hypergrowth because of AGI.

  6. The link you have posted 404’s and I could not seem to find a command like that in your repos. Can you be more specific?
  7. Have you tried a smart watch? The Duo 2FA app lets you add an arbitrary 2FA code-based authenticator with the same QR code Google Authenticator supports and generate those codes from their Apple watchOS [0] or Android Wear OS [1] apps. I have used it successfully for years; it's a huge reason I got an Apple Watch, in fact. Now you'll have to configure your watch with a "work" focus mode that turns off all notifications and not install any fancy apps on the watch (do those still exist?), but it can free you from your phone.

    Along the same lines, the Meta Wayfarer [2] smart glasses let you take slice-of-life photos and videos without needing to whip out your phone. You lose a ton of quality but stay in the moment more. The AI features are getting better, so eventually you'll be able to use them for basic information lookup.

    0 - https://guide.duo.com/apple-watch

    1 - https://guide.duo.com/duo-wear

    2 - https://www.meta.com/ai-glasses/wayfarer

  8. This is really great. Reading the bill raw feels like reviewing a diff with context set to 0.
  9. I am not sure there is much value in this article for the typical Hacker News conversation on LLM-based tooling. Here we generally focus on whether the tooling is effective, and whether it can be used to make software quicker or more cheaply. The problem is that the author is opposed to using the cutting-edge models on privacy and ethics grounds. So they say:

    > I have woefully little experience with these tools.

    > I do not want to be using the cloud versions of these models with their potentially hideous energy demands; I’d like to use a local model. But there is obviously not a nicely composed way to use local models like this.

    > The models and tools that people are raving about are the big, expensive, harmful ones. If I proved to myself yet again that a small model with bad tools was unpleasant to use, I wouldn’t really be addressing my opponents’ views.

    Then, without having any real practical experience with the cutting-edge tooling, they predict:

    > As I have written about before, I believe the mania will end. There will then be a crash, and a “winter”. But, as I may not have stressed sufficiently, this crash will be the biggest of its kind — so big, that it is arguably not of a kind at all. The level of investment in these technologies is bananas and the possibility that the investors will recoup their investment seems close to zero.

    I think a more accurate take is that this will be like self-driving: huge investments, many more losers than winners, and it will take longer than all the boosters think. But in the end we did get actual self-driving cars, and this time, with LLMs, it is something that anyone can use by clicking a link vs. waiting for lots of cars to be built and deployed.

  10. Hello toothpaste is ChatGPT's 2nd or 1st answer depending on which model I used [0], so I am curious whether the poster above could share the session so we can see what the issue was.

    There is known sensitivity (no pun intended ;) to the wording of the prompt. I have also found that if I am very quick and flippant it will totally miss my point and go off in the wrong direction entirely.

    0 - https://www.hackerneue.com/item?id=44164633

  11. What model and query did you use? I used the prompt "find me a toothpaste that is both SLS free and has fluoride" and both GPT-4o [0] and o4-mini-high [1] gave me correct first answers. The 4o answer used the newish "show products inline" feature, which made it easier to jump to each product and check it out (I am putting aside my fear that this feature will end up killing their web product with monetization).

    0 - https://chatgpt.com/share/683e3807-0bf8-800a-8bab-5089e4af51...

    1 - https://chatgpt.com/share/683e3558-6738-800a-a8fb-3adc20b69d...

  12. What is your preferred way to manage them?
  13. With Aider you pay API fees only. You can get simple tasks done for a few dollars. I suggest budgeting $20 or so and giving it a go.
  14. What are those extra things you have to do more of? I only have experience with Aider so I am curious what I am missing here.
  15. Can you describe the why of the policy and, if you are OK sharing it, the industry?

    I am also curious if you have other restrictions on information sharing, API usage, and what reference documentation to use.

  16. Getting a 503 with that link.
  17. To help those who got a bit confused (like me): this is Groq, the company making accelerators designed specifically for LLMs that they call LPUs (Language Processing Units) [0]. So they want to sell you their custom machines that, while expensive, will be much more efficient at running LLMs for you. There is also Grok [1], which is xAI's series of LLMs and competes with ChatGPT and other models like Claude and DeepSeek.

    EDIT - Seems that Groq has stopped selling their chips and now will only partner to fund large build outs of their cloud [2].

    0 - https://groq.com/the-groq-lpu-explained/

    1 - https://grok.com/

    2 - https://www.eetimes.com/groq-ceo-we-no-longer-sell-hardware

  18. There is a bigger safety margin for humans if you need to land in a relatively large area in the water somewhere with a larger range of acceptable velocities. I believe they considered a propulsive landing over land but decided against it to simplify the initial design.

    10 years later though they have added this ability as a backup [0]. Which again shows how if human lives are on the line you want to favor redundancy and simplicity over flash.

    0 - https://www.nasaspaceflight.com/2024/10/dragon-propulsive-la...

  19. I don’t know if it matters now but at some point certain targets were hardened to near misses of certain sizes but not direct strikes. So the better your accuracy the smaller the weapon (or fewer) you can use to take out those targets.

    So you could say the use would be increased certainty that your enemy's command and control and other bunkers would be destroyed, increasing the odds of “winning” whatever happens afterwards.

  20. It looks like o1 also gets the right answer after thinking about it for 14 seconds: https://chatgpt.com/share/67962ead-a5f8-800a-bd91-9a145b993e...
  21. The tool has an excellent architecture section [0] that goes into how it works under the hood. It stands out to me that a complex tool has an overview to this depth that allows you to grasp conceptually how it works.

    0 - https://mergiraf.org/architecture.html

  22. Since they lack the noise isolation of over-ear headphones or tight-fitting buds, this can be a problem. I have the OpenSwim Pro and they are fine outside except in really high noise. But on a treadmill in the gym they could not overcome the background noise.
  23. The whole slew of layoff posts from Amazon, Google, and others.
  24. The author is positive because of all the safety layers that existed and stayed intact, despite how flawed humans and companies are. The culture of looking at previous accidents like UA232, where they lost an engine and ALL controls with it, meant the A380 control system was engineered to take even more damage, and it worked.

    I do agree, though, that it did not spend enough effort focusing on the areas to improve:

    - A computer controlled engine that runs for 60 seconds while on fire, and lets a dangerous part spin too fast. It seems like something that should have been covered ahead of time.

    - An engine manufacturing process that is so complex it’s almost impossible to validate.

    - A fault management system that only shows you 1 or 2 at a time when you have 40.

  25. It would add up weight wise, and it’s one of the simpler parts. Jet engines are high performance precise machines with many quickly spinning parts. If you can’t bore a tube correctly how are you going to machine a high efficiency, balanced turbofan system?

    That said, it seems like they did have a poor process where a part could be out of spec and they had no good way to check it. As they mentioned about Swiss cheese, you want as many layers as possible, and checks like that are needed.

  26. Part of the way I explain this is the amount of overhead in a company or position. Say you have 20 hours of coordination, planning and meetings/week, and 20 hours of direct work. If you work 50 hours you now increase how much development you are doing by 50% by only working 25% more hours. Now, the organization can do the same by cutting overhead and meetings, but that is usually not up to one high-performance contributor.

    Like you said the impact of a top contributor doing 50% more work can be really large, entire new systems can be built, key features launched. It can get you promoted, but you definitely won’t get a 50% raise.
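
The arithmetic in comment 26 can be checked with a rough sketch like the one below. It assumes the weekly overhead stays fixed at 20 hours no matter how many hours are worked; the function name and parameters are hypothetical and only there for illustration.

```python
def hours_vs_output(overhead_hours, base_hours, new_hours):
    """Compare the relative increase in total hours with the relative
    increase in direct (non-overhead) work, assuming overhead is fixed."""
    base_direct = base_hours - overhead_hours   # e.g. 40 - 20 = 20 hours of direct work
    new_direct = new_hours - overhead_hours     # e.g. 50 - 20 = 30 hours of direct work
    extra_hours = (new_hours - base_hours) / base_hours
    extra_direct = (new_direct - base_direct) / base_direct
    return extra_hours, extra_direct


# Numbers from the comment: 20 hours of overhead, a 40 hour week, then a 50 hour week.
extra_hours, extra_direct = hours_vs_output(20, 40, 50)
print(f"{extra_hours:.0%} more hours, {extra_direct:.0%} more direct work")
```

The larger the fixed overhead is relative to the base week, the bigger the gap between the two percentages becomes.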

This user hasn’t submitted anything.
