
dontreact · 1,597 karma

  1. Lung cancer screening should be used more broadly and improved over time in a data-driven fashion!

    We can catch things early; it shouldn’t be limited only to smokers.

  2. My take: multi-turn evals are hard because, to do them really correctly, you have to simulate a user. User simulation is not yet modeled well enough for multi-turn evals to work as well as they could.
  3. It tests people chatting to ChatGPT! That's a pretty big and important use case.
  4. The flip side of this is that for some tasks (especially in ml/ai), doing it manually at least a few times gives you a sense of what is correct and a better sense of detail.

    For example, spending the time to label a few examples yourself instead of just blindly sending it out to labeling.

    (Not always the case, but another thing to keep in mind besides total time saved and value of learning)

  5. I think the methods here are highly questionable; they appear to be based on self-report from a small number of employees in Denmark a year ago.

    The overall labor-force participation rate is falling. I expect this trend to continue as AI makes the economy more and more dynamic and sets a higher and higher bar for participation.

    Overall GDP is rising while the labor participation rate is falling, which clearly points to more productivity with fewer people participating. At this point one of the main factors is clearly technological advancement, and within that, I believe that if you surveyed CEOs and asked what technological change has allowed them to get more done with fewer people, the resounding consensus would be AI.

  6. Looks like I needed another disclaimer:

    I’m talking about a general trend I see in use of this term, not that it’s always a bad thing to say “I’m not technical so someone else should write the script”

    I agree with everything you said!

    Both things are happening in the world: people using this terminology to throw work at others needlessly, and people doing good division of labor.

  7. I think that “I’m not technical” is often an excuse for throwing work at other people, and frankly it can be a form of learned helplessness. Nowadays there is less and less reason to ask other people to write one-off scripts/queries; you can ask AI for help and learn how to do it yourself.

    Since this is HN, some disclaimers:
    - No, that’s not always what’s happening when “not technical” is thrown around.
    - No, it’s not always appropriate to use AI instead of asking an expert.

  8. Is there any evidence R1 is better than O1?

    It seems like, if they did in fact distill, then what we have found is that you can create a worse copy of the model for ~$5M in compute by training on its outputs.

  9. Cosine similarity equals the dot product of the two vectors after each has been normalized to unit length.
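A minimal sketch of that identity (my own toy example, not from the comment), using numpy:

```python
import numpy as np

def cosine_similarity(a, b):
    # Normalize each vector to unit length, then take their dot product.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))

# Parallel vectors give ~1.0, orthogonal give ~0.0, opposite give ~-1.0.
print(cosine_similarity([1, 2, 0], [2, 4, 0]))  # ~1.0
print(cosine_similarity([1, 0], [0, 1]))        # ~0.0
```

Because normalization cancels magnitudes, only the angle between the vectors matters.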
  10. “In my humble opinion, these companies would not allocate a second of compute to lightweight models if they thought there was a straightforward way to achieve the next leap in reasoning capabilities.”

    The rumour/reasoning I’ve heard is that most advances are being made on synthetic data experiments happening after post-training. It’s a lot easier and faster to iterate on these with smaller models.

    Eventually a lot of these learnings/setups/synthetic data generation pipelines will be applied to larger models but it’s very unwieldy to experiment with the best approach using the largest model you could possibly train. You just get way fewer experiments per day done.

    The model sizes bigger labs are playing with seem to be converging to roughly the largest size at which a researcher can still run an experiment overnight.

  11. “Layoffs usually have nothing to do with performance”

    This has not at all been my experience. When forced to do layoffs in a large company, executives tend to look at performance reviews.

    What are other people’s experience with this?

  12. They still mature and yield, so the principal is not at risk. But yes, it is if you are a bank and people want to withdraw.
  13. The talking point has been that if we do this in cases where there isn’t shelter to offer, the people will come back. Let’s see how that plays out, will be informative.

    I think it’s also quite possible for some people it’s a needed wake up call

  14. I imagine it would never be optimal to set a price so low that utilization is always 100% externally
  15. I agree with you but it also makes me think: Google's TPUs are also fixed costs and these research experiments could have been run at times when production serving need isn't as high.
  16. Flag this phishing attempt
  17. I think numpy closely maps to how I think so it’s not as hard to read these dense lines as it would be to read expanded versions. I think my point of view is shared by a lot of leading researchers and this is why it is used more heavily.

    The kinds of type safety you want might be good for other use cases but for ML research they get in the way too much.
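As an illustration of the kind of density being defended here (my own toy example, not from the comment): a row-wise softmax written as two dense numpy lines versus the expanded loop form of the same computation.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])

# Dense numpy form: numerically stable row-wise softmax in two lines.
p = np.exp(x - x.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)

# Expanded form: the same computation spelled out with explicit loops.
q = np.zeros_like(x)
for i in range(x.shape[0]):
    row_max = max(x[i])
    exps = [np.exp(v - row_max) for v in x[i]]
    total = sum(exps)
    for j in range(x.shape[1]):
        q[i, j] = exps[j] / total

assert np.allclose(p, q)  # both forms agree
```

Whether the dense form is more readable is exactly the point of disagreement; to someone fluent in array semantics, the two numpy lines read as a single idea.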

  18. I guess my question is:

    Does this work address a specious, disingenuous argument that is being put forth by NIMBYs to block solar installation, or does it address a real pain point?

    I agree we should be able to build enough solar but does this work address a real bottleneck or a fake problem presented as real by people with ulterior motives?

  19. Perhaps. But also throwing more flops at it has long been Ilya’s approach so it would be surprising. Notice also the reference to scale (“scale in peace”).
  20. How are they gonna pay for the compute costs to get to the frontier? Seems hard to attract enough investment while almost explicitly promising no return.
  21. This is interesting. It seems like sidestepping a problem that needs to be attacked head-on: the amount of solar we need to build is more than what can be built in invisible areas, isn’t it? Perhaps this helps get things going for now, but can we actually build enough in areas with no visibility from residential?
  22. These cookies were very good, not God level. With a bit of investment and more modern techniques I think you could make quite a good recipe, perhaps doing better than any human. I think AI could make a recipe that wins in a very competitive bake-off, but it’s not possible for anyone to win over all 100 judges.

    https://static.googleusercontent.com/media/research.google.c...

  23. I know a few people who work/worked there. I don’t agree with it but it is a genuinely held belief
  24. Another good reason more people should get screened for lung cancer. If you can find it early, you can maybe halt the progression with this drug!
  25. Elon’s belief is that humans doing it with vision is proof it can be done with video.
  26. Yep this is a nice feature and I use it…
  27. Gemini 1.5 Pro is on par with GPT-4 and Opus on LMSys, you can go try it for yourself on LMSys, and it’s coming soon to Gemini Advanced (announced at I/O). Seems like GPT-4o puts OpenAI in front again on LMSys.
  28. Gemini Ultra has been available for people to try via Gemini Advanced (formerly Bard) for a few months
  29. It’s the difference between doing something transcendental a small number of times and doing something amazing billions of times.
  30. Yes thanks that is what I meant.

This user hasn’t submitted anything.
