
olliepro
Joined 29 karma
Data-Scientist > CS/AI PhD

  1. It feels like this should work, but the breadth of knowledge in these models is so vast. Everyone knows how to taste, but not everyone knows physics, biology, math, every language… poetry, etc. Enumerating the breadth of valuable human tasks is hard, so both approaches suffer from the scale of the models’ surface area.

    It's an interesting problem, since the creators of OLMo have mentioned that throughout training they spend 1/3 of their compute just doing evaluations.

    Edit:

    One nice thing about the “critic” approach is that the restaurant (or model provider) doesn’t have access to the benchmark to quasi-directly optimize against.

  2. Do you have a better way to measure LLMs? Measurement implies quantitative evaluation... which is the same as benchmarks.
  3. A sounder approach would have been to run a Monte Carlo simulation with 100 portfolios for each model and look at average performance.
  4. Ohio bill in motion to deny AI legal personhood: https://www.legislature.ohio.gov/legislation/136/hb469
  5. Descript has some nice video/screen recording tools to help beginners make this kind of thing look moderately professional. I've used it for various walkthroughs in my previous place of employment.
  6. Probably don’t need the name of the field for ChatGPT to get it.
  7. The causal masking means future tokens don’t affect previous tokens’ embeddings as they evolve throughout the model, but all tokens are processed in parallel… so, yes and no. See this previous HN post (https://www.hackerneue.com/item?id=45644328) about how bidirectional encoders are similar to diffusion’s non-linear way of generating text. Vision transformers use bidirectional encoding b/c of the non-causal nature of image pixels.
  8. > any automation that requires a human staff member to intervene to complete every run is not automation

    Not strictly true. Barcode readers are used by humans and are definitely automation. The ironic part though is that the automation going on here is literally object classification, which humans are good at.

    The play may be to collect data and make their system better.

  9. Sentence embedding models are great for this type of thing.
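
To make the causal-masking point in comment 7 concrete, here is a minimal sketch, assuming a toy single-head self-attention with identity projections (q = k = v = x) and made-up 2-d token vectors. None of this comes from a real model; the function name `attention` and the example vectors are hypothetical. It only illustrates that with a causal mask, editing a later token leaves earlier positions' outputs untouched, while without the mask (bidirectional attention) it does not.

```python
import math

def attention(x, causal):
    """Toy single-head self-attention with identity projections (q = k = v = x).

    x is a list of token vectors. With causal=True, position i attends only
    to positions 0..i; with causal=False it attends to every position.
    """
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        visible = range(i + 1) if causal else range(n)
        # Scaled dot-product scores against every visible position.
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
                  for j in visible]
        # Numerically stable softmax over the visible scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted average of the visible token vectors.
        row = [0.0] * d
        for w, j in zip(weights, visible):
            for k in range(d):
                row[k] += (w / total) * x[j][k]
        out.append(row)
    return out

# Hypothetical inputs: the two sequences differ only in the last token.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edited = [[1.0, 0.0], [0.0, 1.0], [-2.0, 3.0]]

causal_a = attention(tokens, causal=True)
causal_b = attention(edited, causal=True)
full_a = attention(tokens, causal=False)
full_b = attention(edited, causal=False)

# Outputs for the first two positions, before and after editing the last token.
causal_prefix_unchanged = causal_a[:2] == causal_b[:2]
full_prefix_unchanged = full_a[:2] == full_b[:2]
print(causal_prefix_unchanged, full_prefix_unchanged)
```

Every position is still computed from the same input in one pass (the loop over `i` has no dependence between iterations), which is the "processed in parallel" half of the comment; the mask only restricts which positions each one can see.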

This user hasn’t submitted anything.