
keeeba
Joined 114 karma

  1. Doesn’t seem like this will be SOTA in the things that really matter; hoping enough people jump to it that Opus gets more lenient usage limits for a while.
  2. As a fairly extensive user of both Python and R, I net out similarly.

    If I want to wrangle, explore, or visualise data I’ll always reach for R.

    If I want to build ML/DL models or work with LLMs I will usually reach for Python.

    Often in the same document - nowadays this is very easy with Quarto.
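
    Rough shape of such a mixed .qmd, for anyone who hasn’t tried it (names below are hypothetical, and the Python chunk assumes the knitr engine with reticulate, which exposes R objects as `r.<name>`):

    ````markdown
    ---
    title: "Wrangle in R, model in Python"
    ---

    ```{r}
    # R: wrangle a data frame
    library(dplyr)
    df <- mtcars |> filter(mpg > 20)
    ```

    ```{python}
    # Python: pick the R object up via reticulate's `r` object
    df = r.df
    print(df.shape)
    ```
    ````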

  3. Oh boy, if the benchmarks are this good and Opus feels like it usually does then this is insane.

    I’ve always found Opus significantly better than the benchmarks suggested.

    LFG

  4. Please don’t actually use these 5,6,7-way Venn diagrams for anything practical, they’re virtually useless and communicate nothing.
  5. I agree it is a profound question. My thesis is fairly boring.

    For any given clustering task of interest, there is no single value of K.

    Clustering and unsupervised machine learning are as much about creating meaning and structure as they are about discovering or revealing it.

    Take the case of biological taxonomy: what K will best segment the animal kingdom?

    There is no true value of K. If your answer is for a child, maybe it’s six, corresponding to what we’re taught in school: mammals, birds, reptiles, amphibians, fish, and invertebrates.

    If your answer is for a zoologist, obviously this won’t do.

    Every clustering task of interest is like this. And I say of interest because clustering things like digits in the classic MNIST dataset is better posed as a classification problem - the categories are defined analytically.
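
    A toy way to see this: the k-means objective (within-cluster sum of squares) only ever shrinks as K grows, so the objective alone can never name a “true” K. A stdlib-only sketch on made-up 1-D blobs:

    ```python
    import random

    random.seed(0)
    # Two loose 1-D blobs -- is the "true" K 2? 4? 8?
    data = ([random.gauss(0, 1) for _ in range(50)]
            + [random.gauss(6, 1) for _ in range(50)])

    def kmeans_wcss(xs, k, iters=25):
        """Run basic k-means, return within-cluster sum of squares."""
        centers = random.sample(xs, k)
        for _ in range(iters):
            groups = [[] for _ in range(k)]
            for x in xs:
                groups[min(range(k), key=lambda i: (x - centers[i]) ** 2)].append(x)
            centers = [sum(g) / len(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        return sum(min((x - c) ** 2 for c in centers) for x in xs)

    wcss = [kmeans_wcss(data, k) for k in (1, 2, 4, 8)]
    print(wcss)  # shrinks as K grows -- the objective alone can't choose K
    ```

    Picking K always imports something from outside the data: a child’s textbook, a zoologist’s ontology, a business constraint.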

  6. “Skills are a simple concept with a correspondingly simple format.”

    From the Anthropic Engineering blog.

    I think Skills will be useful in helping regular AI users and non-technical people fall into better patterns.

    Many power users of AI were already doing the things it encourages.
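
    For anyone curious how simple the format is: per the blog, a Skill is just a folder containing a SKILL.md whose YAML frontmatter names and describes it, with instructions in plain Markdown below. The example here is my own invention, not Anthropic’s:

    ````markdown
    ---
    name: commit-messages
    description: Write commit messages following the team's conventions.
    ---

    ## Instructions
    Read `conventions.md` in this folder before drafting any commit message,
    and keep the subject line under 50 characters.
    ````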

  7. It went from nowhere to 1T tokens per week; that seems… suspect.
  8. What use-cases do you see for the 270M’s embeddings, and should we be sticking to token embeddings or can we meaningfully pool for sentence/document embeddings?

    Do we need to fine-tune for the embeddings to be meaningful at the sentence/document level?
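
    For context, the standard baseline for the pooling half of the question is a mask-weighted mean over token embeddings (toy numbers below, no real model involved):

    ```python
    def mean_pool(token_embs, mask):
        """Average the embeddings of non-padding tokens (mask = 1 keeps a token)."""
        kept = [e for e, m in zip(token_embs, mask) if m]
        dim = len(token_embs[0])
        return [sum(row[j] for row in kept) / len(kept) for j in range(dim)]

    # Three 2-d token embeddings; the last is padding and gets masked out.
    sent = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
    print(sent)  # [2.0, 3.0]
    ```

    Whether that average is meaningful without contrastive fine-tuning is exactly the open part of the question.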

  9. Anthropic say Opus is better; benchmarks and evals say Opus is better; Opus has more parameters, and parameter count bounds how much a network can learn.

    Maybe Opus just is better

  10. How have you tested your recall in the long and short term? And what were the results?
  11. Just checking my notes here.

    This is the same Sam Altman who abandoned OpenAI’s founding mission in favour of profit?

    No it can’t be

  12. I want to believe that it wasn’t announced at that time, with that name, purely to distract from Google I/O.

    But it’s hard

  13. Nice story - I’ve seen you on the leaderboard a few times. Good luck through the rest of Foundations III
  14. Nice, I also find small classifiers work best for things like this. Out of interest, how many, if any, of the 3 million were labelled?

    Did you end up labelling any/more, or distilling from a generative model?

  15. Thanks for linking - I know this is pedantic but one might think OpenAI’s models could make their content free of basic errors quite easily?

    “Conretely, let's define a routine to be a list of instructions in natural langauge (which we'll repreesnt with a system prompt), along with the tools necessary to complete them.”

    I count 3 in one mini paragraph. Is GPT writing this and being asked to add errors, or is GPT not worth using for their own content?
