- Doesn’t seem like this will be SOTA in things that really matter; hoping enough people jump to it that Opus has more lenient usage limits for a while
- As a fairly extensive user of both Python and R, I net out similarly.
If I want to wrangle, explore, or visualise data I’ll always reach for R.
If I want to build ML/DL models or work with LLMs I will usually reach for Python.
Often in the same document - nowadays this is very easy with Quarto.
- Oh boy, if the benchmarks are this good and Opus feels like it usually does then this is insane.
I’ve always found Opus significantly better than the benchmarks suggested.
LFG
- Please don’t actually use these 5-, 6-, or 7-way Venn diagrams for anything practical; they’re virtually useless and communicate nothing.
- I agree it is a profound question. My thesis is fairly boring.
For any given clustering task of interest, there is no single value of K.
Clustering & unsupervised machine learning is as much about creating meaning and structure as it is about discovering or revealing it.
Take the case of biological taxonomy: what K will best segment the animal kingdom?
There is no true value of K. If your answer is for a child, maybe it’s 6, corresponding to what we’re taught in school - mammals, birds, reptiles, amphibians, fish, and invertebrates.
If your answer is for a zoologist, obviously this won’t do.
Every clustering task of interest is like this. And I say “of interest” because clustering things like the digits in the classic MNIST dataset is better posed as a classification problem - the categories are defined analytically.
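The taxonomy point can be made concrete with a toy sketch: synthetic 2D data with a deliberate hierarchy (two coarse groups, each made of three sub-blobs), clustered with a minimal Lloyd’s k-means. Both K=2 and K=6 are defensible answers; the data, the blob layout, and the k-means implementation below are all illustrative stand-ins, not from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two coarse groups, each containing three fine sub-blobs: a toy
# hierarchy where K=2 and K=6 are both reasonable clusterings.
coarse_centres = np.array([[-10.0, 0.0], [10.0, 0.0]])
fine_offsets = np.array([[0.0, 3.0], [-2.0, -2.0], [2.0, -2.0]])
points = np.concatenate([
    c + o + rng.normal(scale=0.3, size=(50, 2))
    for c in coarse_centres for o in fine_offsets
])

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: assign each point to its nearest
    centre, then recompute centres as cluster means."""
    r = np.random.default_rng(seed)
    centres = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres) ** 2).sum(-1), axis=1)
        centres = np.array([
            X[labels == j].mean(0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
    return labels, centres

inertias = {}
for k in (2, 6):
    labels, centres = kmeans(points, k)
    inertias[k] = ((points - centres[labels]) ** 2).sum()
    print(f"K={k}: within-cluster sum of squares = {inertias[k]:.1f}")
```

K=6 will fit the data more tightly, but that doesn’t make it the “true” K - the coarse two-group reading is just as meaningful at a different granularity, which is the point.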
- “Skills are a simple concept with a correspondingly simple format.”
From the Anthropic Engineering blog.
I think Skills will be useful in helping regular AI users and non-technical people fall into better patterns.
Many power users of AI were already doing the things it encourages.
- It came from nowhere to 1T tokens per week, seems… suspect.
- What use-cases do you see for the 270M’s embeddings, and should we be sticking to token embeddings or can we meaningfully pool for sentence/document embeddings?
Do we need to fine-tune for the embeddings to be meaningful at the sentence/document level?
- Anthropic say Opus is better, benchmarks & evals say Opus is better, and Opus has more parameters, which determine how much an NN can learn.
Maybe Opus just is better
- How have you tested your recall in the long and short term? And what were the results?
- Just checking my notes here.
This is the same Sam Altman who abandoned OpenAI’s founding mission in favour of profit?
No it can’t be
- I want to believe that it wasn’t announced at that time, with that name, purely to detract from Google I/O.
But it’s hard
- Nice story - I’ve seen you on the leaderboard a few times. Good luck through the rest of Foundations III
- Nice, I also find small classifiers work best for things like this. Out of interest, how many, if any, of the 3 million were labelled?
Did you end up labelling any/more, or distilling from a generative model?
- Thanks for linking. I know this is pedantic, but one might think OpenAI’s models could make their content free of basic errors quite easily?
“Conretely, let's define a routine to be a list of instructions in natural langauge (which we'll repreesnt with a system prompt), along with the tools necessary to complete them.”
I count 3 in one mini paragraph. Is GPT writing this and being asked to add errors, or is GPT not worth using for their own content?