
marginalia_nu parent
Is there some accessible explainer for what these numbers that keep going up actually mean? What happens at 100% accuracy or win rate?

It means that the benchmark isn't useful anymore and we need to build a harder one.

edit: As for what the numbers mean, they are arbitrary. They are only useful insofar as you can run two models (or two versions of the same model) on the same benchmark and compare the numbers. On an absolute scale, the numbers don't mean anything.
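
A rough sketch of the comparison described here, assuming the score is simply the fraction of problems solved. The two models, the helper `pass_rate`, and the per-problem results are all made up for illustration; real benchmarks may score differently.

```python
def pass_rate(results):
    """Fraction of benchmark problems solved; `results` is a list of booleans."""
    return sum(results) / len(results)

# Hypothetical per-problem outcomes on the SAME ten problems (invented data).
model_a = [True, True, False, True, False, True, True, False, True, True]
model_b = [True, True, True, True, False, True, True, True, True, True]

rate_a = pass_rate(model_a)
rate_b = pass_rate(model_b)

# The individual scores say little on their own; the gap between them,
# measured on identical problems, is the meaningful part.
print(f"model A: {rate_a:.0%}, model B: {rate_b:.0%}")
print(f"difference: {rate_b - rate_a:+.0%}")

# Once every model solves everything, the gap is always zero and the
# benchmark can no longer tell models apart.
saturated = [True] * 10
print(f"saturated gap: {pass_rate(saturated) - pass_rate(saturated):+.0%}")
```

The last line is the 100% case: both scores are maxed out, so the benchmark stops carrying any information about which model is better.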

typpilol
I thought the percentage was how many problems it successfully solved.
Technically correct, but neither helpful nor actionable.
marginalia_nu OP
It was actually very helpful, as it answered my question about what the benchmark numbers are. It wasn't a request for advice; I'm merely looking to understand the article, which doesn't really elaborate on what they are presenting, either assuming an audience that is already very familiar with these benchmarks, or one so dazzled by number going up that they forget to ask what number is.
Then we need a new benchmark.
