Preferences

DrNosferatu parent
Speaking of the Kwa et al. paper, is there a site that updates the results as new LLMs come out?

DrNosferatu OP
I.e.: tracking frontier LLM performance via the METR metric that defines task difficulty as the typical* time it takes a human to complete it.

This item has no comments currently.