Comment by agucova - Hacker Neue

agucova Nov 10, 2024

This benchmark’s questions and answers will be kept fully private, and the benchmark will only be run by Epoch. Short of the companies fishing out the questions from API logs (which seems quite unlikely), this shouldn’t be a problem.

BeefWellington Nov 10, 2024

> answers will be kept fully private

> Short of the companies fishing out the questions from API logs (which seems quite unlikely)

They all pretty clearly state[1] versions of "We use your queries (removing personal data) to improve the models" so I'm not sure why that's unlikely.

[1] https://help.openai.com/en/articles/5722486-how-your-data-is...

mewpmewp2 Nov 10, 2024

Ideally they would have batches of those exercises, where they only use the next batch once someone has solved a suspiciously large number of the current ones. If the model performs much worse on the next batch, that is a tell of leakage.
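A rough sketch of that batch-rotation check, purely for illustration. The names `BatchResult`, `should_rotate`, and `leakage_suspected` are made up here, and the thresholds (a 50% solve rate, a 20-point drop) are arbitrary assumptions, not anything Epoch has described:

```python
from dataclasses import dataclass


@dataclass
class BatchResult:
    solved: int  # questions the model answered correctly
    total: int   # questions in the batch

    @property
    def accuracy(self) -> float:
        return self.solved / self.total


def should_rotate(current: BatchResult, suspicious_rate: float = 0.5) -> bool:
    """Bring in the held-back batch once the solve rate looks suspiciously high."""
    # suspicious_rate is an assumed cutoff, chosen only for this example.
    return current.accuracy >= suspicious_rate


def leakage_suspected(current: BatchResult, fresh: BatchResult,
                      max_drop: float = 0.2) -> bool:
    """Flag possible leakage when accuracy on the fresh batch falls by more than max_drop."""
    # max_drop is likewise an assumed tolerance, not a calibrated value.
    return (current.accuracy - fresh.accuracy) > max_drop
```

For example, a model that solves 30 of 50 questions on the current batch but only 5 of 50 on the fresh one would be flagged, while a drop from 30 to 28 would not. A real check would also need to account for batches differing in difficulty and for noise from small sample sizes.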

benchmarkist Nov 10, 2024

I looked at the sample questions, and even if they get the questions, there is no way they will figure out the answers without making significant breakthroughs in understanding mathematics and logic.
