Preferences

That Putnam bench graph (middle one) is showing 49/658 solve rate.

> The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench.

Which is 0.07% (edit: 7%) for PutnamBench


49/658 is 7%
Sorry, forgot multiply by 100
I bet DeepSeek-Prover-V2 wouldn't have made that mistake
classic human hallucination
How likely is it that Putnam answers were in DeepSeek's training data?
The solutions weren't published anywhere. There is also no good automatic way to generate solutions as far as I know, even expensive ones (previous sota was 10 solutions and one before was 8 using pass@3200 for 7b model). Potentially the developers could've paid some people who are good in putnam-level math problems and lean to write solutions for LLMs. It is hard to estimate likelihood of that but it sounds like waste of money given relatively marginal problem/benchmark.
AoPS seems to have a forum dedicated to Putnam (including 2024): https://artofproblemsolving.com/community/c3249_putnam and here is a pdf with solutions to Putnam 2023: https://kskedlaya.org/putnam-archive/2023s.pdf
These are still need to be formalized in Lean which can be harder than solving the problem sometimes

This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal