Comment by smusamashah

smusamashah Apr 30, 2025 parent

That Putnam bench graph (middle one) is showing 49/658 solve rate.

> The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench.

Which is 0.07% (edit: 7%) for PutnamBench

darkmighty Apr 30, 2025

49/658 is 7%
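For anyone who wants to check the arithmetic, here is a minimal sketch. The variable names are just illustrative; the only inputs are the 49 and 658 quoted above.

```python
# Solve rate on PutnamBench, using the figures quoted in the thread.
solved = 49
total = 658

fraction = solved / total   # a plain ratio, roughly 0.07
percent = fraction * 100    # the step that is easy to forget

print(f"{fraction:.4f} as a fraction, {percent:.2f}% as a percentage")
```

The ratio is about 0.07, which is about 7% once multiplied by 100; writing "0.07%" mixes the two forms.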

smusamashah OP Apr 30, 2025

Sorry, forgot to multiply by 100

booi Apr 30, 2025

I bet DeepSeek-Prover-V2 wouldn't have made that mistake

gallerdude Apr 30, 2025

classic human hallucination

HappyPanacea Apr 30, 2025

How likely is it that Putnam answers were in DeepSeek's training data?

EvgeniyZh Apr 30, 2025

The solutions weren't published anywhere. There is also no good automatic way to generate solutions as far as I know, even an expensive one (the previous SOTA was 10 solutions, and the one before that was 8 using pass@3200 with a 7B model). Potentially the developers could have paid people who are good at Putnam-level math problems and Lean to write solutions for LLMs. It is hard to estimate the likelihood of that, but it sounds like a waste of money given a relatively marginal problem/benchmark.
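As a rough illustration of what a pass@k number means in this setting: a problem counts as solved if at least one of k sampled proof attempts is accepted by the verifier. The sketch below assumes that reading; the function names and the tiny results table are made up for the example and are not taken from any benchmark.

```python
def solved_at_k(attempt_results, k):
    """Return True if any of the first k attempts was verified."""
    return any(attempt_results[:k])


def count_solved(results_by_problem, k):
    """Count problems with at least one verified attempt among the first k."""
    return sum(solved_at_k(r, k) for r in results_by_problem.values())


# Hypothetical data: each list holds verifier verdicts for sampled proofs.
results = {
    "problem_a": [False, False, True, False],
    "problem_b": [False, False, False, False],
    "problem_c": [True, False, False, False],
}

print(count_solved(results, k=1))
print(count_solved(results, k=4))
```

A larger k can only keep or raise the solved count, which is why a result reported at pass@3200 represents a lot of sampling per problem.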

HappyPanacea Apr 30, 2025

AoPS seems to have a forum dedicated to Putnam (including 2024): https://artofproblemsolving.com/community/c3249_putnam and here is a pdf with solutions to Putnam 2023: https://kskedlaya.org/putnam-archive/2023s.pdf

EvgeniyZh Apr 30, 2025

These still need to be formalized in Lean, which can sometimes be harder than solving the problem
