A point of context: on this leaderboard, Gemini 3 Pro is listed "without tools" and Gemini 3 Deep Think "with tools". In the other benchmarks Google has released comparing these two models, where both have access to the same tools, the gap between them is small.
The cost of achieving these scores is coming down rapidly. In Dec 2024, when OpenAI announced beating human performance on ARC-AGI-1, they spent more than $3k per task. You can now get the same performance for pennies to dollars, approximately an 80x reduction in 11 months.
https://arcprize.org/leaderboard
https://arcprize.org/blog/oai-o3-pub-breakthrough
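
For anyone who wants to check the cost arithmetic above, here is a rough sketch. It takes the $3k per task figure as a lower bound and the 80x figure at face value. The "new" per-task costs in the loop are illustrative placeholders I picked to span "pennies to dollars"; they are not leaderboard numbers. The function names are my own.

```python
def implied_cost(old_cost_per_task: float, reduction_factor: float) -> float:
    """Per-task cost implied by dividing the old cost by a reduction factor."""
    return old_cost_per_task / reduction_factor


def reduction_factor(old_cost_per_task: float, new_cost_per_task: float) -> float:
    """How many times cheaper the new per-task cost is than the old one."""
    return old_cost_per_task / new_cost_per_task


OLD_COST = 3000.0      # USD per task, from the comment ("more than $3k")
STATED_FACTOR = 80.0   # from the comment ("approximately an 80x reduction")

# Cost per task that an 80x reduction from $3k would imply.
print(f"implied cost at 80x: ${implied_cost(OLD_COST, STATED_FACTOR):.2f}")

# Hypothetical new costs (assumptions, not measured values).
for new_cost in (0.10, 1.00, 10.00):
    factor = reduction_factor(OLD_COST, new_cost)
    print(f"${new_cost:.2f} per task -> {factor:,.0f}x cheaper")
```

Under these assumptions, 80x from $3k works out to $37.50 per task, while per-task costs in the pennies-to-dollars range would correspond to larger factors. The 80x figure may refer to a specific pair of leaderboard entries that the comment does not name.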