Comment by Townley - Hacker Neue

Townley Apr 29, 2025

A competitive geoguesser clearly got there through copious internet searching and memorization. So comparing knowledge retained in the trained model to knowledge retained in the brain feels surprisingly fair.

Conversely, the model sharing, “I found the photo by crawling Instagram and used an email MCP to ask the user where they took it. It’s in Austria” is unimpressive.

So independent of whether it actually helps improve performance, the cheating/not cheating distinction raises an interesting question of what we consider to be the cohesive essence of the model.

For example, RAG against a comprehensive local filesystem would also feel like cheating to me. Like a human geoguessing in a library filled with encyclopedias. But the fact that vanilla o3 is impressive suggests I somehow have an opaque (and totally poorly informed) opinion of the model boundary, where it’s a legitimate victory if the model was birthed with that knowledge baked in, but that’s it.
