Profile: frtime3d - Hacker Neue

frtime3d

Joined Oct 16, 2025 2 karma

frtime3d Oct 16, 2025

> If they specifically tried to cheat at this benchmark it would be obvious and they would be called out
I doubt it. Most people would just say, “Wow, it really looks like a pelican on a bicycle this time! It must be a good LLM!”
Most people trust a benchmark if it seems like a reasonable test of something they assume is relevant to them. They may not need a pelican on a bicycle, but they want an LLM that could produce one.
