On top of the other criticism here, I'd like to add that the article optimistically assumes that actors are completely honest about their benchmarks when billions of dollars and national security are at stake.

I'm only an "expert" in computer science and software engineering, and can say that: none of the widely available LLMs can produce answers at the level of a first-year CS student; and students using LLMs can easily be distinguished, because they are wrong in all the ways a human would otherwise never be.

So to me the question isn't really whether CS-related benchmarks are false; it's how exactly this BS ever flew in the first place.

Obviously LLMs show a similar lack of performance in other disciplines, but I can't call myself an "expert" there, and someone might argue I tend to use the wrong prompts.

Until we see a website where we can submit an intermediate-level problem and get a working solution, "benchmarks show that our AI solves problems at gold-medalist level" will remain obvious BS.
