On top of the other criticism here, I'd like to add that the article optimistically assumes that actors are completely honest about their benchmarks when billions of dollars and national security are at stake.

I'm only an "expert" in computer science and software engineering, and can say that: none of the widely available LLMs can produce answers at the level of a first-year CS student; and students using LLMs can easily be distinguished, because they are wrong in all the ways a human would otherwise never be.

So to me the question isn't really whether CS-related benchmarks are false; it's how exactly this BS ever flew in the first place.

Obviously LLMs show a similar lack of performance in other disciplines, but I can't call myself an "expert" there, and someone might argue I tend to use the wrong prompts.

Until we see a website where we can submit an intermediate-level problem and get a working solution, "benchmarks show that our AI solves problems at gold-medalist level" will remain obvious BS.
