Preferences

I don't really find this a helpful line to traverse. By this line of inquiry, most things in software are psychological.

Whether something is a bug or feature.

Whether the right thing was built.

Whether the thing is behaving correctly in general.

Whether, at a given moment, it's better for the thing to occasionally work across a whole range of cases or to work perfectly for a small subset.

Whether fast results are more important than absolutely correct results for a given context.

Yes, all of the above are also related to each other.

The most we have for LLMs is tallying up each user's experience with an LLM over a period of time across a wide range of "compelling" use cases (the pairings of their prompts and results are empirical though, right?).

This should be no surprise, as humans often can't agree on an end-all-be-all intelligence test for humans either.


No. I'm saying that if you take the exact same LLM on the exact same hardware and serve it to the exact same humans, a sizeable number of them will still complain about "model nerfs".

Why? Because humans suck.

