Preferences

Thank god people are noticing this. I'm pretty sick of companies putting a higher number next to models and programmers taking that at face value.

This reminds me of audio production debates about niche hardware emulations, like which company emulated the 1176 compressor the best. The differences between them are so minute and insignificant that eventually people just insist they can "feel" the difference. Basically, whoever is placeboing the hardest.

Such is the case with LLMs: a tool that is already hard to measure because it gives different output for the same repeated input, and now people try to A/B test models that are basically the same. The field has definitely made strides in how small models can be, but I've noticed very little improvement since GPT-4.

