
The idea that LLMs were trained on miscellaneous, low-quality scraped code may have been true a year ago, but I suspect it is no longer true today.

All of the major model vendors are competing on how well their models can code. The key to getting better code out of the model is improving the quality of the code that it is trained on.

Filtering training data for high-quality code is easier than filtering for high-quality data of other types.
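
Part of why I suspect this: unlike prose, code comes with cheap, machine-checkable quality signals, such as whether it parses, whether it lints cleanly, and whether its tests pass. As a toy illustration (my own sketch, not a description of any vendor's actual pipeline), even a filter this simple discards a lot of junk; the signals and the threshold here are invented for the example:

```python
import ast

def looks_like_quality_python(source: str) -> bool:
    """Toy filter: keep a Python sample only if it parses and at least
    half of its functions carry docstrings. Both signals and the 0.5
    threshold are illustrative, not anyone's real pipeline."""
    try:
        tree = ast.parse(source)  # syntactic validity is the cheapest signal
    except SyntaxError:
        return False
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if not funcs:
        return True  # nothing to judge here; leave it to other filters
    documented = sum(1 for f in funcs if ast.get_docstring(f))
    return documented / len(funcs) >= 0.5
```

The same idea scales up: you could run linters, execute test suites, or weight samples by repository activity, and none of those checks have an obvious equivalent for arbitrary web text.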

My strong hunch is that the quality of code being used to train current frontier models is way higher than it was a year ago.

