PoignardAzur
This seems like strong evidence that what the model learns is "Avoid answering questions in a way that would make OpenAI look bad when the screenshot shows up on social networks".

I wonder how much of this results from various heuristics combining, versus the network explicitly learning to model and maximize the above objective.

