Some of this is just goofy fun. Some of it is me exploring the tradeoffs between policy alignment, imagination, chain-of-thought reasoning, memory, agreeableness, fine-tuning, etc...
My biggest observation is that the "o1-preview" model imposes a SIGNIFICANT limit on freeform creativity, compared with "4o". The new model might be better at solving logic puzzles, writing code, etc., but it seems to struggle with metaphor.
Conversations with "4o" can be wild and fun!
Conversations with "o1-preview" are dry-as-toast.
I'm not sure if this is caused by the constraints of chain-of-thought or by the imposition of alignment policies, and I think that's an important area of research. Is it possible to invoke chain-of-thought reasoning without hampering creativity?
If we ever want to use agents like this in real scientific contexts, where the agent is capable of making true conceptual leaps, we will need to sacrifice some level of "alignment" in service of novelty and disagreeableness.
It's a long thread, but if you're patient, there's a lot of interesting stuff there! And I thought it would be fun to share it with the wider community.
Enjoy!