Comment by smolder - Hacker Neue

smolder Jun 9, 2023

I don't have great intuition about this, but I wonder whether it's even tractable to suppress human-like behaviors we don't want (such as exhibiting "fear" in the kill-countdown case) with RLHF, or whether we need to start by filtering the original training data. If the logical, unemotional Vulcans from Star Trek were real and had provided the entire training set, it seems the LLM would have far less opportunity to internalize "psychological weaknesses".

