(All the sudden having nightmares of getting billed for the conversations I have in the single player game I happen to be enjoying...)
If there is a future with this idea, its gotta be just shipping the LLM with game right?
That might be a nice application for this library of mine: https://github.com/Const-me/Cgml/
That’s an open source Mistral ML model implementation which runs on GPUs (all of them, not just nVidia), takes 4.5GB on disk, uses under 6GB of VRAM, and optimized for interactive single-user use case. Probably fast enough for that application.
You wouldn’t want in-game dialogues with the original model though. Game developers would need to finetune, retrain and/or do something else with these weights and/or my implementation.
Small, bounded conversations, with problematic lines trimmed over time, striking a balance between possibility and self-contradiction.
I could see it working really well in a Mass Effect-type game.
> If there is a future with this idea, its gotta be just shipping the LLM with game right?
Depends how high you can let your GPU requirements get :)
EDIT: Watching the videos, I am more and more confused by why this is even desirable. The complexity of dialogue in a game, it seems, needs to match the complexity of the more general possibilities and actions you can undertake in the game itself. Without that, it all just feels like you are in a little chatbot sandbox within the game, even if the dialogue is perfectly "in character." It all seems to feel less immersive with the LLMs.
For example, if you're a game company and you want to use LLMs so your players can converse with nonplayer characters in natural language, replacing a multiple-choice conversation tree - you'd want that to be low latency, and you'd want it to be cheap.