findjashua:
LM Studio is the easiest way to do it
That's what I've been playing with. I can load 9 layers of a Mixtral descendant into the 12 GB of VRAM for the GPU and put the rest into ~28 GB of RAM for the CPU to work on. It makes the system chug sometimes, but the models are interestingly capable.
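For anyone wanting to reproduce that kind of split outside the GUI: LM Studio runs llama.cpp under the hood, and the equivalent knob in llama-cpp-python is n_gpu_layers. A minimal sketch, assuming a hypothetical GGUF file path and a build of llama-cpp-python with GPU (e.g. CUDA) support:

    # Partial GPU offload with llama-cpp-python (what "9 layers on GPU" maps to).
    # The model path is a placeholder, not a specific file.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./mixtral-finetune.Q4_K_M.gguf",  # hypothetical quantized model
        n_gpu_layers=9,   # offload 9 layers to the 12 GB GPU; the rest stay in system RAM
        n_ctx=4096,       # context window; larger values cost more memory
    )

    out = llm("Q: What is Mixtral? A:", max_tokens=128)
    print(out["choices"][0]["text"])

Tuning n_gpu_layers up or down is the usual way to trade VRAM usage against how hard the CPU side has to work.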