killingtime74
Did everyone just miss the fact that the post says the intention is to run Llama 3 405B, but the build has less than a quarter of the VRAM required to do so? At fp16, 405B parameters work out to roughly 810 GB for the weights alone. Did you just change your goals mid-build? It's commonly known how much memory a given parameter count requires.
The system has 512 GB of RAM on top of its 192 GB of VRAM, so while inference will be slower, he really has about 704 GB at his disposal, assuming he distributes the weights across the VRAM and system RAM.
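For concreteness, here's a rough back-of-the-envelope sketch (my own numbers, not from the thread): weight memory scales linearly with parameter count and bytes per parameter, so fp16 doesn't fit even in the combined budget, while quantized variants do.

```python
# Back-of-the-envelope estimate of weight memory for Llama 3 405B at
# different precisions, checked against the combined VRAM + RAM budget
# described above. Ignores KV cache and activation memory.

PARAMS = 405e9          # Llama 3 405B parameter count
VRAM_GB = 192           # assumed GPU memory (704 GB total minus 512 GB RAM)
SYSTEM_RAM_GB = 512     # system memory stated in the post

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

budget_gb = VRAM_GB + SYSTEM_RAM_GB
for precision, bytes_per in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * bytes_per / 1e9   # weights only
    verdict = "fits" if weights_gb <= budget_gb else "does not fit"
    print(f"{precision}: ~{weights_gb:.0f} GB of weights -> {verdict} in {budget_gb:.0f} GB")
```

Running this shows fp16 needs ~810 GB (does not fit in 704 GB), while int8 (~405 GB) and int4 (~203 GB) do, which is consistent with both comments: the full-precision goal is out of reach, but a quantized 405B is plausible if weights are split across VRAM and system RAM.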