You can just about run a 32B (at Q4/Q5 quantization) on 24GB. Running anything bigger (such as the increasingly common 70B models, or larger still if you want to run something like Llama 4 or DeepSeek) means splitting the model between VRAM and system RAM. -- But yes, anything 24B or lower you can run comfortably, with enough capacity left over for the context.
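Rough back-of-the-envelope, just to show where the 24GB line falls. The bits-per-weight figures are assumptions for typical Q4/Q5 GGUF quants, not exact numbers for any particular file:

```python
# Rough VRAM estimate for quantized LLM weights.
# The bits-per-weight values are assumptions (roughly Q4_K_M / Q5_K_M territory);
# real files vary a bit, and this ignores KV cache and runtime overhead.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9  # GB of weights

for params, bpw in [(32, 4.5), (32, 5.5), (70, 4.5)]:
    print(f"{params}B @ ~{bpw} bpw: ~{weight_gb(params, bpw):.0f} GB of weights")
# 32B @ ~4.5 bpw: ~18 GB   -> fits on 24GB with some room for context
# 32B @ ~5.5 bpw: ~22 GB   -> tight
# 70B @ ~4.5 bpw: ~39 GB   -> has to spill into system RAM
```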
If you have other models -- text-to-speech, speech recognition, etc. -- then those also take up VRAM, both for their weights and for working memory during processing/generation. That cuts into the size of LLM you can run.
Anything that overflows VRAM is going to slow down the response time drastically.
"Space heater" is determined by computational horsepower rather than available RAM.
How big a context window do you want? Last I checked, a large context was very expensive in terms of VRAM (the KV cache grows with every token), and having a large one is highly desirable.
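To put a number on "expensive": here's a KV-cache sketch. The architecture numbers are illustrative assumptions (roughly a 32B-class model with GQA: 64 layers, 8 KV heads, head dim 128, fp16 cache), not any specific model's specs:

```python
# KV-cache size sketch. Layer/head counts below are assumptions for a
# hypothetical 32B-class GQA model, with an fp16 cache (2 bytes/element).
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 n_tokens: int, bytes_per_elem: int = 2) -> float:
    # 2x for keys and values, per layer, per KV head, per token
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens / 2**30

for ctx in (4096, 32768, 131072):
    print(f"{ctx:>7} tokens: ~{kv_cache_gib(64, 8, 128, ctx):.1f} GiB of KV cache")
# ~1 GiB at 4k, ~8 GiB at 32k, ~32 GiB at 128k
```

So an ~18GB Q4 32B plus a long context doesn't leave much headroom on a 24GB card, which is why people end up quantizing the KV cache too or settling for shorter contexts.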
This is "gaming PC" territory, not "space heater". I mean people already have PS5's and whatnot in their homes.
The hundreds-of-gigabytes thing exists because the big cloud LLM providers went down the ever-increasing-parameter-count path. That path is a dead end, and we've already hit negative returns.
Prompt engineering + finetunes are the future, but you need developer brains for that, not TFLOPs.