I’m curious to hear more about how you get useful performance out of your local setup. How would you characterize the difference in “intelligence” between local models on your hardware and something like ChatGPT? I imagine speed is also a factor. Curious to hear about your experiences in as much detail as you’re willing to share!
Local models generally won't have as large a context window, and the quantization process does make them "dumber," for lack of a better word.
If you try to get them to compose text, you'll see a lot less variety than you would with something like ChatGPT. That said, ask them to analyze a CSV file you don't want to hand over to ChatGPT, or ask them to write code, and they're generally competent at it. The high-end codex-gpt-5.2 type models are smarter, may find better solutions, and may track down bugs more quickly -- but the local models are getting better all the time.
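To make the private-CSV use case concrete, here's a minimal sketch of prepping data for a local model behind an OpenAI-compatible endpoint (llama.cpp's server and Ollama both expose one). The model name and endpoint URL are placeholder assumptions -- swap in whatever you're actually running:

```python
import csv
import io
import json

def build_csv_prompt(csv_text, question):
    """Embed the CSV contents directly in the prompt, so the data
    never leaves the machine."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    return (
        f"Here is a CSV with columns {', '.join(header)} "
        f"({len(body)} data rows):\n\n{csv_text}\n\n{question}"
    )

def build_request(prompt, model="llama3.1:8b"):
    """JSON payload for a chat-completions call to a local server.
    The model name is just an example."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # keep analysis answers focused
    })

sample = "region,revenue\nEMEA,1200\nAPAC,950\n"
prompt = build_csv_prompt(sample, "Which region had higher revenue?")
payload = build_request(prompt)
# To actually send it, POST the payload to your local server, e.g.
# http://localhost:11434/v1/chat/completions (Ollama's default).
```

The point is that nothing here touches the network until you choose to POST to localhost, which is the whole appeal for sensitive data.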
Yes, the models this hardware can run don't perform like ChatGPT or Claude 4.5, but they're still very useful.