karolist parent
Same here, I run deepseek coder 33b on my 64GB M1 Max at about 7-8 t/s and it blows away every other model I've tried for coding. It feels like magic and cheating at the same time, getting these lengthy and in-depth answers with Activity Monitor showing 0 network IO.
I tried running Deepseek 33b using llama.cpp with 16k context and it kept injecting unrelated text. What's your setup so that it works for you? Do you use any special CLI flags or prompt format?
I use the default prompt template, which is defined in the model's tokenizer_config.json: https://huggingface.co/deepseek-ai/deepseek-coder-33b-instru...
No special flags or anything, just the standard format. Do take care with the spaces and line endings. Sharing a gist of the function I use for formatting it: https://gist.github.com/theskcd/a3948d4062ed8d3e697121cabd65... (hope this helps!)
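Since the linked gist is truncated, here's a minimal sketch of what such a formatting function might look like, assuming the `### Instruction:` / `### Response:` style template that DeepSeek Coder's instruct models document on their model card; the exact system prompt, spacing, and EOS handling should be checked against the template in the repo's tokenizer_config.json:

```python
# Hypothetical sketch of a DeepSeek Coder instruct prompt formatter.
# The system line and section markers are assumptions based on the
# published chat template; verify against tokenizer_config.json.

SYSTEM = (
    "You are an AI programming assistant. "
    "Only answer questions related to computer science.\n"
)

def format_prompt(instruction: str) -> str:
    # As noted above, the spaces and line endings matter:
    # each section marker sits on its own line.
    return f"{SYSTEM}### Instruction:\n{instruction}\n### Response:\n"

print(format_prompt("Write a function that reverses a string."))
```

The point is just that the model was fine-tuned on one exact layout, so a stray space or missing newline around the markers is often what causes it to drift into unrelated text.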
I actually use LM Studio with the DeepSeek settings preset that comes with it, except I enable mlock to keep the model entirely in memory. Works really well.