I think it is even easier right now for companies to self-host an inference server with basic RAG support:
- get a Mac Mini or Mac Studio
- run ollama serve
- run the ollama web-ui in Docker
- add a coding assistant model from ollamahub via the web-ui
- upload your documents in the web-ui
No code needed: you have a self-hosted LLM with basic RAG, giving you answers with your documents in context.
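For reference, the whole setup boils down to roughly the commands below. The Docker image name, internal port, and data path are from memory, so double-check them against the web-ui project's README before copying this:

```sh
# Start the Ollama API server (listens on localhost:11434 by default)
ollama serve

# Pull a coding assistant model, e.g. the DeepSeek Coder 33B build we use
ollama pull deepseek-coder:33b

# Run the web-ui in Docker; the --add-host flag lets the container reach
# the Ollama server running on the host. Image name, port mapping and
# volume path are as I remember them, verify against the project docs.
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v ollama-webui:/app/backend/data \
  --name ollama-webui --restart always \
  ghcr.io/ollama-webui/ollama-webui:main
```

With that port mapping, everything else (picking the model, uploading documents for RAG) happens in the web-ui at http://localhost:3000.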
For us, the DeepSeek Coder 33B model is fast enough on a Mac Studio with 64 GB of RAM and gives pretty good suggestions based on our internal coding documentation.