I tried downloading your app, and it's a whopping 500 MB. What takes up the most disk space? The llama-server binary with the built-in web UI is like a couple MBs.
>the app is a bit heavy as is loading llm models using llama.cpp cli
So it adds the unnecessary overhead of reloading all the weights into VRAM on each message? With larger models that can take up to a minute. Or do you somehow stream input/output to and from an attached CLI process without restarting it?
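To spell out the difference I mean (model path and port are placeholders, and I'm going from how llama.cpp documents its tools, not from your app's internals):

```shell
# llama-cli loads the full model from disk on every invocation:
# weight load -> inference -> exit. Driving a chat through repeated
# CLI calls pays that load cost on each message.
llama-cli -m ./model.gguf -p "Hello"

# llama-server loads the weights once and keeps them resident in
# VRAM, answering subsequent HTTP requests without any reload:
llama-server -m ./model.gguf --port 8080
```

That persistent-server model is why per-message latency stays low even for large models: only the prompt processing and generation happen per request, not the weight load.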
What in the world are you trying to say here? llama.cpp can run completely locally, and web access can be limited to localhost only. That's entirely private and offline (after downloading a model). I can't tell if you're spreading FUD about llama.cpp or are just generally misinformed about how it works. You certainly have some motivated reasoning in promoting your app, which makes your replies seem very disingenuous.
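For anyone reading along, restricting the built-in web UI to the local machine is one flag (model path and port are placeholders):

```shell
# Bind llama-server to the loopback interface only, so the web UI
# and API are unreachable from outside this machine. Nothing leaves
# the box; the model runs fully offline once downloaded.
llama-server -m ./model.gguf --host 127.0.0.1 --port 8080
```

With that, browsing to http://127.0.0.1:8080 gives you the bundled web UI, and no other host on the network can connect.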
Llama.cpp's built-in web UI.