baalimago
So... it's a language model? As in, not "large"? I'm a bit unsure of the magnitudes here, but surely "nano" and "large" cancel out.
No, vLLM is an inference engine for serving large language models: https://github.com/vllm-project/vllm
Is it more like llama.cpp then? I don't have access to the good hardware.
llama.cpp is optimized to serve one request at a time.
vLLM is optimized to serve many requests at once: it uses continuous batching (plus PagedAttention for KV-cache memory management) to keep the GPU busy across concurrent requests.
If you were to fine-tune a model and wanted to serve it to many users, you would use vLLM, not llama.cpp.
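If it helps, here's a minimal sketch of batched inference with vLLM's offline Python API. The model name is a placeholder; swap in whatever checkpoint you fine-tuned.

    from vllm import LLM, SamplingParams

    # Several prompts submitted together; vLLM batches them internally
    # and runs them concurrently, which is where its throughput
    # advantage over llama.cpp comes from.
    prompts = [
        "Explain continuous batching in one sentence.",
        "Why serve many requests at once?",
    ]
    sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

    llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder model
    outputs = llm.generate(prompts, sampling_params)

    for output in outputs:
        print(output.outputs[0].text)

For actually serving users over HTTP, recent vLLM versions also ship an OpenAI-compatible server (`vllm serve <model>`), so clients can talk to it the same way they'd talk to the OpenAI API.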
Here's a super relevant comment from another post: https://www.hackerneue.com/item?id=44366418