
The concurrent request handling seems great for our AI eval workloads, where we're mostly waiting on LLM API calls and DB operations. But I'm curious: how does Vercel handle potential noisy-neighbor issues when one request consumes excessive CPU/memory?

Disclosure: CEO of Scorecard, an AI eval platform and current Vercel customer. Intrigued, since most of our serverless time is spent waiting for model responses, but cautious about 'magic' solutions.
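A minimal sketch of the I/O-bound shape being described here, where the instance idles on network calls for most of each request; the endpoint URL and the saveResult helper are hypothetical, not Scorecard's or Vercel's actual code:

```typescript
// Sketch of an I/O-bound eval handler (hypothetical endpoint and helper).
// The instance spends almost all wall-clock time awaiting the LLM API,
// so several concurrent requests can safely share one instance's CPU.
export async function POST(request: Request): Promise<Response> {
  const { prompt } = await request.json();

  // Milliseconds of CPU, then seconds of idle waiting on the model provider.
  const llmResponse = await fetch("https://api.example-llm.com/v1/complete", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  const completion = await llmResponse.json();

  // Another await: persist the eval result before responding.
  await saveResult(completion);

  return Response.json({ completion });
}

// Hypothetical DB helper; stand-in for whatever persistence layer you use.
async function saveResult(result: unknown): Promise<void> {
  // e.g. await db.insert("eval_results", result);
}
```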


We built Fluid with noisy neighbors (i.e., requests sharing the same instance) in mind. Because we are a data-driven team, we:

1. track metrics and have our own dashboards, so we proactively understand and act whenever something like that happens;
2. also use these metrics in our routing to know when to scale up.

We have tested a lot of variations of all the metrics we gather, and things are looking good.
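As a toy illustration of point 2, metric-driven routing could look roughly like the sketch below; every name and threshold here is invented for illustration and is not Vercel's actual implementation:

```typescript
// Toy sketch of metric-driven routing (all names/thresholds invented):
// send a request to an existing instance only if its recent CPU and memory
// headroom suggest it can absorb another concurrent request; a null result
// means "scale up a fresh instance instead".
interface InstanceMetrics {
  id: string;
  cpuUtilization: number;    // 0..1, recent average
  memoryUtilization: number; // 0..1
  inFlightRequests: number;
}

function pickInstance(
  instances: InstanceMetrics[],
  maxCpu = 0.7,
  maxMem = 0.8,
  maxConcurrency = 32,
): InstanceMetrics | null {
  const candidates = instances.filter(
    (i) =>
      i.cpuUtilization < maxCpu &&
      i.memoryUtilization < maxMem &&
      i.inFlightRequests < maxConcurrency,
  );
  // Prefer the least-loaded healthy instance.
  candidates.sort((a, b) => a.inFlightRequests - b.inFlightRequests);
  return candidates[0] ?? null;
}
```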

Anyway, the more workload types we host on this system, the more we learn and the more performant it will get. We've been running it for a while now, and it shows great results.

There's no magic, just data coming from a complex system, fed into a fairly complex system!

Hope that answers the question, and thanks for trusting us!

So if I understood point 1 correctly, I could use this solution to potentially save money, but it could turn into a nightmare very quickly if you guys aren't watching?
Yes, quite helpful. Thanks for explaining; I will try it out!
I think the majority of Vercel customers are doing website hosting, and most web requests are I/O-bound, so it makes sense to handle multiple requests per microVM.

I can't say the same if a customer is running a CPU-bound workload.
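A small sketch of that contrast, assuming a Node-style single-threaded event loop: an awaited network call yields so other requests sharing the instance can run, while a synchronous hot loop blocks them all:

```typescript
// I/O-bound: yields to other requests while waiting on the network.
async function ioBoundHandler(): Promise<string> {
  const res = await fetch("https://api.example.com/data"); // instance idles here
  return res.text();
}

// CPU-bound: monopolizes the event loop; concurrent requests on this
// instance stall until the loop finishes.
function cpuBoundHandler(): number {
  let acc = 0;
  for (let i = 0; i < 1e9; i++) acc += Math.sqrt(i); // hogs the CPU
  return acc;
}
```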
