Comment by riskable - Hacker Neue

riskable Dec 13, 2025 parent

Pub/sub over WebSockets seems like the simplest solution. You'd need to rearrange your LLM serving architecture a bit so that the model's output goes into a pub/sub system and a microservice grabs it from there to send to the client, but it's not rocket science.

It's yet another system that needs some DRAM, though. The good news is that you can auto-expire the queued-up responses pretty fast :shrug:

No idea if it's worth it, though. Someone with access to the statistics surrounding dropped connections/repeated prompts at a big LLM service provider would need to do some math.
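
For illustration, here is a minimal in-memory sketch of the buffering idea described above: output chunks are published under a response ID, a reconnecting client replays whatever it missed, and idle responses expire after a short TTL. The class name `ExpiringResponseBuffer`, its methods, and the TTL value are all hypothetical; a real deployment would presumably put a message broker and a WebSocket or SSE layer where this toy class sits.

```python
import time
from typing import Callable, Dict, List


class ExpiringResponseBuffer:
    """Toy pub/sub buffer keyed by response ID (hypothetical, in-memory only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._chunks: Dict[str, List[str]] = {}
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}
        self._last_touch: Dict[str, float] = {}

    def publish(self, response_id: str, chunk: str) -> None:
        """Store a chunk and push it to every live subscriber."""
        self._expire()
        self._chunks.setdefault(response_id, []).append(chunk)
        self._last_touch[response_id] = self.clock()
        for callback in self._subscribers.get(response_id, []):
            callback(chunk)

    def subscribe(self, response_id: str, callback: Callable[[str], None],
                  from_index: int = 0) -> None:
        """Replay buffered chunks from from_index, then receive new ones."""
        self._expire()
        for chunk in self._chunks.get(response_id, [])[from_index:]:
            callback(chunk)
        self._subscribers.setdefault(response_id, []).append(callback)
        self._last_touch.setdefault(response_id, self.clock())

    def _expire(self) -> None:
        """Drop every response that has been idle for longer than the TTL."""
        now = self.clock()
        stale = [rid for rid, touched in self._last_touch.items()
                 if now - touched > self.ttl_seconds]
        for rid in stale:
            self._chunks.pop(rid, None)
            self._subscribers.pop(rid, None)
            self._last_touch.pop(rid, None)
```

A client that reconnects after a dropped connection would call `subscribe` with the index of the first chunk it has not yet seen, so the prompt does not need to be re-run. Memory use is bounded by roughly the TTL times the output rate, which is the DRAM cost mentioned above.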

bragh Dec 13, 2025

Corporate security hates WebSockets, though; SSE is much easier for end users to get approved.

nightshift1 Dec 13, 2025

I think it would be even more wasteful to continue inference in the background for nothing if the user decided to leave without pressing the stop button. Saving the partial answer at the exact moment the client disappeared would be better.
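
A rough sketch of that alternative, assuming the transport raises `ConnectionError` when the client is gone and that the model is exposed as a lazy token iterator, so that ending the loop also ends generation. The function name `stream_until_disconnect` and the `store` dictionary are hypothetical stand-ins for whatever the serving stack actually uses.

```python
from typing import Callable, Dict, Iterable


def stream_until_disconnect(tokens: Iterable[str],
                            send: Callable[[str], None],
                            store: Dict[str, str],
                            response_id: str) -> bool:
    """Stream tokens to the client; on disconnect, stop and save the partial answer.

    Returns True if the whole answer was delivered, False if the client vanished.
    """
    generated = []
    try:
        for token in tokens:
            # Record the token before sending so that the one that failed
            # to deliver is still part of the saved partial answer.
            generated.append(token)
            send(token)  # assumed to raise ConnectionError once the client is gone
    except ConnectionError:
        # Leaving the loop means no further tokens are pulled from the model.
        store[response_id] = "".join(generated)
        return False
    return True
```

Because `tokens` is consumed lazily, nothing past the failed send is ever requested from the model, so no inference is spent on an answer nobody is reading.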

verdverm Dec 13, 2025

What if I want to have the agent go off and work on something for a while and I'll check back tomorrow?
