1) Designers don't internalize that things are going to happen async, so the UI ends up built on the assumption that everything happens in real time. Even if it works with the current design, it's one small change away from being impossible to implement.
This is a general difficulty with working in eventually consistent systems, but if you're putting something in a queue because you're too lazy to optimize (rather than because the natural complexity of the workload demands it), you're hurting yourself unnecessarily.
2) Errors get swallowed really easily. Instead of being reported to the team and surfaced to the user in a timely manner, some default configurations just keep retrying the job later, so if you're not monitoring closely you'll end up with tens of thousands of jobs retrying over and over at various intervals.
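A minimal ActiveJob sketch of keeping retries bounded and loudly reported instead of silently endless; GenerateReportJob, Report and ErrorReporter are hypothetical names standing in for whatever you already use:

    class GenerateReportJob < ApplicationJob
      queue_as :default

      # Retry transient failures a bounded number of times...
      retry_on Timeout::Error, wait: 30.seconds, attempts: 5 do |job, error|
        # ...then, when retries are exhausted, surface it instead of burying it.
        ErrorReporter.notify(error, job_id: job.job_id)
      end

      # Don't retry at all on errors that will never succeed.
      discard_on ActiveRecord::RecordNotFound

      def perform(report_id)
        Report.find(report_id).generate!
      end
    end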
These are data- and compute-heavy workloads where a request takes anywhere from minutes to hours to complete, but the UI takes this into account.
Users submit a request, move on to whatever they intend to do next, and can subscribe to various async notification channels to hear when it's done.
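A rough sketch of that flow, assuming Rails with ActiveJob; Export, ExportsController and HeavyExportJob are hypothetical names:

    class ExportsController < ApplicationController
      def create
        export = current_user.exports.create!(status: "queued")
        HeavyExportJob.perform_later(export.id)
        # Return right away; the user moves on and gets notified later.
        redirect_to exports_path, notice: "Export started, we'll let you know when it's done."
      end
    end

    class HeavyExportJob < ApplicationJob
      def perform(export_id)
        export = Export.find(export_id)
        export.update!(status: "running")
        export.run!                      # the minutes-to-hours part
        export.update!(status: "done")   # subscribers can now be notified
      end
    end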
It’s not the right choice for everything, but it’s the right choice for some things.
1. Yes, this is true, but Rails now ships with nice support for async UI, pushing updates to the browser via Hotwire and Turbo.
You’d need something like that anyway anytime you’re calling an external service you don’t control.
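For what it's worth, a rough sketch of that wiring with turbo-rails, continuing the hypothetical job above (the stream name, DOM id and partial are made up; the page would subscribe with turbo_stream_from in the view):

    class HeavyExportJob < ApplicationJob
      def perform(export_id)
        export = Export.find(export_id)
        export.run!
        # Push the finished row to any browser subscribed to this stream.
        Turbo::StreamsChannel.broadcast_replace_to(
          export.user, :exports,                 # same stream the view subscribed to
          target: "export_#{export.id}",         # DOM id rendered in the list
          partial: "exports/export",
          locals: { export: export }
        )
      end
    end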
2. Again, this is a good point, but even running every request synchronously you still need good error logging, because you don’t want to share the details of an error with your frontend.
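Something like this sketch, assuming a Rails controller (ErrorReporter is a hypothetical stand-in for whatever alerting is already in place):

    class ApplicationController < ActionController::Base
      rescue_from StandardError do |error|
        # Full details go to logs/monitoring, not to the client.
        Rails.logger.error("#{error.class}: #{error.message}")
        ErrorReporter.notify(error)
        render json: { error: "Something went wrong. Please try again." },
               status: :internal_server_error
      end
    end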
With background jobs you definitely need to be on top of monitoring and retry logic. I also think you need to be very careful about idempotency, since a retried job will run the same work again.
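For example, a sketch of an idempotent, retried job (Payment and PaymentGateway are hypothetical):

    require "net/http"

    class ChargeCustomerJob < ApplicationJob
      # Bounded retries for a flaky network call.
      retry_on Net::ReadTimeout, wait: 1.minute, attempts: 5

      def perform(payment_id)
        payment = Payment.find(payment_id)
        # A retry may run after the charge already succeeded, so check first...
        return if payment.charged_at.present?

        payment.with_lock do
          # ...and re-check under a row lock in case another attempt got here first.
          unless payment.charged_at.present?
            PaymentGateway.charge!(payment)   # hypothetical external call
            payment.update!(charged_at: Time.current)
          end
        end
      end
    end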
I see those as the engineering trade-offs of that pattern. There’s very little in the way of silver bullets in engineering; different solutions just come with different trade-offs.
IMO there are a lot of things that queues are an excellent answer to, potentially including performance.
But - queues (generally and among other things) solve the problem of “this will take some time AND the user doesn’t need an immediate response.”
If that’s not your problem, then using queues might not be the solution. If it’s something that’s taking too long and the user DOES need a response, then (as you say) optimizing is what you should try, not queues. Or some product redesign so the user doesn’t need an immediate response. Or finding a way to split the part that produces an immediate response from the part that takes a while.
For example: validating uploaded bulk data is in the right “shape”, and then enqueuing the full validation and insertion.
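A rough sketch of that split, assuming Rails with ActiveJob (the column names, Import and BulkImportJob are hypothetical):

    require "csv"

    class ImportsController < ApplicationController
      REQUIRED_HEADERS = %w[email name].freeze

      def create
        headers = CSV.open(params[:file].path, headers: true) { |csv| csv.first&.headers } || []
        missing = REQUIRED_HEADERS - headers

        if missing.any?
          # Immediate response: the upload is obviously wrong, no queue involved.
          render json: { error: "Missing columns: #{missing.join(', ')}" },
                 status: :unprocessable_entity
        else
          import = Import.create!(file: params[:file], status: "queued")
          BulkImportJob.perform_later(import.id)   # row-by-row validation + insertion happens here
          render json: { import_id: import.id, status: "queued" }, status: :accepted
        end
      end
    end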
Also really really avoid jobs that enqueue jobs. Sometimes they’re necessary (spacing out some operation on chunks of a group; or a job that ONLY spawns other jobs) but mostly they’re a route to spaghetti.
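A sketch of the one acceptable shape, a parent job that only fans out chunk jobs and does nothing else (names are hypothetical):

    class ReindexAllUsersJob < ApplicationJob
      def perform
        # Spawn child jobs in batches; this job does no other work itself.
        User.in_batches(of: 1000) do |batch|
          ReindexUserChunkJob.perform_later(batch.pluck(:id))
        end
      end
    end

    class ReindexUserChunkJob < ApplicationJob
      def perform(user_ids)
        User.where(id: user_ids).find_each(&:reindex!)  # hypothetical reindex! method
      end
    end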
The parent indicated that the cross-region dynamic required extra routing logic and introduced debugging problems.
But if you're just smoothing out some work it's pretty normal; just make sure you're modeling things instead of putting it all in the magic queue.
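One way to read "modeling things": give the work a first-class record with explicit states, so the app and the UI can see where it is without asking the job backend. A hypothetical sketch:

    class DataSync < ApplicationRecord
      # Rails 7-style enum; the job moves the record through these states.
      enum :status, { queued: 0, running: 1, succeeded: 2, failed: 3 }
    end

    class DataSyncJob < ApplicationJob
      def perform(data_sync_id)
        sync = DataSync.find(data_sync_id)
        sync.running!
        begin
          sync.run!          # hypothetical heavy work
          sync.succeeded!
        rescue
          sync.failed!
          raise              # re-raise so the job backend still sees the failure
        end
      end
    end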
If some endpoint is too slow to return a response to the frontend within a reasonable time, enqueueing it via a worker makes sense to me.
That doesn't cover all performance issues, but it handles a lot of them. You should also do things like optimize SQL queries, cache in Redis or the DB, perhaps run multiple threads within an endpoint, etc., but I don't see anything wrong with specifically having dozens of workers/queues. We have that in my work's Rails app.
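For example, a small sketch of two of those tools, a read-through cache and per-job queues, with hypothetical names:

    class DashboardStats
      def self.fetch(account)
        # Serve from the cache store (e.g. Redis) and only hit the DB on a miss.
        Rails.cache.fetch(["dashboard_stats", account.id], expires_in: 10.minutes) do
          account.orders.group(:status).count
        end
      end
    end

    class NightlyReportJob < ApplicationJob
      queue_as :low_priority   # drained by its own pool of workers
    end

    class SendReceiptJob < ApplicationJob
      queue_as :critical       # fast, user-facing work doesn't wait behind reports
    end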
Happy to hear how I can do things better if I'm missing something.