
zdragnar
There are two primary areas where I've personally seen teams get bitten by this:

1) Designers don't understand that things are going to happen asynchronously, so the UI ends up assuming that everything happens in real time. Even if that works with the current design, it's one small change away from being impossible to implement.

This is a general difficulty with working in eventually consistent systems, but if you're putting something in a queue because you're too lazy to optimize (rather than because the natural complexity of the workload demands it), you're going to be hurting yourself unnecessarily.

2) Errors get swallowed really easily. Instead of being properly reported to the team and surfaced to the user in a timely manner, they turn into retries: some configurations default to simply retrying the job later, so if you're not monitoring closely you'll end up with tens of thousands of jobs retrying over and over at various intervals.
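
One common guard against retries piling up silently is to cap the number of attempts, back off between them, and then park the exhausted job somewhere visible and alert someone. Below is a minimal Python sketch under those assumptions. It is not Sidekiq's API: `Job`, `RetryPolicy`, `run_once`, `drain`, and the `report` callback are hypothetical names, and the scheduler is simulated rather than actually sleeping.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    job_id: str
    attempts: int = 0
    last_error: str = ""


@dataclass
class RetryPolicy:
    max_attempts: int = 5     # hypothetical cap; tune per workload
    base_delay: float = 1.0   # seconds before the first retry

    def delay_for(self, attempt: int) -> float:
        # Exponential backoff: 1x, 2x, 4x, ... the base delay.
        return self.base_delay * (2 ** (attempt - 1))


def run_once(job: Job, handler: Callable[[Job], None], policy: RetryPolicy,
             dead_letters: List[Job], report: Callable[[Job], None]) -> Optional[float]:
    """Run one attempt. Return the delay before the next retry, or None when
    the job succeeded or was moved to the dead-letter list."""
    job.attempts += 1
    try:
        handler(job)
        return None
    except Exception as exc:
        job.last_error = repr(exc)
        if job.attempts >= policy.max_attempts:
            dead_letters.append(job)  # stop retrying...
            report(job)               # ...and make sure a human hears about it
            return None
        return policy.delay_for(job.attempts)


def drain(job: Job, handler: Callable[[Job], None], policy: RetryPolicy,
          dead_letters: List[Job], report: Callable[[Job], None]) -> List[float]:
    """Simulate the scheduler without sleeping; return the retry delays used."""
    delays: List[float] = []
    while (delay := run_once(job, handler, policy, dead_letters, report)) is not None:
        delays.append(delay)
    return delays
```

With `max_attempts=5` and `base_delay=1.0`, a job that always fails is retried after 1, 2, 4, and 8 seconds, then lands in the dead-letter list with `report` called once, instead of retrying indefinitely.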


bigtunacan
We have hundreds of queues processing millions of jobs in Sidekiq at any given time.

These are data- and compute-heavy workloads where a request takes anywhere from minutes to hours to complete, but the UI takes this into account.

Users submit a request, move on to whatever they intend to do next, and can subscribe to various async notification channels.
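
The submit-then-notify flow described here can be sketched as a toy, single-process Python illustration, assuming the real queue and notification channels (Sidekiq, a push channel to the browser, email, and so on) are replaced by in-memory stand-ins. `RequestTracker`, `submit`, `subscribe`, and `work_off` are hypothetical names.

```python
import queue
import uuid
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, str, object], None]  # (request_id, status, result)


class RequestTracker:
    """In-memory stand-in for a job queue plus per-request notification channels."""

    def __init__(self) -> None:
        self.pending: queue.Queue = queue.Queue()
        self.status: Dict[str, str] = {}
        self.subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def submit(self, work: Callable[[], object]) -> str:
        # Enqueue and return an id right away; the user moves on to other work.
        request_id = uuid.uuid4().hex
        self.status[request_id] = "queued"
        self.pending.put((request_id, work))
        return request_id

    def subscribe(self, request_id: str, callback: Callback) -> None:
        self.subscribers[request_id].append(callback)

    def work_off(self) -> None:
        # Worker side: drain the queue, then push each outcome to its subscribers.
        while not self.pending.empty():
            request_id, work = self.pending.get()
            self.status[request_id] = "running"
            try:
                result, outcome = work(), "done"
            except Exception as exc:
                # Surface failures instead of leaving the request stuck in "running".
                result, outcome = exc, "failed"
            self.status[request_id] = outcome
            for notify in self.subscribers.pop(request_id, []):
                notify(request_id, outcome, result)
```

The point of the shape is that `submit` never waits on the work; the UI only needs the returned id, and completion (or failure) arrives later through whatever channel the user subscribed to.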

It’s not the right choice for everything, but it’s the right choice for some things.

These are good points; in answer to them:

1. Yes, this is true, but Rails now comes with nice support for async UI, built to push updates to the browser via Hotwire and Turbo.

You’d need something like that anyway any time you’re calling an external service you don’t control.

2. Again, this is also a good point, but even if you run every request synchronously you still need good error logging, because you don’t want to share the details of an error with your frontend.

With background jobs you definitely need to stay on top of monitoring and retry logic. I also think you need to be very careful about idempotency and retry logic.
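
Idempotency here means a retried job must not repeat a side effect that already happened, for example when a worker crashes after writing but before the queue records the job as finished. A minimal Python sketch, assuming the datastore can enforce uniqueness on a key derived from the work itself (such as a unique index). `Ledger`, `record_once`, and `charge_job` are hypothetical names, and the in-memory check-then-insert stands in for what would have to be a single atomic operation in a real database.

```python
from typing import Dict


class Ledger:
    """Stand-in for a datastore with a unique constraint on the idempotency key."""

    def __init__(self) -> None:
        self.entries: Dict[str, int] = {}

    def record_once(self, key: str, amount_cents: int) -> bool:
        # Insert only if the key is new; return True when a write happened.
        # A real database would do this atomically (e.g. unique index + insert).
        if key in self.entries:
            return False
        self.entries[key] = amount_cents
        return True


def charge_job(ledger: Ledger, order_id: str, amount_cents: int,
               crash_before_ack: bool = False) -> None:
    # Derive the key from the business identity of the work, not from the
    # attempt number, so every retry of the same order maps to the same key.
    key = f"charge:{order_id}"
    ledger.record_once(key, amount_cents)
    if crash_before_ack:
        # Simulates the worker dying after the write but before the queue
        # learns the job finished, which is what triggers a retry.
        raise ConnectionError("lost connection before acknowledging the job")
```

If the first attempt writes and then crashes, the retry calls `record_once` with the same key and becomes a no-op, so the order is charged exactly once even though the job ran twice.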

I see that as the engineering trade-offs for that pattern. There’s very little in the way of silver bullets in engineering; different solutions just come with different trade-offs.

nkraft11
Error handling was a huge issue, along with other weird distributed-system bugs: backed-up queues, job shedding, thundering herds, you name it. When you have jobs on queues kicking off new jobs on different queues, tracing issues is just miserable. Sure, it's not a problem with Ruby per se, but engineers would basically throw their hands up and say "Ruby can't handle this," and Sidekiq became the One True Way™.
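
Thundering herds often come from many jobs failing together and then retrying on the same schedule. One widely used mitigation is exponential backoff with "full jitter": pick a random delay between zero and the capped exponential delay, so retries spread out instead of arriving in waves. A minimal Python sketch; the function name and the `base`/`cap` defaults are assumptions, not Sidekiq settings.

```python
import random
from typing import Optional


def backoff_with_full_jitter(attempt: int, base: float = 1.0, cap: float = 300.0,
                             rng: Optional[random.Random] = None) -> float:
    """Delay in seconds before retry number `attempt` (1-based).

    Full jitter: a uniform random value between 0 and min(cap, base * 2**(attempt-1)),
    so jobs that failed at the same moment do not all retry at the same moment.
    """
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** (attempt - 1)))
    return rng.uniform(0.0, ceiling)
```

Passing a seeded `random.Random` makes the delays reproducible in tests; in production the default unseeded generator is what spreads the herd.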

pmontra
Maybe "Ruby can't handle this" was shorthand for "we can't run this in the Rails controller because the response would take too long" (possibly because it calls third-party APIs) "and we would run out of threads."

Anything running in sidekiq is written in Ruby too.
