Check your logs; you likely have an error. I was seeing “Access Denied”-like messages. Check your S3 backup process if you have one. At some point an update must have required additional permissions on Amazon S3. A broader policy on AWS S3 fixed it for me.
Yes, in fact, I saw that message a few minutes ago when searching for any indication of why sidekiq was paused, but it didn’t occur to me that it could be related.
I did fix that option (I don’t want to delete files from S3) a couple of minutes before raising this topic, but I didn’t think much of it. When I imported the data to the new server, it appears some options were lost.
I was going to follow it to make sure the error went away, but now I’m hopeful that will fix the sidekiq pausing as well!
Just a guess - isn’t Sidekiq paused during a backup? (And perhaps understandably so, because the backup process takes up so many local computational resources.)
So if this job falls over, it’s never automatically unpaused?
Oh yes, right. It gets paused during backup and never recovers. You’re exactly correct and I remember us running into this before.
I wonder if we should have a safety mechanism where sidekiq cannot be “paused for backup” for more than x hours, where x is maybe four? What do you think @gerhard?
An arbitrary admin pause of sidekiq should always be possible, but a “stuck forever for backup” pause doesn’t seem right to me.
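To make the idea concrete, here is a minimal sketch of a pause that expires on its own when the reason is a backup, while an admin pause stays in place indefinitely. Everything in it is an assumption for illustration: `ExpiringPause`, the `:backup` and `:admin` reasons, and the four-hour limit are hypothetical names and values, not Discourse's or Sidekiq's actual API.

```ruby
# Hypothetical sketch, not Discourse's real implementation.
class ExpiringPause
  # Assumed limit: the "x hours" from the suggestion above, with x = 4.
  MAX_BACKUP_PAUSE_SECONDS = 4 * 60 * 60

  # The clock is injectable so the expiry logic can be tested without waiting.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @paused_at = nil
    @reason = nil
  end

  def pause!(reason)
    @reason = reason
    @paused_at = @clock.call
  end

  def unpause!
    @reason = nil
    @paused_at = nil
  end

  # A backup pause older than the limit is lifted automatically;
  # any other pause (for example an admin pause) never expires.
  def paused?
    return false if @paused_at.nil?

    if @reason == :backup && (@clock.call - @paused_at) > MAX_BACKUP_PAUSE_SECONDS
      unpause!
      return false
    end

    true
  end
end
```

With this shape, a crashed backup could leave processing paused for at most four hours, while a deliberate admin pause behaves as it does today.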
I started following that lead, and it seems pretty promising.
I went to check my backups, and only the first one after the upgrade was in S3.
I couldn’t even start a new backup in Discourse; I had to cancel several times (I’d cancel it, refresh the page, and cancel it again and again).
So it appears the timeline is more or less like:
1. We upgrade.
2. In less than 24h, a backup happens and is uploaded to S3.
3. Sidekiq is paused.
Even if I unpaused sidekiq, no new backups were created, which was unexpected.
The backup process fails to delete old backups from S3. Unfortunately it crashes inside the ensure block and prevents Sidekiq from starting again. I’m going to fix that.
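The failure mode described here can be sketched in a few lines of plain Ruby. This is an illustration under assumptions, not the actual Discourse backup code: `FakeSidekiq`, `fragile_backup`, `robust_backup`, and the step names are all hypothetical. The point is that when one cleanup step raises inside an `ensure` block, the statements after it in that block never run.

```ruby
# Hypothetical stand-in for the real pause/unpause mechanism.
class FakeSidekiq
  attr_reader :paused

  def initialize
    @paused = false
  end

  def pause!
    @paused = true
  end

  def unpause!
    @paused = false
  end
end

# Fragile version: if deleting old backups raises, unpause! is skipped.
def fragile_backup(sidekiq)
  sidekiq.pause!
  yield :upload
ensure
  yield :delete_old_backups # an exception here aborts the rest of the ensure block
  sidekiq.unpause!
end

# More robust version: the cleanup error is recorded, and unpause! always runs.
def robust_backup(sidekiq, errors: [])
  sidekiq.pause!
  yield :upload
ensure
  begin
    yield :delete_old_backups
  rescue StandardError => e
    errors << e.message
  ensure
    sidekiq.unpause!
  end
end
```

The block passed to either method stands in for the real work of each step; a block that raises on `:delete_old_backups` leaves `FakeSidekiq` paused in the fragile version and unpaused in the robust one.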
I agree.
If a backup fails (even partially, to delete old files), I do not expect every other background process to be silently paused. I do expect admins to be somehow notified.
I just went to sidekiq because some support topics mentioned it, but I didn’t know my backups weren’t working.
Thanks! That’s weird, because I did not see that message and looked at the Dashboard dozens of times during the paused phase. Perhaps it was just me …
You remembered correctly. There’s no warning in the dashboard when Sidekiq is paused, since jobs are scheduled but not enqueued. The warning appears only when Sidekiq isn’t running at all.
I could add an additional warning, but I don’t think it’s needed. Sidekiq should now always be unpaused – no matter what.