Scheduled backup fails when Sidekiq is forcefully restarted


(Andrew Waugh) #1

The log indicates that the zip operation runs out of space, but the failed job leaves 3 files behind (two of them .gz), created roughly 30 minutes apart.
The .gz files do not appear in the GUI.
It seems to happen when Sidekiq gets restarted for using too much memory.

We’ve oodles of RAM, and enough space on the drive if it weren’t for having 3 (failed) backups for that day.

The drive has 78G of space, and the backups are about 16G each.

Here are the corpses left over after the job fails:

-rw-r--r-- 1 ubuntu www-data 17094434327 Mar 19 08:48 jag-lovers-forums-2018-03-19-083309-v20180309014014.tar.gz
-rw-r--r-- 1 ubuntu www-data 17099593947 Mar 19 09:19 jag-lovers-forums-2018-03-19-090524-v20180309014014.tar.gz
-rw-r--r-- 1 ubuntu www-data 17184337920 Mar 19 09:52 jag-lovers-forums-2018-03-19-093558-v20180309014014.tar
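
As a rough check on how much space these leftovers occupy, the sizes in the listing above can be summed with a short Python sketch. The function name `leftover_bytes` is made up for illustration, and the listing is pasted in as a string rather than read from the server.

```python
# Listing copied from the post above; in practice this would come from `ls -l`.
LISTING = """\
-rw-r--r-- 1 ubuntu www-data 17094434327 Mar 19 08:48 jag-lovers-forums-2018-03-19-083309-v20180309014014.tar.gz
-rw-r--r-- 1 ubuntu www-data 17099593947 Mar 19 09:19 jag-lovers-forums-2018-03-19-090524-v20180309014014.tar.gz
-rw-r--r-- 1 ubuntu www-data 17184337920 Mar 19 09:52 jag-lovers-forums-2018-03-19-093558-v20180309014014.tar
"""


def leftover_bytes(listing: str) -> int:
    """Sum the size column (5th field) of `ls -l` style lines."""
    total = 0
    for line in listing.splitlines():
        fields = line.split()
        # A long-format line has at least 9 fields; the 5th is the size in bytes.
        if len(fields) >= 9 and fields[4].isdigit():
            total += int(fields[4])
    return total


total = leftover_bytes(LISTING)
print(f"{total} bytes, about {total / 2**30:.1f} GiB")
```

Summed, the three files come to 51378366194 bytes, a large share of the 78G drive, which is consistent with the next backup attempt running out of space.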

We started seeing the Sidekiq restarts in December.

Anyone else seeing this?


(Jeff Atwood) #2

Yes, I have definitely seen this happen if sidekiq gets forcefully restarted (due to excess memory use) in the middle of a backup. cc @sam

Typically this is only an issue during a global rebake. The post version was incremented in a commit about 1 month ago, which means every single post in the system must be rebaked, and that can take months. The global rebake process causes Sidekiq to run out of memory much more frequently over that time period.


(Andrew Waugh) #3

Our Sidekiq restarts between 3 and 12 times a day. This started happening back in December.

We’ve oodles of RAM. Is there a way we can raise the memory limit for Sidekiq (it’s failing when it goes much above 500M)?


(Sam Saffron) #4

Yes absolutely, we do that on our heavy instances. Set:

env:
  UNICORN_SIDEKIQ_MAX_RSS: 1000

This will double the limit, from 500 MB to 1000 MB.
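
To illustrate why raising the limit stops the restarts, here is a minimal Python sketch of an RSS threshold check. It is not Discourse's actual watchdog code: the function name `should_restart` is hypothetical, and the 500 MB default is an assumption taken from the numbers mentioned in this thread.

```python
# Assumed default, based on the "much above 500M" observation in this thread.
DEFAULT_MAX_RSS_MB = 500


def should_restart(rss_bytes: int, max_rss_mb: int = DEFAULT_MAX_RSS_MB) -> bool:
    """Return True when the worker's resident memory exceeds the limit."""
    return rss_bytes > max_rss_mb * 1024 * 1024


# A worker sitting at 700 MB of resident memory (illustrative figure).
rss = 700 * 1024 * 1024

print(should_restart(rss))        # over the assumed 500 MB default
print(should_restart(rss, 1000))  # under the raised 1000 MB limit
```

With the limit at 1000, a worker that peaks somewhere between 500 MB and 1000 MB during a backup is left alone instead of being restarted partway through.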


(Andrew Waugh) #5

To follow up on this:

It’s been a week since we increased the sidekiq headroom. Sidekiq restarts have gone from 3-12 restarts/day to zero, and the backup hasn’t failed once.


(Joshua Rosenfeld) #6

This topic was automatically closed after 4 days. New replies are no longer allowed.