Sidekiq daylight saving time issues?

michaeld · November 1, 2015, 5:39pm

Today we found a number of (independent) servers running into problems at the same time.
Symptoms: read-only mode was enabled and the disk was full.

This turned out to be caused by the backup scheduler creating a LOT of backups. The disk filled up so quickly that our monitoring didn’t even catch it in time.

Excerpt from sidekiq.log below.

This didn’t happen on all servers, but on about 8% of them. The issues started independently just after 2:00 AM EST, i.e. just when daylight saving time ended. I can’t explain why this would cause it, but I do think it’s related.

This happened on both 1.4.2 and 1.5.0beta3 servers.

[STARTED]
'system' has started the backup!
Marking backup as running...
Making sure '/var/www/discourse/tmp/backups/db2155/2015-11-01-020020' exists...
Making sure '/var/www/discourse/public/backups/db2155' exists...
Writing metadata to '/var/www/discourse/tmp/backups/db2155/2015-11-01-020020/meta.json'...
...
(lots of backup logs)
...
Unpausing sidekiq...
Marking backup as finished...
Finished!
[SUCCESS]
[STARTED]
'system' has started the backup!
Marking backup as running...
Making sure '/var/www/discourse/tmp/backups/db2155/2015-11-01-020143' exists...
Making sure '/var/www/discourse/public/backups/db2155' exists...
Writing metadata to '/var/www/discourse/tmp/backups/db2155/2015-11-01-020143/meta.json'...
[STARTED]
'system' has started the backup!
Marking backup as running...
Making sure '/var/www/discourse/tmp/backups/db1714/2015-11-01-020143' exists...Enabling readonly mode...

Making sure '/var/www/discourse/public/backups/db1714' exists...
Writing metadata to '/var/www/discourse/tmp/backups/db1714/2015-11-01-020143/meta.json'...
Pausing sidekiq...
Waiting for sidekiq to finish running jobs...
Dumping the public schema of the database...
[STARTED]
'system' has started the backup!
Marking backup as running...
Making sure '/var/www/discourse/tmp/backups/db1714/2015-11-01-020143' exists...
Making sure '/var/www/discourse/public/backups/db1714' exists...
Writing metadata to '/var/www/discourse/tmp/backups/db1714/2015-11-01-020143/meta.json'..

Mittineague · November 1, 2015, 6:04pm

Interesting problem.

If daylight saving time were a “standard”, it would be easy enough to not run back-ups during the affected time.

But it is anything but standardized.

Maybe some way of rate limiting back-ups based on when the last successful one was run?
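
The rate-limiting idea above could look roughly like the sketch below. It is only an illustration, not Discourse's actual backup code: the function name, the 12-hour minimum interval, and the sample timestamps are all hypothetical. The one deliberate choice is that both timestamps are timezone-aware UTC values, so a wall-clock shift cannot change the measured gap.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical minimum gap between two automatic backups.
MIN_INTERVAL = timedelta(hours=12)


def should_start_backup(last_success_utc: Optional[datetime],
                        now_utc: datetime,
                        min_interval: timedelta = MIN_INTERVAL) -> bool:
    """Return True only if enough real time has passed since the last success."""
    if last_success_utc is None:
        return True  # no backup has ever completed
    # Both values are timezone-aware UTC, so the subtraction measures
    # elapsed real time regardless of any local clock change.
    return now_utc - last_success_utc >= min_interval


# Hypothetical timestamps for illustration only.
last = datetime(2015, 11, 1, 7, 0, tzinfo=timezone.utc)
print(should_start_backup(last, last + timedelta(minutes=1)))  # False
print(should_start_backup(last, last + timedelta(hours=24)))   # True
```

A check like this only stops a backup from being re-triggered after a successful one. It would not by itself prevent two backups from starting at the same moment, which is what the log excerpt appears to show.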

michaeld · November 1, 2015, 6:25pm

We have learned to never schedule cronjobs between 2:00 AM and 3:00 AM because of this.
But then again, I can’t figure out how this causes Sidekiq to behave like this.

codinghorror · November 1, 2015, 7:03pm

Hmm, is there anything we can do to protect against a one-hour time change causing this, @zogstrip?

zogstrip · November 2, 2015, 4:47pm

Move the automatic backup to 4AM?

michaeld · November 2, 2015, 5:49pm

That would be a good workaround, but doesn’t solve the root issue. Apparently something goes nuts - 2:00AM happening twice doesn’t explain why multiple backups started to run simultaneously. So there must be an additional bug in Sidekiq that got triggered somehow.

Mittineague · November 2, 2015, 6:02pm

Technically it doesn’t happen twice, IF the timezone is used.
E.g. 2 AM EST is not the same as 2 AM EDT.
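
A small sketch of this point, using fixed UTC offsets (EDT as UTC-4, EST as UTC-5) rather than a full time zone database. The variable names are made up for illustration. The same wall-clock reading maps to two different instants once the zone is attached:

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the two zone labels.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

two_am_edt = datetime(2015, 11, 1, 2, 0, tzinfo=EDT)
two_am_est = datetime(2015, 11, 1, 2, 0, tzinfo=EST)

print(two_am_edt.astimezone(timezone.utc))  # 2015-11-01 06:00:00+00:00
print(two_am_est.astimezone(timezone.utc))  # 2015-11-01 07:00:00+00:00
print(two_am_est - two_am_edt)              # 1:00:00

# Without the zone, the two readings are indistinguishable.
print(two_am_edt.replace(tzinfo=None) == two_am_est.replace(tzinfo=None))  # True
```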

michaeld · November 3, 2015, 7:20am

That is why I didn’t use the timezone in my sentence.

eswald · November 9, 2015, 7:14pm

I’ve seen similar problems result from collecting the time from one source, but the time zone from another. This looks like the backup could be running at 2 AM EST, but getting saved as 2 AM EDT, then running again because it doesn’t think the 2 AM EST scheduled backup has run.
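
The scenario described above can be sketched as follows. This is a hypothetical model, not the actual Discourse or Sidekiq scheduling code: `backup_is_due`, the 20-second delay, and the fixed-offset zones are assumptions made for illustration. It shows how a run recorded with the wrong zone label appears to predate its own schedule, so the job looks due again.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the two zone labels.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# Backup is due at 2 AM EST.
scheduled = datetime(2015, 11, 1, 2, 0, tzinfo=EST)

# The job really runs shortly after the scheduled time...
actual_run = scheduled + timedelta(seconds=20)
# ...but only the wall-clock reading is kept, and it is later tagged as EDT.
wall_clock = actual_run.replace(tzinfo=None)
recorded_run = wall_clock.replace(tzinfo=EDT)


def backup_is_due(last_run: datetime, scheduled_at: datetime) -> bool:
    """Hypothetical check: the job is due if the last run predates the schedule."""
    return last_run < scheduled_at


print(backup_is_due(actual_run, scheduled))    # False: it already ran
print(backup_is_due(recorded_run, scheduled))  # True: mislabeled run looks an hour early
```

In this model the check keeps returning True for as long as new runs are saved with the wrong label, which would be consistent with backups being started over and over.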

codinghorror · November 10, 2015, 3:10am

Can we add more protection here, @zogstrip? This sounds quite painful.

zogstrip · November 12, 2015, 5:35pm

This should fix it:

https://github.com/discourse/discourse/commit/3c2486e2baad698b6e7c2bebc6ed197111fed011

jomaxro · February 4, 2016, 8:29pm

Was this resolved? Daylight Saving Time starts again in just over a month…

michaeld · February 4, 2016, 9:07pm

It seems fixed, see the post by ZogStriP above yours. I think it only happened when the clock went back, so it shouldn’t occur in the spring anyway.
