Sidekiq daylight saving time issues?

Today we found a number of (independent) servers running into problems at the same time.
Symptoms: read only mode was enabled and the disk was full.

This turned out to be caused by the backup scheduler creating a LOT of backups. The disk filled up so quickly that our monitoring didn’t even catch it in time.

Excerpt from sidekiq.log below.

This didn’t happen on all servers, but on about 8% of them. The issues started independently just after 2:00 AM EST, i.e. just when daylight savings time ended. I can’t explain why this is causing it, but I do think it’s related.

Happened both on 1.4.2 and on 1.5.0beta3 servers.

[STARTED]
'system' has started the backup!
Marking backup as running...
Making sure '/var/www/discourse/tmp/backups/db2155/2015-11-01-020020' exists...
Making sure '/var/www/discourse/public/backups/db2155' exists...
Writing metadata to '/var/www/discourse/tmp/backups/db2155/2015-11-01-020020/meta.json'...
...
(lots of backup logs)
...
Unpausing sidekiq...
Marking backup as finished...
Finished!
[SUCCESS]
[STARTED]
'system' has started the backup!
Marking backup as running...
Making sure '/var/www/discourse/tmp/backups/db2155/2015-11-01-020143' exists...
Making sure '/var/www/discourse/public/backups/db2155' exists...
Writing metadata to '/var/www/discourse/tmp/backups/db2155/2015-11-01-020143/meta.json'...
[STARTED]
'system' has started the backup!
Marking backup as running...
Making sure '/var/www/discourse/tmp/backups/db1714/2015-11-01-020143' exists...Enabling readonly mode...

Making sure '/var/www/discourse/public/backups/db1714' exists...
Writing metadata to '/var/www/discourse/tmp/backups/db1714/2015-11-01-020143/meta.json'...
Pausing sidekiq...
Waiting for sidekiq to finish running jobs...
Dumping the public schema of the database...
[STARTED]
'system' has started the backup!
Marking backup as running...
Making sure '/var/www/discourse/tmp/backups/db1714/2015-11-01-020143' exists...
Making sure '/var/www/discourse/public/backups/db1714' exists...
Writing metadata to '/var/www/discourse/tmp/backups/db1714/2015-11-01-020143/meta.json'..
2 Likes

Interesting problem.

If daylight savings was a “standard” it would be easy enough to not run back-ups during the affected time.

But it is anything but standardized

Maybe some way of rate limiting back-ups based on when the last successful one was run?

We have learned to never schedule cronjobs between 2:00 AM and 3:00 AM because of this.
But then again, I can’t figure out how this causes Sidekiq to behave like this.

Hmm, anything we can do to protect against time change by an hour causing this @zogstrip?

Move the automatic backup to 4AM?

1 Like

That would be a good workaround, but doesn’t solve the root issue. Apparently something goes nuts - 2:00AM happening twice doesn’t explain why multiple backups started to run simultaneously. So there must be an additional bug in Sidekiq that got triggered somehow.

Technically it doesn’t happen twice - IF - the timezone is used.
eg. 2 AM EST is not the same as 2 AM EDT

That is why I didn’t use the timezone in my sentence.

I’ve seen similar problems result from collecting the time from one source, but the time zone from another. This looks like the backup could be running at 2 AM EST, but getting saved as 2 AM EDT, then running again because it doesn’t think the 2 AM EST scheduled backup has run.

1 Like

Can we add more protection here @zogstrip because this sounds quite painful.

1 Like

This should fix it :clock3:

https://github.com/discourse/discourse/commit/3c2486e2baad698b6e7c2bebc6ed197111fed011

3 Likes

Was this resolved? Daylight Saving Time starts again in just over a month…

It seems fixed, see the post by ZogStriP above yours. I think it only happened when the clock went back, so it shouldn’t occur in the spring anyway.

3 Likes