which still happens for us on stable, shortly after updating to version 1.6.6.
We have had no automatic backups since November 5th, 2016 and are unable to initiate a manual backup:
[2016-11-08 20:09:23] 'Guiwy' has started the backup!
[2016-11-08 20:09:23] Marking backup as running...
[2016-11-08 20:09:23] Making sure '/var/www/discourse/tmp/backups/default/2016-11-08-200923' exists...
[2016-11-08 20:09:23] Making sure '/var/www/discourse/public/backups/default' exists...
[2016-11-08 20:09:23] Backup process was cancelled!
[2016-11-08 20:09:23] Notifying 'Guiwy' of the end of the backup...
Any ideas?
Additionally, and possibly related, we have seen a constant increase in CPU load since this update:
I just got a high CPU alert from my server and seem to be seeing the same issue after updating to 1.6.6 a couple of days ago. No nightly backups since then, and the same "Sidekiq heartbeat test failed, restarting" message in the error log.
I'm rebuilding app now...
After rebuild, still high CPU. Went into Sidekiq and, as has happened a couple of times in the past, out of nowhere tens of thousands of user emails had queued up and the app was using all its resources to generate and send them. No idea why. I deleted all the queued emails and CPU usage seems to have settled down now.
I tried to kick off a manual backup and nothing seems to be happening - I'm just getting a spinning indicator and the phrase "No logs yet..." Nothing happening in Sidekiq.
Can't remember exactly, but recently enough that I was able to type sudo ./laun and up-arrow a few times to get to sudo ./launcher rebuild app (iow, it was still in my shell history).
Little more data. When I go into the backup page, it thinks that a backup is in progress - there is an active cancel button. I push that button and it looks like it is canceling (asks me to confirm, cancel button goes away, backup button appears in its place), but then if I reload the backup page, the cancel button is again active. There is no backup actually happening, but some flag somewhere thinks there is and doesn't seem clearable.
I could do that. I'd want to know what it's supposed to do first. Does Discourse use redis to cache objects, and is the theory that there is something bad in that cache?
It will erase everything stored in redis.
You will potentially lose some pending email notifications but that's about it.
We use redis to store the "read-only" and "backing up" state. Due to the bug you were experiencing, it looks like it's still thinking you're doing a backup, when you're not.
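To illustrate why a flag like that can survive a rebuild and why erasing redis clears it, here is a minimal sketch in Python. It is only a toy model, not Discourse's actual code: `FakeStore` is an in-memory stand-in for redis, and the key name `backup_is_running` and the helper functions are hypothetical names made up for this example.

```python
class FakeStore:
    """In-memory stand-in for a redis-like key-value store (hypothetical)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def exists(self, key):
        return key in self._data

    def flushall(self):
        # Erases everything in the store, including any stale flags.
        self._data.clear()


# Hypothetical key name, chosen for illustration only.
RUNNING_KEY = "backup_is_running"


def start_backup(store):
    # The backup marks itself as running before doing any work.
    store.set(RUNNING_KEY, "1")


def backup_page_shows_cancel(store):
    # The admin page only looks at the flag, not at whether a backup
    # process really exists.
    return store.exists(RUNNING_KEY)


store = FakeStore()
start_backup(store)
# Assume the backup dies here, before it gets a chance to clear the flag.
stuck = backup_page_shows_cancel(store)

store.flushall()
cleared = not backup_page_shows_cancel(store)
```

In this model the page keeps offering a cancel button for as long as the key exists, whether or not anything is actually running, and wiping the store is what finally removes it.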
Yesterday we rebuilt and it no longer seemed that a backup process was running. However, this morning the Cancel button was active again, with no new nightly backup.
The same error also appears regularly in the logs.
We are starting to live on the edge with no backups for this many days and no clear workaround for now...
But it is clearly not, and I would even say that it is doing way too much work: several hundred jobs per minute. There is also a backlog of 500+ planned jobs which seems to be slowly growing.
Post the rebuild and the redis purge, I kicked off a manual backup which successfully created a new backup, but then got stuck on "notifying sysadmin of finishing the backup" (or whatever it says exactly).
After which, again, cancel button -> looks like it works -> refresh page -> cancel button active again...
Went into Sidekiq, looked at scheduler tab which said "Sidekiq Paused". I unpaused it.
Still thinks it is doing a backup. I'll now purge the redis cache again.