Multiple backups generated daily


(Ilias) #1

Hello,

For the last couple of weeks, I have often been getting a “502 Bad Gateway” error when accessing my Discourse site, and I strongly believe this is due to the lack of free space on my server:

# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            2.0G  4.0K  2.0G   1% /dev
tmpfs           396M  400K  395M   1% /run
/dev/vda1        79G   75G  884K 100% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            2.0G  2.0M  2.0G   1% /run/shm
none            100M     0  100M   0% /run/user
none             79G   75G  884K 100% /var/lib/docker/aufs/mnt/7a7ed0d13f4233d33befdeb0b7e84dcb066e7a6c6f4faa64db80740dba514ae4
shm             512M  8.0K  512M   1% /var/lib/docker/containers/7a853459f1a1317a6410cc7a93245c0baec69da556de7079cdc2d2b7fb21af68/shm

It seems that sometimes multiple backups are generated daily:

On June 9th, there are 8 backup files!
What could be the reason for that?

Not long ago I deleted almost all of the backup files, but the same issue keeps coming back.

Note that I have enabled backing up to S3 as well, in case that has something to do with the problem.

Thanks,
Ilias


#2

What is your backup frequency setting?


(Ilias) #3

Backup frequency is set to “1”.



(Andrew Waugh) #4

What do your backup logs tell you?

We have had an odd situation since about the time we upgraded to 2.0.1.

  • Every now and then (about every 3rd or 4th day) the backup fails; the log indicates that there wasn’t enough space while it was gzipping the file at the very end of the process.
  • Sometimes I get a PM that the backup has failed, but when I look at /admin/backups there is a backup there for that day.
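
A failure during the final gzip step is what you would expect if the disk has to hold the uncompressed archive and its compressed copy at the same time. A minimal pre-flight check for that could look like the sketch below. The function name, the `expected_archive_bytes` parameter and the 2x safety factor are assumptions made for illustration; this is not how Discourse itself decides whether to start a backup.

```python
import shutil


def has_room_for_backup(path, expected_archive_bytes, safety_factor=2.0):
    """Return True if the filesystem holding `path` looks large enough.

    safety_factor=2.0 is an assumption: it leaves room for an uncompressed
    archive and its gzipped copy to exist side by side for a short time.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= expected_archive_bytes * safety_factor
```

Calling it with the size of the most recent archive before each run would turn a late "No space left on device" into an early, explicit refusal.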

We had problems with backups in the past; perhaps a problem has reappeared in the latest changes:


(Jeff Atwood) #5

If Sidekiq is restarting a lot, you will definitely have a bad time. It is almost worth adding some failsafe protection here, @tgxworld, to verify that inappropriate / multiple backups do not happen when Sidekiq is terminated unexpectedly.


(Andrew Waugh) #7

Sidekiq isn’t restarting that often since we added
UNICORN_SIDEKIQ_MAX_RSS: 1000

but the “failed, but sometimes not really failed” backups do happen after a sidekiq restart.

From memory we’ve had (since 2.0.1):

3x Backup actually failed, PM sent to mods
2x Backup actually completed, but “Backup Failed” PM sent to mods.

Perhaps when Sidekiq restarts, if there is a running backup, or a corpse thereof, it should just clean up the corpses (killing the vestigial job) and restart the backup. Alternatively, when a backup job starts, it should check whether there has been a Sidekiq restart since completion of the last backup and just let the job finish.
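
The "clean up the corpse, then allow one backup" idea can be sketched with a simple lock file. This is only an illustration of the guard being proposed, not Discourse's actual mechanism: the `BackupLock` class, the lock path and the six-hour staleness threshold are all hypothetical.

```python
import os
import time


class BackupLock:
    """Hypothetical lock-file guard against overlapping backup jobs."""

    def __init__(self, path, stale_after_seconds=6 * 3600):
        self.path = path
        # Assumption: no healthy backup runs longer than this.
        self.stale_after_seconds = stale_after_seconds

    def _is_stale(self):
        try:
            age = time.time() - os.path.getmtime(self.path)
        except FileNotFoundError:
            return False
        return age > self.stale_after_seconds

    def acquire(self):
        """Return True if the caller may start a backup now."""
        if self._is_stale():
            # The previous job most likely died with its worker: remove the corpse.
            self.release()
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False  # another backup still holds the lock
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        return True

    def release(self):
        """Remove the lock file; a missing file is not an error."""
        try:
            os.remove(self.path)
        except FileNotFoundError:
            pass
```

A job would call `acquire()` before starting and `release()` in a `finally` block. The stale check and the removal are not atomic, so a real implementation would need something stronger than this sketch.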


(Ilias) #8

To be honest, I can’t really tell if high utilization of server resources is causing backups to fail or the other way around.

I have been facing slow response times for some months now, even though I upgraded my DigitalOcean droplet (it now has 4GB of RAM and 80GB of disk space). Sometimes it takes about half a minute for a page to load. I run ./launcher cleanup every now and then, but it doesn’t seem to make any difference. I guess there are no old containers left in there, just something else eating up all the resources…

Regarding the backup logs, the last time I got a PM about a failed backup was a month ago, and it had to do with disk space running out:
[2018-05-11 04:09:41] pg_dump: [archiver] could not write to output file: No space left on device

However, it is very common for a backup to be generated more than once, as stated in my original post. Today, for example, there are also 2 backups for June 11th:

-rw-r--r-- 1 1000 www-data 2942006365 Jun 11 00:11 rocking-gr-forum-2018-06-11-034823-v20180328180317.tar.gz
-rw-r--r-- 1 1000 www-data 2942005845 Jun 11 00:51 rocking-gr-forum-2018-06-11-042313-v20180328180317.tar.gz
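
To look for a pattern in which days get more than one backup, the date embedded in each filename can be used to group the archives. The sketch below assumes the naming scheme seen in the two files above (site name, date, time, schema version); the regular expression and both function names are illustrative only.

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from the listing above:
# <site-name>-YYYY-MM-DD-HHMMSS-v<schema version>.tar.gz
BACKUP_NAME = re.compile(
    r"^(?P<site>.+)-(?P<date>\d{4}-\d{2}-\d{2})-(?P<time>\d{6})-v\d+\.tar\.gz$"
)


def backups_per_day(filenames):
    """Group backup filenames by the date embedded in their names."""
    by_day = defaultdict(list)
    for name in filenames:
        match = BACKUP_NAME.match(name)
        if match:  # anything that is not a backup archive is skipped
            by_day[match.group("date")].append(name)
    return dict(by_day)


def days_with_duplicates(filenames):
    """Return only the days that have more than one backup."""
    return {
        day: sorted(names)
        for day, names in backups_per_day(filenames).items()
        if len(names) > 1
    }
```

Passing it `os.listdir()` of the backup directory lists every affected day. Fed the two filenames above, it reports 2018-06-11 with both archives.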

(Alan Tan) #9

Hmm, are you seeing two backups daily, or is it just on certain days? I’m trying to see if there is any pattern here.


(Alan Tan) #10

Looks like the command to clean up unwanted tar archives was broken


(Ilias) #11

This happens almost every other day.
I ended up removing the duplicate backups manually until I/we figure out how to fix the issue. I am afraid that as soon as I stop deleting the backups, there will be no free space left on my server, and Discourse will no longer be accessible.


(Alan Tan) #12

Does that mean once every 2 days?


(Ilias) #13

Yes, approximately.
Sometimes it may take 3 days for the double backups to happen.