Disk full due to too many old daily backup files with S3 partially configured


(Allen - Watchman Monitoring) #1

PSA - if the boot volume gets too full, not even cleanup will help:

$ ./launcher cleanup
WARNING: You must have at least 5GB of *free* disk space to run Discourse.

Insufficient disk space may result in problems running your site, and may
not even allow Discourse installation to complete successfully.

Please free up some space, or expand your disk, before continuing.

and

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda         30G   28G     0 100% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            992M   12K  992M   1% /dev
tmpfs           201M  364K  200M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none           1002M  1.4M 1000M   1% /run/shm
none            100M     0  100M   0% /run/user

(Allen - Watchman Monitoring) #2

what would happen if the site wasn’t running?

IOW - could this command be super destructive?


(Sam Saffron) #3

No, it cannot be super destructive. Worst case, you would have to re-download the base image; not the end of the world, just a delay.


(Allen - Watchman Monitoring) #4

At this point, I get this when running the following as root in /var/discourse:

# docker rm `docker ps -a | grep Exited | awk '{print $1 }'`
docker: "rm" requires a minimum of 1 argument.
See 'docker rm --help'.

Usage: docker rm [OPTIONS] CONTAINER [CONTAINER...]

Remove one or more containers
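
The error above appears because the backtick substitution produced no container IDs, so `docker rm` was called with no arguments. A minimal sketch of a guarded version follows; the function name `remove_exited_containers` is made up for illustration, and it assumes a Docker CLI that supports `docker ps -q --filter status=exited`.

```shell
# Remove exited containers, but only when there are any to remove.
# remove_exited_containers is a hypothetical helper name, not part of launcher.
remove_exited_containers() {
    # -q prints only container IDs; --filter avoids grepping the table output.
    ids=$(docker ps -a -q --filter status=exited)
    if [ -z "$ids" ]; then
        echo "No exited containers to remove."
        return 0
    fi
    # Left unquoted on purpose so each ID becomes its own argument.
    docker rm $ids
}
```

Calling `remove_exited_containers` when nothing has exited then prints a short message instead of the usage error.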

(Allen - Watchman Monitoring) #5

I just solved this issue by deleting a number of stale backups from:

/var/discourse/shared/standalone/backups/default
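
For anyone doing the same by hand, here is a rough sketch of keeping only the newest few backups in a directory. The helper name `prune_backups` is hypothetical, and it assumes the backups are `*.tar.gz` files whose names contain no newlines; check what is actually in the directory before deleting anything.

```shell
# prune_backups DIR KEEP
# Delete all but the KEEP most recently modified *.tar.gz files in DIR.
# Hypothetical helper; assumes file names contain no newline characters.
prune_backups() {
    dir=$1
    keep=$2
    # ls -t lists newest first; tail skips the first KEEP entries,
    # so only the older files reach the rm loop.
    ls -1t "$dir"/*.tar.gz 2>/dev/null |
        tail -n +"$((keep + 1))" |
        while IFS= read -r f; do
            rm -f -- "$f"
        done
}
```

For example, `prune_backups /var/discourse/shared/standalone/backups/default 7` would leave the seven newest archives in place.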


(Sam Saffron) #6

cleanup will not touch that stuff btw… you can simply delete backups from the admin section if needed.


(Allen - Watchman Monitoring) #7

Agreed. In this case, a rebuild attempt failed, and I was no longer able to access anything at /admin.

IMO the solution is to set up offsite backups to S3. That was on the owner’s to-do list, but I hadn’t given it any urgency. Besides, I could always install some kind of early warning detection system on the computer, to give me a heads-up about this. Why I didn’t do that already is really just mind-boggling to me… cobbler’s shoes, I suppose.


(Sam Saffron) #8

How big is a backup? How many files were there?

Out of the box we are quite strict and only allow 7 files or so there. Perhaps we can add more safeguards to ensure that we do not do “that last backup” that uses up the “last bit” of free space.

Calculating this kind of stuff is real hard though.
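
As a very rough illustration of such a safeguard (not how Discourse does it), a shell check could compare the free space on the backup volume with an estimate of the next backup's size. The helper name `has_room` and the idea of passing in an estimate are assumptions; getting a good estimate is exactly the hard part.

```shell
# has_room DIR NEEDED_KB
# Succeed only if the filesystem holding DIR has at least NEEDED_KB
# kilobytes available. Hypothetical helper for illustration only.
has_room() {
    # df -P guarantees one line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$2" ]
}
```

A backup script could then run something like `has_room "$backup_dir" "$estimate_kb" || exit 1` before writing anything, where both variables are placeholders.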


(Allen - Watchman Monitoring) #9

Backups started back on April 2nd, and are currently about 80 MB each.

I just deleted 129 and there are 8 more where those came from.


(Sam Saffron) #10

Wow that is weird, pretty sure that out of the box we no longer allow this kind of situation, @zogstrip can you confirm?


(Allen - Watchman Monitoring) #11

Here are the related settings… note that the owner hadn’t actually added the S3 key yet; perhaps that skipped a logic check?


(Sam Saffron) #12

Possibly, maybe the job was failing just at that point meaning that the max backup check never hits.


(Jeff Atwood) #13

Hey @mpalmer this is technically a bug, every launcher command assumes you are going to install or rebuild Discourse… even forcing download of the image and so forth. This should be fixed.

In this case the install / rebuild space check is preventing the disk cleanup from running.


(Allen - Watchman Monitoring) #14

A look at the logs on this site doesn’t show any errors… let me know if there’s something in particular to check for (or if you want to look directly).


(Matt Palmer) #15

Yep, it looks like I’ll need to move the resources checks out of prereqs and put them into their own little world that only runs on bootstrap. On it now, should be sorted shortly.


(Jeff Atwood) #16

Probably safe to check on rebuild as well.

@zogstrip we should look at how so many backups were stored without removing the oldest, out of box this should not be possible as we limit the number of retained backups, can you check?


(Matt Palmer) #17

Handily, rebuild calls bootstrap internally, so I’ll shoot two birds with one exposure.


(Matt Palmer) #18

The change to only warn about memory / disk space has been pushed to the docker_discourse repository; a git pull should pick up the changes.


(Régis Hanol) #19

That was indeed the issue. It was generating an exception when trying to upload to S3, which is the step before remove_old_backups…

https://github.com/discourse/discourse/commit/a726f5efea29fe6edb4d620747edfde74cdb9de7
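
The commit itself is not reproduced here. As a generic sketch of the pattern only, the shell function below runs a cleanup step even when the preceding upload step fails, and still reports the upload failure. The name `run_backup_steps` and both command arguments are hypothetical.

```shell
# run_backup_steps UPLOAD_CMD CLEANUP_CMD
# Run the upload step, then always run the cleanup step, and return the
# upload step's exit status. All names here are hypothetical.
run_backup_steps() {
    upload_cmd=$1
    cleanup_cmd=$2
    status=0
    # Record a failure instead of aborting, so cleanup is never skipped.
    "$upload_cmd" || status=$?
    "$cleanup_cmd"
    return "$status"
}
```

With this shape, a failing upload no longer prevents old backups from being removed, which is the situation described above.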


(Jeff Atwood) #20