Better auto-backup options

We could use a few more options for automatic backups. Digital Ocean offers pretty limited disk space, whereas S3 is effectively unlimited. Currently the two are kept in sync, which means the backups stored on S3 are limited to what fits on the local disk. On my site that means two daily backups complete before local disk space runs out and the process stalls.

Here are some ideas:

  • Option to back up the database only (no files). This option is already available for manual backups. It saves a lot of space for file-heavy sites and allows more backups to be stored locally.

  • Better scheduler. With two backup types, you will want the ability to run them on different schedules (e.g. daily database backups plus a weekly full backup).

  • Option to manage S3 storage separately from Discourse. Again, with limited local disk space, I’d like the option to keep one daily backup on the local disk but an unlimited number of backups on S3.

  • If there’s not enough local disk space left, delete the oldest backup before starting a new one. This would prevent a full disk from stalling your backups, assuming successive backups are similar in size (see the sketch after this list).

  • Notification when an auto-backup fails to run. Right now it fails silently, and I have to remember to check on it.
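
For the “delete the oldest backup first” idea, something along these lines run from cron right before the scheduled backup would do it. This is only a rough sketch in Python; the backup directory path and the assumption that the next backup will be roughly the same size as the last one are mine, not actual Discourse behaviour:

```python
import os
import shutil

# Assumed location of local backups on a standard docker install.
BACKUP_DIR = "/var/discourse/shared/standalone/backups/default"

def free_oldest_if_needed(backup_dir: str = BACKUP_DIR) -> None:
    """Delete the oldest backup(s) until one more backup of roughly the
    same size as the newest one would fit on the disk."""
    backups = sorted(
        (os.path.join(backup_dir, f)
         for f in os.listdir(backup_dir)
         if f.endswith(".tar.gz")),
        key=os.path.getmtime,
    )
    if not backups:
        return
    expected = os.path.getsize(backups[-1])   # size estimate for the next backup
    free = shutil.disk_usage(backup_dir).free
    # Remove oldest backups until there is room for one more of the same size.
    while backups and free < expected:
        oldest = backups.pop(0)
        free += os.path.getsize(oldest)
        os.remove(oldest)

if __name__ == "__main__":
    free_oldest_if_needed()
```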


The database-only option is now available through the “backup_with_uploads” site setting :wink:

https://github.com/discourse/discourse/commit/de95573d2389099b754869f511bd18515c47e53e


Well, revisiting this. Glad to report this has worked great for a while, by adjusting the S3 settings and using Glacier.

This one is fixed too, apparently: Discourse will now notify you if backups fail. Yay! Unfortunately, the way I found that out was when my disk filled up again…


So, I have a question about the backup-with-files procedure. Currently I need at least double the backup’s size in available free space. For example, my current 8 GB install needs 16 GB of free space for a backup to succeed; otherwise gzip runs out of space and the backup fails.

I assume that the backup script copies all the files to be backed up to a temp folder, then begins to zip the copies? Could it not work directly from the original files, and avoid this double space requirement?
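
For what it’s worth, tar can stream straight from the original files through gzip without staging a copy first, so only the finished archive takes up extra space; whether the Discourse backup script could be restructured that way for the uploads, I don’t know. A rough Python illustration (the paths are made up):

```python
import tarfile

# Assumed locations; adjust for your install.
UPLOADS_DIR = "/var/discourse/shared/standalone/uploads"
ARCHIVE = "/var/discourse/shared/standalone/backups/uploads.tar.gz"

# tarfile reads each file from its original location and streams the
# compressed output directly into the archive -- no temp copy is made,
# so the only extra space needed is the size of the finished .tar.gz.
with tarfile.open(ARCHIVE, "w:gz") as tar:
    tar.add(UPLOADS_DIR, arcname="uploads")
```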


Related, and maybe this should be a bug report: when a backup does fail, the script does not delete the partial backup file, which can be quite large. Because of this, multiple failed attempts can fill the disk entirely and possibly crash the site.


We looked at this in exhaustive depth about 6 months ago. Avoiding the extra copy is only possible if you back up just the database. Once you have to back up the database and the uploads too, you have to merge the database dump, which is one giant streamed-to-compression file, with a bunch of uploaded .jpg, .gif, .png and .doc files into a single archive. At that point you need 2x disk space, and that 2x cost is indeed unavoidable.

So, if disk space is at a premium, select the “database only” backup style and back up the uploads some other way. Compressing GIFs and JPEGs doesn’t gain much anyway, since those formats are already compressed, so just straight copying them out is a viable strategy, aside from the “lots of tiny files” processing overhead.
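
For the uploads half of that, a plain sync of the uploads directory to its own bucket on a cron schedule works. A minimal sketch, assuming the AWS CLI is installed; the bucket name and paths below are placeholders:

```python
import subprocess

# Assumed upload location and a placeholder bucket/prefix.
UPLOADS_DIR = "/var/discourse/shared/standalone/uploads"
S3_TARGET = "s3://my-forum-backups/uploads"

# "aws s3 sync" only transfers new or changed files, which keeps the
# "lots of tiny files" overhead limited to what actually changed.
subprocess.run(["aws", "s3", "sync", UPLOADS_DIR, S3_TARGET], check=True)
```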

cc @pfaffman
