Hello. I’m looking for advice on how best to set up a reliable backup system for one of the Discourse forums I administer. The backup file that gets generated is around 3GB. (This forum was populated by importing 20 years’ worth of mailing list emails with images, hence the size.)
We use Communiteq (formerly DiscourseHosting) as a host, which takes a daily backup of the database and files for disaster recovery purposes and stores it off-site. They only guarantee to store one backup at a time; they do not archive several older backups.
This is great as a first line of defense, but we feel it would be wise to have an additional backup system completely under our control that also archives a few backups at a time.
The question is: given the size of the backup, what’s the best way to do this so that the large file transfers reliably and the forum doesn’t spend a long time in read-only mode?
I’ve been trying to get the built-in Amazon S3 backup to work, with no luck so far. I’ve also read about a Discourse Dropbox backup plugin but haven’t tried it yet.
Before I dive too deep into Amazon or Dropbox as a solution, I’m wondering what other ideas/experiences people have regarding huge backups like this? Thanks!
We keep 2 backups and use S3. Our backups include uploads and they are at 8GB now. The read-only (RO) time during backups has stayed roughly the same (actually, since our last CPU and RAM injection it’s gone down).
Keeping much more than 2-7 backups probably isn’t going to help you much. If you get a massive spam/troll attack and don’t notice it soon enough then going back more than a few days will mean you’ll end up throwing away good posts while trying to weed out the bad.
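For anyone archiving backups somewhere under their own control, the "keep only the last few" policy described above could be sketched roughly like this. It is only an illustration: the function name `backups_to_prune`, the `*.tar.gz` pattern, and the default of 3 are assumptions made for the example, not anything Discourse itself provides. It only lists candidates and deletes nothing.

```python
from pathlib import Path


def backups_to_prune(backup_dir, keep=3, pattern="*.tar.gz"):
    """Return the backup files that fall outside the `keep` most recent ones.

    Hypothetical helper: files are ranked by modification time, newest first.
    Nothing is deleted here; the caller decides what to do with the result.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    files = sorted(
        Path(backup_dir).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    # Everything after the first `keep` entries is older than the retention window.
    return files[keep:]
```

Calling `backups_to_prune("/path/to/archive", keep=3)` returns a list of `Path` objects that you could review before removing. Keeping deletion as a separate step is deliberate, so a bad pattern or wrong directory can't wipe the archive.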
Our backups were about 3.4 GB when we finished the import from our old platform, so the size of the backups is growing at a rate which one might consider alarming. We may end up shifting uploads to S3 and excluding them from backups at some point, but for the moment we’re just keeping an eye on the situation.
I’d consider it extremely unlikely. Taking a diff of a relational database is far from trivial, and the gains are minimal. If you’re running a large enough site to make app-level “delta backups” worthwhile, you’re probably going to be switching off Discourse’s built-in backups anyway, and using pg_basebackup plus WAL archiving to hit your RPO (recovery point objective) targets (which, as a bonus, gives you point-in-time restoration capability).
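To make the pg_basebackup half of that a bit more concrete, here is a minimal sketch that only assembles the command line for a compressed base backup. The helper name `basebackup_command`, the host, the user, and the target path are all placeholders invented for the example. WAL archiving is configured separately on the server, through `archive_mode` and `archive_command` in `postgresql.conf`, and all of this assumes you have direct access to the PostgreSQL server.

```python
import shlex


def basebackup_command(target_dir, host="localhost", user="postgres"):
    """Build (but do not run) a pg_basebackup invocation.

    Hypothetical helper; host, user and target_dir are placeholder values.
    """
    return [
        "pg_basebackup",
        "-h", host,        # server to connect to
        "-U", user,        # role with replication privileges
        "-D", target_dir,  # directory that receives the backup
        "-Ft",             # write tar archives instead of a plain data directory
        "-z",              # gzip-compress the tar output
        "-X", "stream",    # stream the WAL needed to make the backup consistent
        "-P",              # show progress
    ]


# Print the command so it can be reviewed before running it by hand or from cron.
print(shlex.join(basebackup_command("/var/backups/pg/base")))
```

The command is built as a list rather than a single string so it can be passed straight to `subprocess.run` without shell quoting problems, if you choose to automate it.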