Recommended backup process for very large forum?


(Leah Kramer) #1

Hello. I’m looking for advice on how best to set up a reliable backup system for one of the Discourse forums I administer. The backup file that gets generated is around 3 GB. (This forum was populated by importing 20 years’ worth of mailing list emails with images, hence the size.)

We use DiscourseHosting.com as a host, which takes a daily backup of the database and files for disaster recovery purposes and stores it off-site. They only guarantee to keep one backup at a time; they do not archive several older backups.

This is great as a first line of defense but we feel it would be wise to have an additional backup system completely under our control which also involves archiving a few backups at a time.

The question is: given the size of the backup, what’s the best way to do this so that the large backup file transfers reliably and the forum doesn’t spend a long time in read-only mode?

I’ve been trying to get the built-in Amazon S3 backup to work, with no luck so far. I’ve also read about a Discourse Dropbox backup plugin but haven’t tried it yet.

Before I dive too deep into Amazon or Dropbox as a solution, I’m wondering what other ideas/experiences people have regarding huge backups like this? Thanks!


(Andrew Waugh) #2

We keep 2 backups and use S3. Our backups include uploads and they are at 8 GB now. The read-only (RO) time during the backups has stayed roughly the same (actually, since our last CPU and RAM injection it’s gone down).

Keeping much more than 2-7 backups probably isn’t going to help you much. If you get a massive spam/troll attack and don’t notice it soon enough then going back more than a few days will mean you’ll end up throwing away good posts while trying to weed out the bad.
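If you end up copying backups to a machine you control, a retention rule like "keep the newest few" can be sketched roughly as below. This is only an illustration: the function name `prune_backups`, the `keep` default, and the `*.tar.gz` pattern are assumptions for the example, not anything Discourse ships.

```python
from pathlib import Path


def prune_backups(backup_dir, keep=5, pattern="*.tar.gz"):
    """Delete all but the `keep` newest files matching `pattern`.

    Returns the names of the files that were removed.
    `pattern` is an assumed naming scheme; adjust it to your backup files.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backups = sorted(
        Path(backup_dir).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    removed = []
    for old in backups[keep:]:  # everything past the newest `keep`
        old.unlink()
        removed.append(old.name)
    return removed
```

Calling `prune_backups("/path/to/backups", keep=2)` after each transfer would mirror the two-backup setup described above; the path is a placeholder.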

Our backups were about 3.4 GB when we finished the import from our old platform, so the size of the backups is growing at a rate which one might consider alarming. We may end up shifting uploads to S3 and excluding them from backups at some point, but for the moment we’re just keeping an eye on the situation.


(Leah Kramer) #3

This is great to hear given the size of your backups. I think I will forge on with S3 and try to get it working. Thanks for weighing in!


(Sander Datema) #4

Would it be possible for Discourse to create delta backups in the near future? That would save a lot of space for larger forums.


(Matt Palmer) #5

I’d consider it extremely unlikely. Taking a diff of a relational database is far from trivial, and the gains are minimal. If you’re running a large enough site to make app-level “delta backups” worthwhile, you’re probably going to be switching off Discourse’s built-in backups anyway, and using pg_basebackup plus WAL archiving to hit your RPO (recovery point objective) targets (which, as a bonus, gives you point-in-time restoration capability).
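For anyone unfamiliar with that approach, here is a rough sketch of what the base-backup half might look like when driven from a script. The helper name `basebackup_command`, the host, the user, and the target path are all assumptions for illustration. The WAL archiving half is configured separately in `postgresql.conf` (`archive_mode` and `archive_command`) and is not shown here.

```python
import shlex


def basebackup_command(target_dir, host="localhost", user="postgres"):
    """Build an argv list for a compressed, tar-format base backup.

    host and user are placeholder defaults; the role used needs
    replication privileges on the server.
    """
    return [
        "pg_basebackup",
        "-h", host,
        "-U", user,
        "-D", target_dir,  # output directory for the backup
        "-F", "t",         # write tar files instead of a plain directory
        "-z",              # gzip the tar output
        "-X", "stream",    # stream WAL generated while the backup runs
        "-P",              # report progress
    ]


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(basebackup_command("/backups/base")))
```

Building the command as a list rather than a string keeps it safe to hand to `subprocess.run` without shell quoting problems.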


(Eli the Bearded) #6

Full DB and delta of uploads should both be doable and much smaller for a site like this. That’s not something available now.

Even just full DB and no uploads might be useful for a case like this. And that’s a currently supported backup option.
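To make the "delta of uploads" idea concrete, here is a minimal sketch that copies only files that are new or changed since the last run. It is not a Discourse feature; the function name `sync_new_uploads` and both directory arguments are hypothetical, and in practice a tool like rsync does the same job with more care.

```python
import shutil
from pathlib import Path


def sync_new_uploads(src, dst):
    """Copy files from src to dst that are missing there or look changed.

    A file counts as unchanged when the copy has the same size and is
    not older than the source. Returns the relative paths copied.
    Files deleted from src are left in dst.
    """
    src, dst = Path(src), Path(dst)
    copied = []
    for f in sorted(src.rglob("*")):
        if not f.is_file():
            continue
        rel = f.relative_to(src)
        target = dst / rel
        if target.exists():
            s, t = f.stat(), target.stat()
            if s.st_size == t.st_size and int(s.st_mtime) <= int(t.st_mtime):
                continue  # unchanged since the last run
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)  # copy2 preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

Since uploads are mostly write-once images, each run after the first should only move the handful of files added since the previous one.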


(Jeff Atwood) #7

Just skipping uploads is probably sufficient. The images are usually much, much larger than the DB.