Are daily backups enough?

I’m a bit of a “control freak” when it comes to not losing data. Seeing only daily backups always makes me feel like something could happen to the server and suddenly a full day of data, which could be extremely important, is gone.

Without getting too technical, because I’m not an expert: wouldn’t a system be possible where things that are posted or added are replicated to another server? I believe this is how social media platforms work when we post content?

If this is not possible with Discourse, wouldn’t hourly backups be a bit safer? I don’t see an option for that; the setting only seems to go as low as 1 (daily) or 0 (disabled).

How do you guys handle this?

A good VPS on a good platform is highly unlikely to have any problems and especially not between upgrades.

In nearly 8 years of running one of my forums I’ve not had a single loss of data.

The daily backup is designed as a trade-off for most self-hosters.

It’s a simple-ish regime and doesn’t demand too much space or processing.

I can’t imagine it being worth doing more often for most people.

I have never needed to use a backup after an online crash; I’ve only used them for migrating to new servers, when necessary (because I’ve outgrown the smaller one!)

YMMV

However, if you do think you need a more frequent setup, be ready to customise it and be prepared to maintain that customisation (which would involve learning how to do it and/or hiring someone to help you).


Backups and replication are two different things.

Backups provide a snapshot of data at a given point. They provide a restore point.

Replication is distributing every action to a different system so you have it at more than one location. Deletes are also replicated.

If you really want to be fault tolerant, you need to have both. (And more…)

So replication only solves the problem of having current data in multiple places. Backups provide the method to restore a system to a given point in time.

Discourse uses 2 mechanisms for storage:

  1. PostgreSQL database for everything except attached files
  2. Attached files stored on the local file system or in S3

To back up and/or replicate the data stored in the PostgreSQL database, check the PostgreSQL documentation, which covers both backups and replication.
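As a hedged sketch of the replication side: a PostgreSQL standby only needs an empty `standby.signal` file and a `primary_conninfo` setting; the host name, user, and paths below are placeholders, not values from this thread (see the PostgreSQL high-availability docs for the full procedure).

```
# Standby-side sketch for PostgreSQL streaming replication (all names assumed).
# 1. Take a base backup of the primary; the -R flag writes the replication
#    settings for you:
#      pg_basebackup -h primary.example -U replicator -D /var/lib/postgresql/data -R
# 2. -R creates standby.signal and appends something like this to
#    postgresql.auto.conf:
primary_conninfo = 'host=primary.example port=5432 user=replicator'
```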

Attached files are a bit more tricky. If you store them on S3 you can use S3 backups; for locally stored files you can use a variety of local system options.

Creating full backups is a heavy task, depending on the amount of data, so you cannot easily do it more often. Discourse’s standard backup procedure is to create full backups. If you really want to reduce the risk of losing data you need to look at other options.

One option might be provided by your hosting service: volume snapshots. These provide a way to make an “instant” copy of the data stored in a volume, which lets you restore the volume to that point in time. Volume snapshots might also be available within the OS, depending on the file system used (btrfs supports this, for example).

Besides that, the PostgreSQL documentation also covers making more continuous backups of the database, allowing for excellent point-in-time recovery. (Don’t forget to ship the backups off-site.) This is much faster than full backups.
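For reference, continuous archiving is enabled with a couple of `postgresql.conf` settings. This is a minimal sketch; the archive destination is an assumption, and the PostgreSQL chapter on continuous archiving and point-in-time recovery covers the restore side.

```
# Minimal postgresql.conf excerpt for continuous WAL archiving (sketch).
wal_level = replica        # enough WAL detail for archiving and replicas
archive_mode = on
# Ship each completed WAL segment off the server; the destination below is a
# placeholder -- use your own backup host or object-storage tooling.
archive_command = 'rsync -a %p backup-host:/srv/pg-wal-archive/%f'
```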

For more granular attachment backups you can use various backup tools which manage full plus differential backups, for example duplicity. Or you could rsync (without `--delete`). Between snapshots you would still be able to lose files. Using S3 without deletion would be safer, as the files are already on another system.

To conclude: Discourse’s standard backup mechanism is not well suited to a more frequent backup schedule. If you want more backups, use a combination of the standard PostgreSQL backup/replication features, S3, volume snapshots, etc.

On my site I don’t use Discourse’s backup system for regular backups. I still make daily backups, but with a combination of pg_dump and duplicity configs (coordinated via backupninja).


I do a database backup every 4 hours. That is the timeframe of possibly vanished posts I can live with. For comparison: my e-commerce site does backups every 5 minutes.
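A 4-hourly database dump like that can be wired up with a single cron entry. This is a sketch with an assumed database name, user, and path (note the escaped `%`, which crontab requires):

```
# Hypothetical crontab entry: pg_dump of a "discourse" database every 4 hours,
# one file per 2-digit hour, so roughly a day's worth of dumps is rotated.
0 */4 * * * pg_dump -U discourse -Fc discourse -f /var/backups/discourse-$(date +\%H).dump
```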

Once a day isn’t enough. Up to 24 hours’ worth of lost topics and posts is just too much.


It’s about how much content you might lose - on a quiet forum, backup every few days wouldn’t be a problem, on a very busy forum, even an hour might feel like a lot of loss. But you need to consider the improbability of failure: if you lost an hour of posts once a year, say, would that be very disturbing? Every ten years? Each of us has our own view of risk.


An even bigger loss than the posts might be all the new accounts created across a 24-hour period.

Especially if Discourse is being used as the SSO provider for your other applications or other integrations.

I don’t think this “0 for daily” answer is correct:


Zero disables backups. This setting just determines the number of days between backups.

@Jagster’s custom frequent DB backups sound like the more appropriate solution if daily isn’t enough for you.

Yes, I was just highlighting how dangerously wrong AI gets its suggestions to people.

Imagine if someone saw that and implemented it because it’s what they were told to do? :confused:


Looks like it was sourced from Staging/Test server ignored the environment variable - #2 by RGJ. I will update that post to make it clearer.
