Hourly backup, only if something has changed

Continuing the discussion from Hot off the presses, automated backup support!:

Hello,

I’m in a post-restore world :smiley: After testing the restore process from a fresh instance (see “Backup file not downloaded with .tar.gz extension”), I have to admit that I actually have more faith in discourse.org than in my hosting company. :slight_smile:

So I have thought of a new feature that could be helpful for a lot of us:

Is it possible to automate an hourly backup that runs only if something has changed in the precious content (new upload, new message, option changed, etc.)?

I don’t know if it is hard to do, but if this feature is possible, I could migrate my instance to the very cheap ($2.50/month) but insecure (no SLA) offer by RunAbove/OVH: https://meta.discourse.org/t/runabove-ovh-experiment-vs-digital-ocean/24428

To me, hourly backup is the only thing missing to make the jump: I could easily take the risk of losing one hour of forum activity in exchange for very cheap hosting. I just need a solid and fast restore process, and the discourse/docker install is very fast. :smile:

That could give discourse.org’s popularity a big boost.

What do you think of this proposal? Perhaps it is hard to implement, I don’t know.

Many thanks !

Technically doable in a plugin but sounds VERY odd. What you would really like is some sort of delta, say every 10 minutes. That is much more technically tricky but ultimately way more useful.

Sounds like streaming replication is what we want here, to me.

Run the forum on cheap server #1, streaming replicate to cheap server #2, if server #1 goes down, then:

  • stop replication from #1 to #2 (it already isn’t working, but remove the setup)
  • set up discourse on server #3
  • point DISCOURSE_DB_HOST on #3 to #2 (and rebuild)
  • take a backup on #3 using #2 as database
  • remove DISCOURSE_DB_HOST from #3 container definition (and rebuild)
  • restore backup on #3 using #3 as database
  • switch over DNS
  • re-set up streaming replication from #3 to #2

Certainly not a setup for the faint of heart when it comes to ops.

For a typical forum, every 10 minutes seems a bit extreme; it’s not as if financial transactions are happening.

Surely there must be a more reasonable balance between risk of data loss and resource use.

Hello

As a user, I need something as simple as what is currently implemented, which works really well: if something goes wrong (the host fails), I go to aws.amazon.com, download the latest backup, launch a new VM instance, install discourse via docker in a snap and restore the backup.

Probably, but I need a simple restore process. :smile:

In fact, I analyzed the risks (very low) and the cost:

  • The backup process is probably CPU intensive for the VM, but if the system is as smart as “do a backup every hour only if something changed”, that would be a good balance for me. “Every hour” could be adapted to each case, for example “every 3 hours” for a larger instance.

  • S3 or alternatives like Hubic are very cheap, and their use would be optimized by the “only if” option.

  • Two separate archives (one for the uploaded files, another for the rest) would definitely optimize the process, but the restore would be trickier.

To keep it simple, and because my instance is very small (60 MB), I think the only thing I need in order to start on a cheaper VPS is to be able to run the nightly archive every one or two hours instead of every 24 hours.

Do you know if it is possible to add this hack in app.yml?

Many thanks !

I don’t get why #3 exists in this example. If #1 replicates to #2 (PostgreSQL and files), wouldn’t it be easier to just copy the docker image to #2 and let #2’s PostgreSQL become the replication master when #1 fails?

@frederic: Configure PostgreSQL to store WAL files and provide disk space and limits to cover 5×N days of activity. Have a cronjob run every few minutes that fetches new WAL files and uploads via rsync. Have another cronjob run every N days that pulls a full database dump and destroys all prior WAL files and dumps (or moves them to offline backup).

You can store the necessary scripts for this in /var/discourse/shared/standalone (which gets mounted as /shared inside the container) and add lines to /etc/crontab in app.yml's custom commands section.

Thanks @elberet for your proposal! But I don’t understand the restore process. :frowning: I would prefer a simple and robust solution, based on the discourse framework.

You gave me an idea :slight_smile: : is it possible to trigger the backup sidekiq job from crontab every one or two hours?

That would do the job. It would be quite well suited to my small instance size, and it would be simple, don’t you think?

Many thanks !

Well, yeah, you can use cron to run a Ruby script that loads the Discourse app and triggers the Sidekiq job, but that doesn’t solve your original requirement of only creating backups after changes. :wink:
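
One way to reconcile the cron approach with the “only if something changed” requirement is to remember when the last backup ran. The following is a minimal sketch in plain Ruby, not Discourse code: `BackupGate`, the state file, and the `changed` callable are all hypothetical names, and the block passed to `run_if_changed` stands in for whatever actually triggers the Discourse backup.

```ruby
require "time"
require "tmpdir"

# Hypothetical helper: remembers when the last backup ran in a small state
# file, so each cron invocation knows which time window to inspect.
class BackupGate
  def initialize(state_path)
    @state_path = state_path
  end

  # Time of the last recorded backup, or nil if none has been recorded yet.
  def last_backup_at
    return nil unless File.exist?(@state_path)
    Time.iso8601(File.read(@state_path).strip)
  end

  # Runs the given block only when `changed` reports activity since the last
  # backup, or when no backup has been recorded. Returns true if it ran.
  def run_if_changed(changed:, now: Time.now)
    since = last_backup_at
    return false if since && !changed.call(since)
    yield
    File.write(@state_path, now.getutc.iso8601)
    true
  end
end

# Usage with stand-ins for the Discourse-specific parts (assumptions).
gate = BackupGate.new(File.join(Dir.mktmpdir, "last_backup"))
nothing_changed = ->(_since) { false }

ran_first  = gate.run_if_changed(changed: nothing_changed) { puts "backup triggered" }
ran_second = gate.run_if_changed(changed: nothing_changed) { puts "backup triggered" }
p [ran_first, ran_second] # [true, false]
```

The first invocation always backs up because no state file exists yet; later invocations are skipped until the `changed` callable reports activity.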

Yes, yes, you’re right, @elberet, but I have no skills with Discourse, Ruby and Rails. For the moment, I have only written my first tiny discourse plugin: https://github.com/fredericmalo/radiofrance_onebox

I don’t know how to query Discourse. For example, how do I ask it:

Please, sweet discourse instance, can you tell me if something has changed in the last one or two hours?

I am doing it step by step, with a little help from my community :smile:

I figured we were just throwing out the #1 server, i.e. we had some kind of unrecoverable error.

Hooray!

It is simpler than we expected! Discourse rocks! :smiley:

Here is my Hourly Backup plugin: https://github.com/fredericmalo/discourse_hourlybackup_plugin

And it works! :slight_smile: To be honest, I don’t understand a single line of the code, but it works :smile:

https://github.com/fredericmalo/discourse_hourlybackup_plugin/blob/master/plugin.rb#L8-L21

So now, can someone help me turn this question into Ruby?

Please, sweet discourse instance, can you tell me if something has changed in the last hour?

Is there any Discourse API that I can query to know if something in the content has changed during a given period of time? An INSERT/UPDATE/DELETE of a post or a file should be enough.

Many thanks for your help ! :slight_smile:

You could check whether a post/topic/user was created/updated in the last hour. Something like this:

def has_something_changed_since?(date = 1.hour.ago)
  # True if any user, post, or topic was created or updated since `date`
  [User, Post, Topic].any? do |klass|
    klass.where("created_at >= :date OR updated_at >= :date", date: date).exists?
  end
end
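
To see the logic of this check in isolation, here is a minimal sketch that runs outside Discourse. It makes assumptions: `Record` and `FakeTable` are hypothetical stand-ins for the ActiveRecord models, and the in-memory comparison stands in for the SQL condition.

```ruby
# Stand-in for a model row: only the two timestamps matter here.
Record = Struct.new(:created_at, :updated_at)

# Hypothetical in-memory "table" that answers the same question the
# database query would answer.
class FakeTable
  def initialize(records)
    @records = records
  end

  # True if any record was created or updated at or after `date`.
  def changed_since?(date)
    @records.any? { |r| r.created_at >= date || r.updated_at >= date }
  end
end

# True as soon as one table reports a change; later tables are not checked.
def something_changed_since?(tables, date)
  tables.any? { |table| table.changed_since?(date) }
end

now  = Time.now
hour = 3600
old_posts  = FakeTable.new([Record.new(now - 5 * hour, now - 4 * hour)])
new_topics = FakeTable.new([Record.new(now - 2 * hour, now - 600)])

puts something_changed_since?([old_posts], now - hour)             # false
puts something_changed_since?([old_posts, new_topics], now - hour) # true
```

A timestamp check of this kind cannot see rows that were removed outright, since a deleted row leaves no `updated_at` to match.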

Many, many thanks Régis, it simply works! :smiley: Hooray!

https://github.com/fredericmalo/discourse_hourlybackup_plugin/blob/master/plugin.rb#L1-L28

What happened to this repository? I am looking to do something similar… Please reshare.