Copy backups to another server with rsync and cron

johncs · July 30, 2019, 1:17am

I have a backup server that coordinates backups across many servers. I want my backup server to grab Discourse backups from my forum’s server.

I gave some thought to how I’d allow the backup server to access the backup files on the forum’s server. The best way I could come up with is allowing remote access as the www-data user (who owns Discourse’s backups).

I didn’t want to allow the backup server to shell into the forum’s server as root (for standard sysadmin reasons). I also wanted to avoid doing anything that I thought could cause Discourse to choke during backups or restores, and to avoid hosting another service on the forum server.

Anyways, here’s how I did it.

Allow remote access as the www-data user

Edit /etc/passwd and replace www-data’s shell with /bin/bash rather than /usr/sbin/nologin.
Edit /etc/passwd again and replace www-data’s home directory with /home/www-data rather than /var/www (optional, but appealing to me).
Add the backup server’s SSH key to /home/www-data/.ssh/authorized_keys.
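
The third step can be sketched as below. This is only an illustration, not something from the original setup: the function name install_backup_key and the file name backup_server.pub are hypothetical, and it assumes sshd runs with its default StrictModes setting, which refuses keys when the .ssh directory or authorized_keys file has loose permissions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to <home>/.ssh/authorized_keys
# with the permissions sshd expects.
install_backup_key() {
	local home_dir="$1" pubkey_file="$2"
	local ssh_dir="$home_dir/.ssh"
	local auth_file="$ssh_dir/authorized_keys"

	mkdir -p "$ssh_dir"
	chmod 700 "$ssh_dir"
	touch "$auth_file"
	chmod 600 "$auth_file"

	# Append only if this exact key line is not already present.
	local key
	key="$(cat "$pubkey_file")"
	if ! grep -qxF -- "$key" "$auth_file"; then
		printf '%s\n' "$key" >> "$auth_file"
	fi
}

# On the forum server (as root), with the backup server's PUBLIC key copied over:
#   install_backup_key /home/www-data backup_server.pub
#   chown -R www-data:www-data /home/www-data
```

Running the helper twice with the same key leaves a single entry, so it is safe to repeat.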

rsync

Finally, on the backup server, I added an hourly cron command that ran the following script:

#!/usr/bin/env bash

set -xe

HOST="$1"
DIR="$2"
if [ -z "$HOST" ] || [ ! -d "$DIR" ]; then
	echo "$0 HOST DIR"
	exit 1
fi

# --ignore-existing will have rsync ignore any backups that have already been
# copied.
# --delay-updates ensures that only complete backups ever make it into $DIR. If
# this isn't specified, partial backups could end up in $DIR, and because
# --ignore-existing won't perform any kind of equality check, the problem will
# not be corrected or detected.
rsync --ignore-existing --delay-updates "$HOST:/var/discourse/shared/standalone/backups/default/*" "$DIR"
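
Because --ignore-existing never re-checks a file once it exists in $DIR, a separate integrity pass can catch damaged copies. The sketch below is not part of the script above; it assumes the backups are gzip-compressed (.tar.gz or .sql.gz), and the function name verify_backups is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: test every gzip file in a directory and report bad ones.
verify_backups() {
	local dir="$1" status=0 file
	for file in "$dir"/*.gz; do
		# If nothing matches, the glob stays literal; skip it.
		[ -e "$file" ] || continue
		if ! gzip -t "$file" 2>/dev/null; then
			echo "CORRUPT: $file"
			status=1
		fi
	done
	return "$status"
}

# Example: verify_backups /path/to/backups || echo "at least one backup is damaged"
```

gzip -t only confirms that the compressed stream is intact; it does not prove that a restore will succeed.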

Hopefully this proves useful to someone out there.

Bathinda · August 17, 2024, 8:59am

Wow!!
Though I’d appreciate it much more if you had explained the steps below in a bit more detail, so that novice users like me don’t get anything wrong (and also understand what each step is doing).

What does the above step (changing www-data’s shell to /bin/bash) do?

Do you mean the public key here (in the authorized_keys step)?

johncs · August 17, 2024, 10:23pm

It allows the www-data user to log in successfully. This changes the “login shell”, which is a good keyword to search for to learn more.

Yes, the public key. Private keys should (basically) never be copied or shared anywhere outside of their host machine.
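
To make the “login shell” point concrete, here is a small sketch that pulls the home directory and login shell out of a passwd-format line. The function name is hypothetical, and the sample lines assume a typical Debian/Ubuntu www-data entry rather than anything copied from a real server.

```shell
#!/usr/bin/env bash
# Print the home directory (field 6) and login shell (field 7) of a passwd-format line.
passwd_home_and_shell() {
	printf '%s\n' "$1" | awk -F: '{ print "home=" $6 " shell=" $7 }'
}

# Typical entry before the edits:
passwd_home_and_shell 'www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin'
# The same entry after both /etc/passwd edits from the first post:
passwd_home_and_shell 'www-data:x:33:33:www-data:/home/www-data:/bin/bash'
```

On a live system, getent passwd www-data prints the real line to compare against.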

Bathinda · August 28, 2024, 2:03pm

Since you seem to be someone who looks for new ways of doing things: could there also be a simple way to transfer our local server backups to different S3 buckets, like Google S3 or iDrive S3, through cron jobs?
(I know we can configure it for an AWS S3 bucket directly by using its key and secret.)

pfaffman · August 28, 2024, 2:09pm

If you configure S3 backups they get uploaded to S3 automatically, though they have either all of the uploads or none, so unless you have uploads on S3 you have multiple copies of all of the uploads in the backup files.

Bathinda · August 28, 2024, 2:14pm

I know that already. Since starting 6 years ago, I have been using this very setup until now (uploading all media and backups to an AWS bucket).

But I was asking about a different problem that I’m facing.
I have now set things up to create backups (which include the media uploads) on the local Ubuntu server. But (as is being discussed in another thread) I’m not able to restore from those (1 GB) backups. Something is missing or causing a problem. So I was thinking of using a Google bucket and dispensing with AWS altogether.

pfaffman · August 28, 2024, 2:48pm

I don’t see the difference between AWS S3 and Google ones. But maybe https://restic.net/ will help you out? It’s a backup program that can back up to S3 buckets.
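
As a rough illustration of the restic suggestion, the sketch below backs up the local Discourse backup directory to an S3-compatible bucket. The endpoint s3.example.com and the bucket name are placeholders, and the run wrapper with DRY_RUN is only there so the command can be previewed without touching anything. restic reads the bucket credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() {
	if [ "${DRY_RUN:-0}" = "1" ]; then
		echo "+ $*"
	else
		"$@"
	fi
}

# Back up one directory into a restic repository stored in an S3-compatible bucket.
# $1 = repository (e.g. s3:https://s3.example.com/my-discourse-backups, a placeholder)
# $2 = directory to back up
push_backups() {
	local repo="$1" src="$2"
	run restic -r "$repo" backup "$src"
}

# One-time setup (creates the repository and asks for its password):
#   restic -r s3:https://s3.example.com/my-discourse-backups init
# Then, from cron, with the credentials and RESTIC_PASSWORD_FILE exported:
#   push_backups s3:https://s3.example.com/my-discourse-backups \
#     /var/discourse/shared/standalone/backups/default
```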

I’m not sure what your restore problem is.

Bathinda · August 29, 2024, 11:33am

John Sullivan:

#!/usr/bin/env bash

set -xe

HOST="$1"
DIR="$2"
if [ -z "$HOST" ] || [ ! -d "$DIR" ]; then
	echo "$0 HOST DIR"
	exit 1
fi

# --ignore-existing will have rsync ignore any backups that have already been
# copied.
# --delay-updates ensures that only complete backups ever make it into $DIR. If
# this isn't specified, partial backups could end up in $DIR, and because
# --ignore-existing won't perform any kind of equality check, the problem will
# not be corrected or detected.
rsync --ignore-existing --delay-updates "$HOST:/var/discourse/shared/standalone/backups/default/*" "$DIR"

For anyone coming to this thread like I did, I’d like to explain the script in the first post of this topic a bit further.

This is a bash script, which can be pasted as-is into a file with any name (a .sh extension is conventional but not required).
The first line of the script (#!/usr/bin/env bash) tells the system to run the script with the bash interpreter found via the env command.
Flags (set -xe):
- -x: Enables tracing, which means that each command and its arguments are printed (to standard error) before being executed. This is helpful for debugging the script.
- -e: Causes the script to exit immediately if any command returns a non-zero status (indicating an error). This prevents the script from continuing after a failure.
The next important step sets the variables (HOST="$1" DIR="$2"):
- HOST="$1": Assigns the first argument passed to the script ($1) to the variable HOST. The script does not prompt for anything; the first value given on the command line after the script’s name becomes the host that the backups are copied from.
- DIR="$2": Assigns the second argument passed to the script ($2) to the variable DIR. The second value on the command line becomes the target directory on the backup server.
  If anyone is interested I can explain the remaining two steps in more detail, but suffice it to say that the if statement checks that a host was supplied and that the target directory exists. Otherwise the script prints a usage message and exits with status 1.

The main thing I’d reiterate is that this is a script which, when run, takes the host (where the data is copied from) and the target directory (where the data is copied to) as command-line arguments. You’d put the path to this file, followed by those two arguments, in your crontab, and cron will run the script as often as the schedule there says.
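
The crontab part might look like the sketch below. The script path, host, and directory are placeholders rather than values from the first post, and the helper name add_hourly_job is hypothetical. The schedule 0 * * * * means minute 0 of every hour, matching the hourly run described in the first post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an existing crontab on stdin and print it with an
# hourly entry for the given command added (only if it is not already there).
add_hourly_job() {
	local job="0 * * * * $1"
	local existing
	existing="$(cat)"
	if [ -n "$existing" ]; then
		printf '%s\n' "$existing"
	fi
	if ! printf '%s\n' "$existing" | grep -qxF -- "$job"; then
		printf '%s\n' "$job"
	fi
}

# Typical use on the backup server (all names below are placeholders):
#   crontab -l 2>/dev/null \
#     | add_hourly_job '/usr/local/bin/pull-discourse-backups.sh www-data@forum.example.com /srv/backups/discourse' \
#     | crontab -
```

Here www-data@forum.example.com would arrive in the script as $1 (HOST) and /srv/backups/discourse as $2 (DIR).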

But what I’ve failed to understand is: where are the actual copying (or backup) commands?
How would the actual sync occur?
