I have a backup server that coordinates backups across many servers. I want my backup server to grab Discourse backups from my forum’s server.
I gave some thought to how I’d allow the backup server to access the backup files on the forum’s server. The best way I could come up with is allowing remote access as the www-data user (who owns Discourse’s backups).
I didn’t want to allow the backup server to shell into the forum’s server as root (for standard sysadmin reasons). I also wanted to avoid anything that might cause Discourse to choke during backups or restores, and I didn’t want to host another service on the forum server.
Anyways, here’s how I did it.
Allow remote access as the www-data user
Edit /etc/passwd and change www-data’s shell from /usr/sbin/nologin to /bin/bash.
Edit /etc/passwd again and change www-data’s home directory from /var/www to /home/www-data (optional, but appealing to me).
Add the backup server’s SSH key to /home/www-data/.ssh/authorized_keys.
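For reference, the three steps above amount to something like the following. This is only a sketch: the sample passwd entry assumes a stock Debian/Ubuntu www-data line, the public key is a placeholder, and a temporary directory stands in for /home/www-data so the snippet can be run without touching the real system.

```shell
#!/usr/bin/env bash
set -e

# Assumed stock entry for www-data in /etc/passwd (Debian/Ubuntu defaults).
before='www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin'

# Field 6 is the home directory and field 7 is the login shell.
after=$(printf '%s\n' "$before" |
  awk -F: -v OFS=: '$1 == "www-data" { $6 = "/home/www-data"; $7 = "/bin/bash" } { print }')
echo "$after"

# Stand-in for /home/www-data so this sketch is safe to run anywhere.
home_dir=$(mktemp -d)

# Placeholder public key; use the backup server's real key here.
backup_key='ssh-ed25519 AAAA...placeholder backup@backup-server'

# Create .ssh and the key file with owner-only permissions.
mkdir -p "$home_dir/.ssh"
printf '%s\n' "$backup_key" >> "$home_dir/.ssh/authorized_keys"
chmod 700 "$home_dir/.ssh"
chmod 600 "$home_dir/.ssh/authorized_keys"
ls -ld "$home_dir/.ssh" "$home_dir/.ssh/authorized_keys"
```

On the real server the .ssh directory and authorized_keys file also need to be owned by www-data. With sshd's StrictModes setting (on by default), keys are refused if the home directory, .ssh, or authorized_keys is writable by other users.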
rsync
Finally, on the backup server, I added an hourly cron command that ran the following script:
#!/usr/bin/env bash
set -xe
HOST="$1"
DIR="$2"
if [ -z "$HOST" ] || [ ! -d "$DIR" ]; then
    echo "$0 HOST DIR"
    exit 1
fi
# --ignore-existing will have rsync ignore any backups that have already been
# copied.
# --delay-updates ensures that only complete backups ever make it into $DIR. If
# this isn't specified, partial backups could end up in $DIR, and because
# --ignore-existing won't perform any kind of equality check, the problem will
# not be corrected or detected.
rsync --ignore-existing --delay-updates "$HOST:/var/discourse/shared/standalone/backups/default/*" "$DIR"
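An hourly schedule for this script might look like the entry built below. The script path, the forum host name, and the destination directory are all hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical locations; substitute your own.
SCRIPT=/usr/local/bin/pull-discourse-backups.sh
HOST=www-data@forum.example.com
DIR=/srv/backups/discourse

# Five time fields: minute hour day-of-month month day-of-week.
# "0 * * * *" fires at minute 0 of every hour.
CRON_LINE="0 * * * * $SCRIPT $HOST $DIR"
echo "$CRON_LINE"

# To install it for the current user, append it to the existing crontab:
#   (crontab -l 2>/dev/null; echo "$CRON_LINE") | crontab -
```

Because the script uses set -x, every run writes a trace to standard error, which cron typically mails to the crontab's owner when mail is configured.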
Hopefully this proves useful to someone out there.
Wow!!
Though I’d appreciate it much more if you explained the steps below in a bit more detail, so that novice users like me don’t do anything wrong (and also get an idea of what each step is doing).
Since you seem to be someone who looks for new ways of doing things: could there also be a simple way to transfer our local server backups to different S3 buckets, like Google S3 or iDrive S3, through cron jobs?
(I know we can configure it for an AWS S3 bucket directly by using its key and secret.)
If you configure S3 backups they get uploaded to S3 automatically, though they have either all of the uploads or none, so unless you have uploads on S3 you have multiple copies of all of the uploads in the backup files.
That I already know. Since the beginning, six years ago, I have been using this very setup (uploading all media and backups to an AWS bucket).
But I was asking the above because of a different problem I’m facing.
I have now set up backups (which include the media ‘Uploads’) to be created on the local Ubuntu server. But (as is being discussed in another thread) I’m not able to restore from those (1 GB) backups. Something is missing or causing a problem. So I was thinking of using a Google bucket and dispensing with AWS altogether.
I don’t see the difference between AWS S3 and Google’s. But maybe https://restic.net/ will help you out? It’s a backup program that can back up to S3 buckets.
For anyone coming to this thread like I did, I’d like to explain the first post in this topic a bit further.
This is a bash script, which may be pasted as-is into a file with any name, conventionally with a .sh extension.
The first line of the script (#!/usr/bin/env bash) sets which interpreter runs the script: it tells the system to use the bash interpreter found via the env command.
Flags (set -xe):
-x: Enables tracing, which means that each command and its arguments are printed (to standard error) before being executed. This is helpful for debugging the script.
-e: Causes the script to exit immediately if any command returns a non-zero status (indicating an error). This prevents the script from continuing after a failure.
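To see the two flags in action, here is a small sketch that runs throwaway commands in child shells; the commands themselves are made up for the demonstration and have nothing to do with the backup script.

```shell
#!/usr/bin/env bash

# -e: the child shell stops at the failing command, so "after" is never printed.
out_e=$(bash -c 'set -e; echo before; false; echo after') || true
echo "$out_e"

# -x: each command is traced to stderr with a "+ " prefix before it runs.
# 2>&1 merges that trace into the captured output.
out_x=$(bash -c 'set -x; echo hello' 2>&1)
echo "$out_x"
```

The first echo shows only "before"; the second shows the trace line "+ echo hello" followed by "hello".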
The next important step sets the variables (HOST="$1" DIR="$2"):
HOST="$1": Assigns the first argument passed to the script ($1) to the variable HOST. The script does not prompt for anything; whatever comes first after the script name on the command line becomes the host value (the server the data is copied from).
DIR="$2": Assigns the second argument passed to the script ($2) to the variable DIR. Whatever directory path comes second on the command line becomes the target directory.
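A quick way to convince yourself of how $1 and $2 are filled in is the sketch below. It writes a two-variable demo script to a temporary file and calls it with two arguments; the host name and directory are hypothetical examples.

```shell
#!/usr/bin/env bash
set -e

# Write a tiny demo script that only reads its two positional arguments.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/usr/bin/env bash
HOST="$1"
DIR="$2"
echo "HOST=$HOST DIR=$DIR"
EOF

# The words after the script name become $1 and $2, in that order.
result=$(bash "$demo" www-data@forum.example.com /srv/backups/discourse)
echo "$result"

rm -f "$demo"
```

This prints HOST=www-data@forum.example.com DIR=/srv/backups/discourse, with no prompt involved at any point.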
If anyone is interested I can explain the remaining two steps as well, but suffice it to say that the next step just checks that a host was supplied and that the target directory exists. If not, the script prints a usage message and exits with status 1.
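As a rough illustration of that check, the sketch below wraps the same test in a function so that both the failing and the passing case can be shown in one run. The function name check_args, the messages, and the host name are hypothetical.

```shell
#!/usr/bin/env bash

# Same test as the original script, wrapped in a function (check_args is a
# hypothetical name) so both outcomes can be shown in one run.
check_args() {
  local host="$1"
  local dir="$2"
  # -z is true for an empty string; ! -d is true when the path is not an
  # existing directory.
  if [ -z "$host" ] || [ ! -d "$dir" ]; then
    echo "usage: SCRIPT HOST DIR"
    return 1
  fi
  echo "arguments look fine"
}

target=$(mktemp -d)

check_args "" "$target" || echo "rejected: empty host"
check_args forum.example.com "$target/missing" || echo "rejected: directory does not exist"
check_args forum.example.com "$target"

rmdir "$target"
```

Note that the check only confirms the host is non-empty and the directory exists; it does not verify that the host is reachable.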
The main thing I’d reiterate is that this is a script which, when run, expects the host (where the data is copied from) and the target directory (where the data is copied to) as arguments. You’d put the path to this file, followed by those two arguments, in your cron job, which runs the script as often as you schedule it.
But what I’ve failed to understand is: where are the actual copy (or backup) commands? How does the actual sync occur?