I have a backup server that coordinates backups across many servers. I want my backup server to grab Discourse backups from my forum’s server.
I gave some thought to how I’d allow the backup server to access the backup files on the forum’s server. The best way I could come up with is allowing remote access as the www-data user (who owns Discourse’s backups).
I didn’t want to allow the backup server to shell into the forum’s server as root (for standard sysadmin reasons). I also wanted to avoid doing anything that I thought could cause Discourse to choke during backups or restores. I also wanted to avoid hosting another service on forum server.
Anyways, here’s how I did it.
Allow remote access as the www-data user
Edit /etc/passwd and replace www-data’s shell with /bin/bash rather than /usr/sbin/nologin.
Edit /etc/passwd again and replace www-data’s home directory with /home/www-data rather than /var/www (optional, but appealing to me).
Add the backup server’s SSH key to /home/www-data/.ssh/authorized_keys.
rsync
Finally, on the backup server, I added an hourly cron command that ran the following script:
#!/usr/bin/env bash
set -xe
HOST="$1"
DIR="$2"
if [ -z "$HOST" ] || [ ! -d "$DIR" ]; then
echo "$0 HOST DIR"
exit 1
fi
# --ignore-existing will have rsync ignore any backups that have already been
# copied.
# --delay-updates ensures that only complete backups ever make it into $DIR. If
# this isn't specified, partial backups could end up in $DIR, and because
# --ignore-existing won't perform any kind of equality check, the problem will
# not be corrected or detected.
rsync --ignore-existing --delay-updates "$HOST:/var/discourse/shared/standalone/backups/default/*" "$DIR"
Hopefully this proves useful to someone out there.