Copy backups to another server with rsync and cron

johncs · July 30, 2019, 1:17 AM

I have a backup server that coordinates backups across many servers. I want my backup server to grab Discourse backups from my forum’s server.

I gave some thought to how I’d allow the backup server to access the backup files on the forum’s server. The best way I could come up with is allowing remote access as the www-data user (who owns Discourse’s backups).

I didn’t want to allow the backup server to shell into the forum’s server as root (for standard sysadmin reasons). I also wanted to avoid doing anything that could cause Discourse to choke during backups or restores, and I didn’t want to host another service on the forum server.

Anyways, here’s how I did it.

Allow remote access as the www-data user

Edit /etc/passwd and replace www-data’s shell with /bin/bash rather than /usr/sbin/nologin.
Edit /etc/passwd again and replace www-data’s home directory with /home/www-data rather than /var/www (optional, but appealing to me).
Add the backup server’s SSH key to /home/www-data/.ssh/authorized_keys.
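
The three steps above can be rehearsed safely on a scratch copy before touching the real files. This is only a sketch under assumptions: the sample passwd line (UID/GID 33, as on typical Debian/Ubuntu systems), the scratch directory, and the placeholder public key are illustrative and not taken from the original post.

```shell
#!/usr/bin/env bash
set -eu

# Scratch copy with a sample entry; the real /etc/passwd line may differ.
scratch="$(mktemp -d)"
printf '%s\n' 'www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin' > "$scratch/passwd"

# In passwd format, field 6 is the home directory and field 7 is the login shell.
new_entry="$(awk -F: -v OFS=: '
  $1 == "www-data" { $6 = "/home/www-data"; $7 = "/bin/bash" }
  { print }
' "$scratch/passwd")"
echo "$new_entry"

# sshd (with its default StrictModes setting) ignores keys whose directory
# or file permissions are too open, so tighten them explicitly.
ssh_dir="$scratch/home/www-data/.ssh"
mkdir -p "$ssh_dir"
chmod 700 "$ssh_dir"
# Placeholder PUBLIC key of the backup server; the private key never leaves it.
echo 'ssh-ed25519 AAAA...placeholder backup@backup-server' >> "$ssh_dir/authorized_keys"
chmod 600 "$ssh_dir/authorized_keys"
```

On the real server the same layout would live under /home/www-data and be owned by www-data.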

rsync

Finally, on the backup server, I added an hourly cron command that ran the following script:

#!/usr/bin/env bash

set -xe

HOST="$1"
DIR="$2"
if [ -z "$HOST" ] || [ ! -d "$DIR" ]; then
	echo "$0 HOST DIR"
	exit 1
fi

# --ignore-existing will have rsync ignore any backups that have already been
# copied.
# --delay-updates ensures that only complete backups ever make it into $DIR. If
# this isn't specified, partial backups could end up in $DIR, and because
# --ignore-existing won't perform any kind of equality check, the problem will
# not be corrected or detected.
rsync --ignore-existing --delay-updates "$HOST:/var/discourse/shared/standalone/backups/default/*" "$DIR"
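
The hourly cron entry itself isn't shown in the post. A minimal sketch of what it might look like follows; the script path, the remote login, and the destination directory are all hypothetical and should be replaced with real values.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical values, not from the original post.
script_path="/usr/local/bin/pull-discourse-backups.sh"
remote="www-data@forum.example.com"
dest_dir="/srv/backups/discourse"

# Five schedule fields (minute hour day-of-month month day-of-week):
# "0 * * * *" means minute 0 of every hour. The two arguments after the
# script path become $1 (HOST) and $2 (DIR) inside the script.
cron_line="0 * * * * $script_path $remote $dest_dir"
echo "$cron_line"

# One way to install it for the current user (left commented out on purpose):
# (crontab -l 2>/dev/null; echo "$cron_line") | crontab -
```

The destination directory has to exist beforehand, because the script exits with status 1 when DIR is not a directory.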

Hopefully this proves useful to someone out there.

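Bathinda · August 17, 2024, 8:59 AM

Awesome!!
I'd be even more grateful if you could explain the following steps in a bit more detail, so that even a novice user like me can follow them without making mistakes (and understand what each step does).

What does this do?

Are you talking about the public key here?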

johncs · August 17, 2024, 10:23 PM

This lets the www-data user log in normally. It changes the user's "login shell", which is a good keyword for digging deeper.

Yes. Private keys should (basically) never be copied or shared off the host machine.

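Bathinda · August 28, 2024, 2:03 PM

Since you seem to be someone who explores newer approaches, can you also think of a simple way to use cron jobs to transfer local server backups to a different S3 bucket, such as Google S3 or iDrive S3?
(I know that AWS S3 buckets can be configured directly with a key and secret.)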

pfaffman · August 28, 2024, 2:09 PM

If you configure S3 backups, they are uploaded to S3 automatically. Uploads are all-or-none, though, so if your uploads aren't on S3, every backup file contains another copy of all of them.

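Bathinda · August 28, 2024, 2:14 PM

I already know that. It's the setup I've used from when I started six years ago until now (uploading all media and backups to AWS buckets).

But above I was asking about a different kind of problem I'm facing.
I currently have backups (including media "uploads") set to be created on my local Ubuntu server. However, as discussed in another thread, I can't restore from those (large, 1 GB) backups. Something is missing or going wrong. That's why I was thinking of using a Google bucket and dropping AWS entirely.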

pfaffman · August 28, 2024, 2:48 PM

I can't tell what the difference is between AWS S3 and Google's offering. But https://restic.net/ might help you. It's a backup program that can back up to S3 buckets.

I don't know what your restore problem is.

Bathinda · August 29, 2024, 11:33 AM

John Sullivan:

#!/usr/bin/env bash

set -xe

HOST="$1"
DIR="$2"
if [ -z "$HOST" ] || [ ! -d "$DIR" ]; then
	echo "$0 HOST DIR"
	exit 1
fi

# --ignore-existing will have rsync ignore any backups that have already been
# copied.
# --delay-updates ensures that only complete backups ever make it into $DIR. If
# this isn't specified, partial backups could end up in $DIR, and because
# --ignore-existing won't perform any kind of equality check, the problem will
# not be corrected or detected.
rsync --ignore-existing --delay-updates "$HOST:/var/discourse/shared/standalone/backups/default/*" "$DIR"

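For anyone who lands on this topic, I'd like to explain this first post in a bit more detail.

This is a bash script; you can paste it into a file as-is, as long as you give the file a .sh extension.
The first line of the script sets up the environment for running it, such as which shell to use: #!/usr/bin/env bash tells the system to use the bash interpreter found via the env command.
The flags (set -xe):
- -x: enables debugging. Each command and its arguments are printed to the terminal before being executed, which helps when debugging the script.
- -e: the script exits immediately if a command returns a non-zero status (indicating an error). This keeps the script from carrying on after a failure.
The next important step is the variables (HOST="$1" DIR="$2"):
- HOST="$1": assigns the first argument passed to the script ($1) to the variable HOST. That is, when this script runs it asks the user for input, and the first input the user provides ($1) is treated as the "host" value (presumably where the data is copied from).
- DIR="$2": assigns the second argument passed to the script ($2) to the variable DIR. That is, whatever directory path the user enters after the first value (this second value is called "$2") is what the script takes as the "target directory".

I can explain the remaining two steps if anyone is interested, but suffice it to say that the next step just checks that the user supplied valid host and target-directory values at the prompt. Otherwise (the last step) it returns 1 as its error output.

The main thing I want to stress is that this is a script which, when run, asks the user for a host (where the data is copied from) and a target directory (where the data is pasted). And if you put the path to this file in a cron job file, the script will be run as often as you configure in the cron file.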

What I couldn't figure out, though, is where the actual copy-and-paste (or backup) command is.
How does the actual sync happen?
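
For reference, the copy itself is done by the rsync line at the end of the quoted script. The script also never prompts for anything: $1 and $2 are positional arguments, meaning the words typed after the script's name on the command line (or in the cron entry). A minimal sketch of that behaviour follows, assuming a hypothetical function show_args and made-up host and directory values; it only prints the command that would run.

```shell
#!/usr/bin/env bash
set -eu

# show_args is a hypothetical stand-in for the script. Inside a function,
# just as inside a script, $1 and $2 are the arguments supplied by the
# caller, not answers to an interactive prompt.
show_args() {
  local host="$1"
  local dir="$2"
  # Same shape as the last line of the quoted script, printed rather than run.
  echo "rsync --ignore-existing --delay-updates $host:/var/discourse/shared/standalone/backups/default/* $dir"
}

# Equivalent to running: ./script.sh www-data@forum.example.com /srv/backups/discourse
cmd="$(show_args www-data@forum.example.com /srv/backups/discourse)"
echo "$cmd"
```

In the real script rsync connects to HOST over SSH, which is why the backup server's key has to be in www-data's authorized_keys, and copies any backup files not already present in DIR.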
