How to back up Discourse to S3

Discourse works well with S3, and some familiarity with S3 will make the steps below easier to follow.

Many sites run on hosts with limited disk space and resources.

Sending backups to S3 makes better use of that space.

You can follow the steps below to configure it:

Set Backup Frequency

Go to Admin > Backups and set backup_frequency to 1. This setting controls the number of days between automatic backups; the default is 7.
1 means back up once a day.
7 means back up once every 7 days.

For a typical site, if you are storing backups on S3, backing up once a day is a good choice.

Set Backup Bucket and Path

This bucket can be private and does not need to be publicly accessible. Note that if you are also using S3 for image and attachment storage, that bucket needs to be public.

For clarity, create a separate bucket for backups so it does not get mixed up with the one storing attachments and images.

We also recommend setting a directory path, since Discourse will create the folders it needs under it.

This keeps your storage organized and easy to navigate.

Set s3_access_key_id and s3_secret_access_key

Next, set the s3_access_key_id, s3_secret_access_key, and s3_region for your backup storage. All three settings matter, and the region must match your bucket. If backups fail to upload, the cause is most likely a permissions problem.
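If you prefer to manage these values from the server rather than the admin UI, Discourse also reads them from environment variables in containers/app.yml. This is a sketch; the bucket name, path, region, and keys below are all placeholders:

```yaml
## Sketch for containers/app.yml, under the env: section.
## All values are placeholders; substitute your own bucket, region, and keys.
env:
  DISCOURSE_BACKUP_LOCATION: s3
  DISCOURSE_S3_BACKUP_BUCKET: my-discourse-backups/backups
  DISCOURSE_S3_REGION: us-east-1
  DISCOURSE_S3_ACCESS_KEY_ID: AKIAXXXXXXXXXXXXXXXX
  DISCOURSE_S3_SECRET_ACCESS_KEY: replace-with-your-secret-key
```

After editing app.yml, rebuild the container (./launcher rebuild app) for the settings to take effect.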

For detailed setup steps, please refer to the article: Setting up file and image uploads to S3 - sysadmin - Discourse Meta.

Note that the access key must be granted sufficient permissions, otherwise uploads will fail.
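One way to scope the key safely is an IAM policy limited to the backup bucket. This is a minimal sketch (the bucket name is a placeholder); the Discourse Meta S3 guide lists the full recommended policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-discourse-backups"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-discourse-backups/*"
    }
  ]
}
```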

Set Backup to S3 Storage

Set the backup destination to S3: in the backup_location setting, change the value from Local to S3.

Test Backup

Once everything is set up, you can test the backup.

In the Backups menu, click Backup to start a manual backup.


In the pop-up window, you will be asked whether to include uploaded images and attachments.

Generally, select Yes here. The page then switches to the log view, and progress is reported through the logs. The backup is complete when the log shows “Finished!”.
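If you save the log output, the completion check can be scripted. A minimal sketch, assuming a saved log file (the path is hypothetical):

```shell
# Returns success if a saved backup log reached the "Finished!" marker.
backup_finished() {
  grep -q "Finished!" "$1"
}

# Simulate a saved log line (hypothetical path):
printf '[2024-07-26 11:56:18] Finished!\n' > /tmp/backup.log
backup_finished /tmp/backup.log && echo "backup completed"
```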

More importantly, you can log in to your S3 account to confirm that the latest backup is available.


Check the timestamp, size, and filename to confirm it is the backup you just created.
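You can also confirm this from the command line with the AWS CLI; the listing shows the date, size, and filename of each backup. The bucket name and prefix below are placeholders:

```shell
# List backups (date, size, filename) in a hypothetical backup bucket,
# if the AWS CLI is installed and configured with your credentials.
list_backups() {
  if command -v aws >/dev/null 2>&1; then
    aws s3 ls "s3://$1/backups/default/" --human-readable
  else
    echo "aws CLI not installed"
  fi
}

list_backups my-discourse-backups || true
```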


By setting up S3 backups, we can expand Discourse’s storage space, gaining almost unlimited backup and storage capacity. For website operations, automatic backup and upload are very practical features.

You will also have multiple backup storage points, making it easier to restore to different backup points when recovering your website.

Since the backup files are kept outside the Docker host, this helps with daily backups and significantly reduces local storage usage.

We also recommend storing images and attachments on S3, which offers significant advantages for migration and backup recovery.

Please refer to the original article iSharkFly - 飞鲨 for more information.


A question: if backups and attachments are stored in different S3 buckets, will the contents of the attachment bucket also be backed up? And if I do not choose to include uploaded images and attachments, will attachments stored in S3 still display normally in the forum after restoring a backup?

I haven’t carefully reviewed the backup content for Discourse.

After looking at our backups, I realized:

If your attachments are stored using AWS cloud storage, even if you select Include attachments in backup during backup, the attachments uploaded to AWS will not be included in your backup file.

The attachments within the backup are only those stored on your local computer, not the ones on AWS.

This can be seen from the size of our website backup. If attachments were included, the backup size would not be only 80+ MB.

This indicates that the backup only contains the database and local attachments.

Opening the downloaded archive, you'll see only two items: one is dump, the PostgreSQL database dump.

The other is the uploads folder, which contains only your locally uploaded attachments, not those stored on AWS. For us, this folder is very small, with few files.

This is because we uploaded all attachments to AWS shortly after the community started running.

Looking at the contents of the PGSQL dump file, you can see the PostgreSQL version running in the current Discourse database container.

If you want to view the database locally, this dump file can be directly imported into your local container.
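The PostgreSQL version can be read straight from the dump header before importing. A sketch; the path, header contents, and version number below are simulated placeholders (with a real backup you would first extract and gunzip the dump from the archive):

```shell
# pg_dump records the source server version near the top of the dump.
# Simulate a dump header here (version number is a placeholder).
cat > /tmp/dump.sql <<'EOF'
-- PostgreSQL database dump
-- Dumped from database version 13.10
EOF
grep -m1 "Dumped from database version" /tmp/dump.sql
```

To load it into a local container, something like `docker exec -i pg-local psql -U postgres -d discourse < /tmp/dump.sql` works, where the container and database names are placeholders for your own setup.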

AWS Recovery Issues

If you store attachments on AWS but do not use an AWS CDN, post bodies will contain the absolute URL of the file on AWS.

That is how it appears in the topic's raw Markdown.

After the post is published, however, Discourse rewrites the HTML to use your CDN's absolute address.

Therefore, per the answer above, if you do not include attachments in the backup, attachment content is unaffected during restoration.

Exception

Attachments are actually affected, mainly due to domain name switching.

We switched domain names early on, and although the attachment content was still there, posts could no longer link to it; even rebaking the HTML did not restore the links.

At that point it gets troublesome and may require editing the database directly.

As long as you don’t arbitrarily change your domain name, this is usually not a problem.
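For the record, Discourse ships a remap tool for exactly this situation. The sketch below only constructs the command (hostnames are placeholders); you would run it inside the app container after `./launcher enter app`, and because remap rewrites every matching string in the database, take a fresh backup first:

```shell
# Hypothetical old/new hostnames; this prints the command you would run
# inside the container. 'discourse remap' rewrites every occurrence of
# the old string in the database, so back up before running it for real.
OLD_HOST="old.example.com"
NEW_HOST="new.example.com"
echo "discourse remap $OLD_HOST $NEW_HOST"
```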

For more detailed discussion, please visit: Discourse 备份和恢复中有关附件的问题 - Discourse - iSharkFly

Another question: I am not using Amazon S3 but Cloudflare R2, and I have successfully backed up to R2. I can see the files in Cloudflare, but the backup files are not listed in the Discourse backend. What could be the problem?


Back up manually one more time and check the backup logs.

This is most likely an error in the API call Discourse makes to verify status after storing the backup in R2.

See if the log content is complete.
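Since R2 is S3-compatible, you can also verify the bucket contents independently by pointing the AWS CLI at your R2 endpoint. The account ID and bucket name below are placeholders:

```shell
# List a (hypothetical) R2 backup bucket, if the AWS CLI is installed.
# R2 speaks the S3 API, so only the endpoint URL differs.
R2_ACCOUNT_ID="your-account-id"
R2_BUCKET="my-backups"
if command -v aws >/dev/null 2>&1; then
  aws s3 ls "s3://${R2_BUCKET}/" \
    --endpoint-url "https://${R2_ACCOUNT_ID}.r2.cloudflarestorage.com" || true
else
  echo "aws CLI not installed"
fi
```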

This screenshot was just generated, and it seems to show everything is normal. Also, I created an API token with the highest privileges in R2.

I ran my backup process, and it seems our logs are the same.

[2024-07-26 11:56:00] pg_dump: executing SEQUENCE SET category_custom_fields_id_seq
[2024-07-26 11:56:00] Finalizing backup...
[2024-07-26 11:56:00] Creating archive: isharkfly-2024-07-26-115540-v20240723030506.tar.gz
[2024-07-26 11:56:00] Making sure archive does not already exist...
[2024-07-26 11:56:00] Creating empty archive...
[2024-07-26 11:56:00] Archiving data dump...
[2024-07-26 11:56:00] Archiving uploads...
[2024-07-26 11:56:00] Skipping uploads stored on S3.
[2024-07-26 11:56:00] Removing tmp '/var/www/discourse/tmp/backups/default/2024-07-26-115540' directory...
[2024-07-26 11:56:00] Gzipping archive, this may take a while...
[2024-07-26 11:56:05] Uploading archive...
[2024-07-26 11:56:09] Executing the after_create_hook for the backup...
[2024-07-26 11:56:09] Deleting old backups...
[2024-07-26 11:56:10] Cleaning stuff up...
[2024-07-26 11:56:10] Removing archive from local storage...
[2024-07-26 11:56:10] Removing '.tar' leftovers...
[2024-07-26 11:56:10] Marking backup as finished...
[2024-07-26 11:56:10] Refreshing disk stats...
[2024-07-26 11:56:10] Notifying 'honeymoose' of the end of the backup...
[2024-07-26 11:56:18] Finished!

The next step is to check if it’s an issue with the database backup table records.

Are you also using R2? Is it displaying successfully?

I am using AWS.

This should be easy to configure.