Does using S3 for uploads mean redownloading all files when creating backups?

Hi, I just set up a new Discourse site and I have an uploads bucket on S3.

If Discourse zips all the files when creating a backup, does that mean that every time a backup is created, Discourse re-downloads the entire S3 uploads bucket?

And has anyone had issues with bandwidth limitations in doing so?

I think the backup only covers local files; files in S3 ought to be backed up separately.

That will happen only when you enable the include_s3_uploads_in_backups site setting. It’s disabled by default.

That setting seems to be enabled by default on my site. Regardless, are there any recommendations on how to back up S3 uploads buckets efficiently?

This guide for saving backups to S3 and archiving them to Glacier makes sense when the backup is a single zip file. But as I understand Glacier pricing, there are per-object charges, so costs would rise sharply for a backup made up of many individual, unzipped files.
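
To see why the object count matters, here is a rough back-of-the-envelope sketch in Python. It assumes that Glacier lifecycle transitions are billed per 1,000 requests (one request per object) and that each archived object carries roughly 40 KiB of extra metadata storage. The price used is a made-up placeholder, not a real quote, and the function names are hypothetical; plug in current figures from the AWS pricing page for your region.

```python
def glacier_transition_cost(num_objects: int, price_per_1000: float) -> float:
    """Estimate lifecycle-transition request charges for moving objects to Glacier.

    Assumes one transition request per object, billed per 1,000 requests.
    `price_per_1000` is a placeholder; use the current rate for your region.
    """
    return num_objects / 1000 * price_per_1000


def glacier_metadata_overhead_kib(num_objects: int, per_object_kib: int = 40) -> int:
    """Extra storage billed for archived objects (assumed ~40 KiB of metadata each)."""
    return num_objects * per_object_kib


if __name__ == "__main__":
    HYPOTHETICAL_PRICE = 0.05  # illustrative only, not a real AWS price
    for label, count in [("single zip backup", 1), ("unzipped uploads", 200_000)]:
        cost = glacier_transition_cost(count, HYPOTHETICAL_PRICE)
        overhead = glacier_metadata_overhead_kib(count)
        print(f"{label}: {count} objects, ~${cost:.2f} in transitions, {overhead} KiB overhead")
```

Both terms grow linearly with the number of objects, so one large archive is much cheaper to move to Glacier than hundreds of thousands of small files, even when the total number of bytes is the same.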

I’m an AWS newbie, so any advice is appreciated. Thanks!

Edit: alternatively, if there’s no simple answer, I could consider not using S3 for uploads.

That depends on so many factors. How much money do you want to throw at it? For which scenarios would you like to have backups? Software bugs, Amazon datacenters being hit by an asteroid, an evil admin deleting files from S3,…

I’m afraid we can’t help you with that. You’ll need to look elsewhere for a solution that fits your use case. The search engine of your choice is a good starting point.

I’ve done some more digging around and talking to people.

I believe my best option is to replicate the S3 bucket to another region. I need to do more research into the setup and costs.
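
For reference, S3 cross-region replication is configured with a JSON replication configuration on the source bucket, and versioning must be enabled on both the source and destination buckets. The Python sketch below only builds such a configuration as a dict. The role ARN, bucket names, rule ID and storage class are hypothetical placeholders, and the rule replicates every object (empty prefix) without replicating delete markers.

```python
import json


def replication_config(role_arn: str, dest_bucket: str,
                       storage_class: str = "STANDARD_IA") -> dict:
    """Build a minimal S3 replication configuration (sketch, placeholder values)."""
    return {
        # IAM role that S3 assumes to copy objects (hypothetical ARN supplied by caller)
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-uploads",  # hypothetical rule name
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix: replicate all objects
                # Don't propagate deletions, so the replica can serve as a backup
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{dest_bucket}",
                    "StorageClass": storage_class,
                },
            }
        ],
    }


if __name__ == "__main__":
    cfg = replication_config(
        "arn:aws:iam::123456789012:role/hypothetical-replication-role",
        "my-uploads-replica",  # hypothetical destination bucket
    )
    print(json.dumps(cfg, indent=2))
```

The printed JSON could be saved to a file and applied with `aws s3api put-bucket-replication --bucket SOURCE-BUCKET --replication-configuration file://replication.json`. Note that replication applies to objects written after the rule is in place; objects that already exist need a separate one-off copy (for example with S3 Batch Replication).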

In our case, with a non-Discourse site, we use the AWS CLI to sync buckets (aws s3 sync) between different regions in different accounts. That way, even if an account were compromised and the bucket deleted, or an asteroid fell and destroyed an Amazon datacenter (hopefully not), we could recover from the other bucket. Syncing shouldn’t cost much, because only new or changed files are copied.
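
Repeated syncs stay cheap because only new or changed objects are copied. The Python sketch below shows that selection step in isolation. It loosely follows the AWS CLI's documented default for `aws s3 sync`: copy an object when it is missing at the destination, its size differs, or the source copy is newer. `ObjectInfo` and `keys_to_copy` are hypothetical names; in practice the two listings would come from the S3 API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ObjectInfo:
    """Minimal per-object metadata taken from a bucket listing (hypothetical type)."""
    size: int
    last_modified: datetime


def keys_to_copy(source: dict[str, ObjectInfo], dest: dict[str, ObjectInfo]) -> list[str]:
    """Return the keys an incremental sync would copy from source to dest."""
    to_copy = []
    for key, src in source.items():
        dst = dest.get(key)
        # Copy if missing at the destination, a different size, or newer at the source
        if dst is None or src.size != dst.size or src.last_modified > dst.last_modified:
            to_copy.append(key)
    # Keys that exist only in dest are left alone (no deletions), which suits backups
    return sorted(to_copy)


if __name__ == "__main__":
    src = {
        "a.jpg": ObjectInfo(100, datetime(2024, 1, 2)),
        "b.jpg": ObjectInfo(200, datetime(2024, 1, 1)),
        "c.jpg": ObjectInfo(50, datetime(2024, 1, 3)),
    }
    dst = {
        "a.jpg": ObjectInfo(100, datetime(2024, 1, 2)),
        "b.jpg": ObjectInfo(150, datetime(2024, 1, 1)),
    }
    print(keys_to_copy(src, dst))
```

Here only `b.jpg` (size changed) and `c.jpg` (new) would be transferred. Like `aws s3 sync` without `--delete`, the sketch never removes objects from the destination, so files deleted from the primary bucket remain recoverable from the copy.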

There is still the possibility of both accounts being compromised within a short period and both buckets deleted, or of Amazon shutting down AWS, though both are very unlikely. If something like that does happen, you might as well play the lottery, picking the numbers you think are wrong :slight_smile: