Configure automatic backups for Discourse

Thanks for the hint. That pushed me towards the command-line option, which we can schedule to run whenever we want. :+1:

2 Likes

I got this to work, but it seems that the uploads checkbox was not really needed, and I don't understand its purpose. What is it for? The only thing I want is backups to S3 instead of local backups for my server, which only has weekly automatic backups…

The JSON also had problems… I was able to get it to work using another website as a reference. However, nobody could upload any images because I had the uploads checkbox checked (as described here). Unchecking that box fixed the image upload problem for users and their profile pics.

What is the purpose of the images upload setting? I'm seriously hoping images are included in the S3 backups.
I had to follow the instructions twice because I didn't understand "uploads" and only made one bucket. Then I had to do it again with two buckets, and then I had to remove the checkbox for uploads. It might be good if there were a separate, simpler topic for S3 backups… and only backups. Here is the IAM policy that ended up working for me:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:List*",
                "s3:Get*",
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:PutObjectVersionAcl",
                "s3:PutLifecycleConfiguration",
                "s3:CreateBucket",
                "s3:PutBucketCORS"
            ],
            "Resource": [
                "arn:aws:s3:::classicaltheravadabucket",
                "arn:aws:s3:::classicaltheravadabucket/*",
                "arn:aws:s3:::classicaltheravadabackupbucket",
                "arn:aws:s3:::classicaltheravadabackupbucket/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:*"
            ],
            "Resource": "*"
        }
    ]
}
2 Likes

Though I think that topic should be updated to recommend that the S3 configuration be moved to app.yml rather than the database. That way you can do a command-line restore of the database with only the yml file, without having to configure it with a user and S3 config before doing a restore.
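For example, here is a minimal sketch of what that could look like in the env section of app.yml. The bucket name is the backup bucket from earlier in this thread; the region and keys are placeholders you would substitute with your own:

env:
  ## S3 backups only (uploads stay local); placeholder values
  DISCOURSE_BACKUP_LOCATION: 's3'
  DISCOURSE_S3_BACKUP_BUCKET: 'classicaltheravadabackupbucket'
  DISCOURSE_S3_REGION: 'us-east-1'
  DISCOURSE_S3_ACCESS_KEY_ID: '...'
  DISCOURSE_S3_SECRET_ACCESS_KEY: '...'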

1 Like

I'm not sure what you are talking about. My backups are working (see the screenshot).
I use S3 because DigitalOcean backups are only weekly, and if the server crashes and gets deleted, they are not much use.
On the other hand, I'm hoping that restoring from S3, or from a downloaded copy of the S3 bucket, will be fine.
I'm not uploading the images to S3, and I'm hoping that the S3 backups include the images (although there are very few).

Generally: no.
It does not make much sense to back up images that are in an S3 bucket into another S3 bucket, does it?

2 Likes

Can you be less ambiguous?
The instructions had two S3 buckets. I could not get that to work.
I have only one S3 bucket. Hopefully pictures are included in that backup… is that correct?

I would imagine that local backups work the same way, right?

Please answer my questions in full sentences. The tutorial was very confusing as well.

1 Like

What is ambiguous about "no"? (And what's not ambiguous about "backups being backed up"? :wink:)

Let me try again.

If you have configured uploads to be on S3 then uploads are not included in your backup.

2 Likes

Let's use the term "pictures" instead of "uploads", even though it can include other media.
That way we don't confuse "uploads" with the text content, which is what I am uploading to S3 as backups.

So the 62 MB backup files on S3, as pictured and posted in this thread, do not include pictures?

So how do I make sure the backups have these?
Do the local backups have the pictures too?

When I configured S3 for "uploads (of media)", which was ambiguous as well, nobody could post pictures because they were rejected by S3…

Is there a way to have both local and S3 daily backups?
I couldn't care less if five days of pictures were lost; we are mostly a text-based group.
But I would care if five days of text were lost. DigitalOcean only does 7-day backups, and only if you pay for them.
So even though I can back up daily, if the droplet gets hacked or damaged, then we lose those backups… I'm starting to think there is not much added value in S3.

I wish there were a simple backup option similar to WordPress, which lets me back up to my Google or Dropbox account.

No, that is a bad idea: if you upload a text file as an attachment, it's an upload as well, so that would cause confusion. And the text in a post is stored in the database. So I'm sticking to the term "uploads".

If your uploads are on S3, they are not included in backups. In that case the backups only contain a copy of the database. It does not matter whether your backups are local or on S3.

If your uploads are not on S3, they are included in the backups. In that case the backups contain a copy of the database, and a copy of the uploads. It does not matter whether your backups are local or on S3.

If you are storing something on S3, be it uploads or backups of the database, it will not get lost if your DO droplet gets hacked or damaged. So I don't see your point.

Since your posts are about backups and not about file and image uploads, I'm moving them to another topic.

3 Likes

I'd like to automatically move my S3 backups to Glacier, but I'm confused by the steps linked in the first post, which don't explain much, maybe because some of it is outdated.

[screenshot: S3 lifecycle rule options]

Which options should be checked here? :thinking:

May I ask again, in case someone has done these steps and knows about this?

Also, do you know what makes these fluctuations in S3 fees?

Plus, since the launch of the forum (September 2020), the size of the backups has increased by roughly 15%, but the S3 bills have doubled, from $2.50 to $5. Any idea why they have gone up that much?

That's why I'd like to use Glacier.


Edit: I've followed the steps described here, and I'll see how it goes.
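For reference, the kind of transition rule those steps create can also be written as a lifecycle configuration in JSON and applied with aws s3api put-bucket-lifecycle-configuration. This is only a rough sketch; the rule ID and the 30-day delay are illustrative, not values taken from this thread:

{
    "Rules": [
        {
            "ID": "move-backups-to-glacier",
            "Status": "Enabled",
            "Filter": {
                "Prefix": ""
            },
            "Transitions": [
                {
                    "Days": 30,
                    "StorageClass": "GLACIER"
                }
            ]
        }
    ]
}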

1 Like

Well, it doesn't go. :sweat_smile:

My lifecycle configuration: [screenshot]

My S3 bucket: [screenshot]

No backup is on Glacier.

So… two questions for those who have been able to achieve this automated S3-to-Glacier transition:

  1. What could be wrong in my configuration?

  2. The minimum storage duration charge in Glacier is 90 days. Does that mean that if I make one backup a day, I'll eventually be charged for 90 backups in Glacier each month?
    If that is the case, then this Glacier solution won't be a good idea unless I greatly reduce my backup frequency.

1 Like

Where on the VPS are the backups stored?

1 Like

I added this to the OP.

2 Likes

Can we choose the folder for the backups, or is there a workaround without coding?

I'm using a data storage volume from my hosting provider, which I can mount and use like local storage, but the backups are not supposed to be saved in the default path.

1 Like

If you want them to be saved in a different place, then you'd need to change that in your app.yml.
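For example, on a standard install the backups live under /shared/backups inside the container (shared/standalone/backups on the host), so one rough sketch is to bind-mount the provider storage over that path in the volumes section of app.yml. Here /mnt/provider-storage is a hypothetical mount point; the first two entries are the stock ones:

volumes:
  - volume:
      host: /var/discourse/shared/standalone
      guest: /shared
  - volume:
      host: /var/discourse/shared/standalone/log/var-log
      guest: /var/log
  ## hypothetical extra mapping so backups are written to the mounted storage
  - volume:
      host: /mnt/provider-storage/discourse-backups
      guest: /shared/backups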

2 Likes

Automatic Backups on Backblaze B2

here's how i have it set up for a hypothetical site hosted on example.com

  1. make an account on backblaze (atm, no need to enter payment for <10GB which is free)
  2. create a bucket (backblaze > B2 Cloud Storage)
    • name: $sitename-discourse-$random padded to 30char
      • in this example: example-discourse-g87he56ht8vg
      • discourse needs bucket name to be lowercase letters, numbers, and dashes only
      • i suggest keeping it 30 chars or less since that shows up nicely in backblaze's webui without wrapping
    • private bucket
    • enable encryption (SSE-B2)
    • enable object lock
  3. create an application key (backblaze > account > app keys)
    • keyName: example-discourse
    • bucketName (Allow access to Bucket(s)): example-discourse-g87he56ht8vg
    • capabilities: read and write
    • leave namePrefix and validDurationSeconds blank
  4. configure discourse B2 settings (discourse > admin > settings)
    • backup_location: s3
    • s3_backup_bucket: example-discourse-g87he56ht8vg
    • s3_endpoint: this is shown on the bucket page; make sure to prepend it with https://
    • s3_access_key_id: (from previous step)
    • s3_secret_access_key: (from previous step)
      • backblaze only shows you the key once (at creation)!
    • btw, you can also set these as env vars in your container yml instead. this would let you restore with only that file and nothing else:
env:
  ## Backblaze B2 Backups
  # DISCOURSE_BACKUP_LOCATION: 's3' # uncomment to recover from cli
  DISCOURSE_S3_ENDPOINT: 'https://....backblazeb2.com'
  DISCOURSE_S3_BACKUP_BUCKET: 'example-discourse-g87he56ht8vg'
  DISCOURSE_S3_ACCESS_KEY_ID: '...'
  DISCOURSE_S3_SECRET_ACCESS_KEY: '...'
  # DISCOURSE_DISABLE_EMAILS: 'non-staff' # uncomment to disable email during a test restore
  ## you can restore with no data beyond this container yml.
  ## uncomment DISCOURSE_BACKUP_LOCATION above, build container (./launcher rebuild ...),
  ## and then run this inside container (it will restore from B2 bucket):
  ##   discourse enable_restore
  ##   discourse restore <example-com-...tar.gz> # choose restore filename by browsing B2 webui
  ## remember to disable restore afterwards
  5. configure backup retention
    • discourse:
      • backup_frequency: 1 (daily backups in this example, but you could do weekly)
      • maximum_backups: disregard this setting; let backblaze handle it :sunglasses:
      • s3_disable_cleanup: true (Prevent removal of old backups from S3 when there are more backups than the maximum allowed)
    • backblaze (go to your bucketā€™s settings):
      • Object Lock (Default Retention Policy): 7 days
      • Lifecycle Settings (custom):
        • fileNamePrefix: default/example-com (optional)
        • daysFromUploadingToHiding: 8 days
          • this should be object lock + 1
        • daysFromHidingToDeleting: 1 day

to summarize retention in this example:

  • discourse creates backups every 1 day
  • each backup file is immutable for 7 days after upload to B2 (object lock). this protects you against accidents, ransomware, etc.
  • 8 days after upload, the object lock on the backup expires. since it's mutable again, a lifecycle rule can hide the backup file
  • the next part of the lifecycle rule deletes any file 1 day after itā€™s hidden

so you get daily backups. retention time is one week, during which backups can't be deleted no matter what. then backups are deleted 2 days later. so really a backup lives for 9 days or so.

hope that helps someone :slight_smile:


on second thought, maybe it's better to let discourse handle retention (maximum_backups). that way, your backups won't automatically start expiring if discourse is down. you wouldn't want a clock ticking on them while trying to recover. if you went that way, you could set maximum_backups=8 and s3_disable_cleanup=false in this example and not use a lifecycle policy in B2. you would still use the object lock policy (7 days), though.

edit: actually, i think you do still need a B2 lifecycle policy, because i think files only get 'hidden' and not deleted when an S3 client deletes them. i'm using the "Keep only the last version of the file" policy, which is equivalent to daysFromHidingToDeleting=1, daysFromUploadingToHiding=null.

i guess think it over and decide which approach is right for you.

btw, i realize there's some back and forth in this post. i think it's informative as-is, but if someone wants, i could make another slightly simpler post with my actual recommendations.

6 Likes

If you put those in environment variables, as described in Configure an S3 compatible object storage provider for uploads, then you can restore your site to a new server from the command line with only your yml file.

The rest seems like a good plan.

3 Likes

discourse restore <backup.tar.gz>

this will look in your bucket if you have the env vars set? pretty cool if so.

and in that case, you could probably also set them manually with export in bash in the unlikely event that you have to recover. that is, if you don't want to keep secrets in your container yml for some reason.
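if you did go that route, it might look roughly like this (an untested sketch that reuses the hypothetical B2 values from the yml block above), run from a shell inside the container:

# inside the container, e.g. after: cd /var/discourse && ./launcher enter app
export DISCOURSE_BACKUP_LOCATION='s3'
export DISCOURSE_S3_ENDPOINT='https://....backblazeb2.com'
export DISCOURSE_S3_BACKUP_BUCKET='example-discourse-g87he56ht8vg'
export DISCOURSE_S3_ACCESS_KEY_ID='...'
export DISCOURSE_S3_SECRET_ACCESS_KEY='...'
discourse enable_restore
discourse restore example-com-....tar.gz   # filename as listed in the B2 web UI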

1 Like

Just to confirm: once I have moved to S3 backups and tested that they work, can I safely delete the contents of that folder to reclaim the used space?