Configure automatic backups for Discourse

Automatic Backups on Backblaze B2

here’s how i have it set up for a hypothetical site hosted on example.com

  1. make an account on backblaze (at the moment there’s no need to enter payment info for under 10GB, which is free)
  2. create a bucket (backblaze > B2 Cloud Storage)
    • name: $sitename-discourse-$random, padded out to 30 characters
      • in this example: example-discourse-g87he56ht8vg
      • discourse needs the bucket name to be lowercase letters, numbers, and dashes only
      • i suggest keeping it to 30 characters or less since that shows up nicely in backblaze’s webui without wrapping
    • private bucket
    • enable encryption (SSE-B2)
    • enable object lock
  3. create an application key (backblaze > account > app keys)
    • keyName: example-discourse
    • bucketName (Allow access to Bucket(s)): example-discourse-g87he56ht8vg
    • capabilities: read and write
    • leave namePrefix and validDurationSeconds blank
  4. configure discourse B2 settings (discourse > admin > settings)
    • backup_location: s3
    • s3_backup_bucket: example-discourse-g87he56ht8vg
    • s3_endpoint: this is shown on the bucket page – make sure to prepend with https://
    • s3_access_key_id: (from previous step)
    • s3_secret_access_key: (from previous step)
      • backblaze only shows you the key once (at creation)!
    • btw, you can also set these as env vars in your container yml instead. this would let you restore with only that file and nothing else:
env:
  ## Backblaze B2 Backups
  # DISCOURSE_BACKUP_LOCATION: 's3' # uncomment to recover from cli
  DISCOURSE_S3_ENDPOINT: 'https://....backblazeb2.com'
  DISCOURSE_S3_BACKUP_BUCKET: 'example-discourse-g87he56ht8vg'
  DISCOURSE_S3_ACCESS_KEY_ID: '...'
  DISCOURSE_S3_SECRET_ACCESS_KEY: '...'
  # DISCOURSE_DISABLE_EMAILS: 'non-staff' # uncomment to disable email during a test restore
  ## you can restore with no data beyond this container yml.
  ## uncomment DISCOURSE_BACKUP_LOCATION above, build container (./launcher rebuild ...),
  ## and then run this inside container (it will restore from B2 bucket):
  ##   discourse enable_restore
  ##   discourse restore <example-com-...tar.gz> # choose restore filename by browsing B2 webui
  ## remember to disable restore afterwards
  5. configure backup retention
    • discourse:
      • backup_frequency: 1 (daily backups in this example, but you could do weekly)
      • maximum_backups: disregard this setting – let backblaze handle it :sunglasses:
      • s3_disable_cleanup: true (Prevent removal of old backups from S3 when there are more backups than the maximum allowed)
    • backblaze (go to your bucket’s settings):
      • Object Lock (Default Retention Policy): 7 days
      • Lifecycle Settings (custom):
        • fileNamePrefix: default/example-com (optional)
        • daysFromUploadingToHiding: 8 days
          • this should be the object lock retention + 1 day
        • daysFromHidingToDeleting: 1 day
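
for reference, here’s roughly what that custom lifecycle rule boils down to, written out with the same field names B2 uses for lifecycle rules (values match this example; the prefix assumes your backups land under default/example-com as above):

# custom B2 lifecycle rule for this example (same fields, shown in yaml notation)
fileNamePrefix: "default/example-com"   # optional – only match this site’s backups
daysFromUploadingToHiding: 8            # object lock retention (7 days) + 1
daysFromHidingToDeleting: 1             # delete a hidden backup the next day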

to summarize retention in this example:

  • discourse creates backups every 1 day
  • each backup file is immutable for 7 days after upload to B2 (object lock). this protects you against accidents, ransomware, etc.
  • the object lock expires 7 days after upload. once the file is mutable again, the lifecycle rule can hide it on day 8
  • the next part of the lifecycle rule deletes any file 1 day after it’s hidden

so you get daily backups. each one is locked for a week, during which it can’t be deleted no matter what, and it’s removed about 2 days after that. so really a backup lives for 9 days or so.

hope that helps someone :slight_smile:


on second thought, maybe it’s better to let discourse handle retention (maximum_backups). that way, your backups won’t automatically start expiring if discourse is down. you wouldn’t want a clock ticking on them while trying to recover. if you went that way, you could set maximum_backups=8 and s3_disable_cleanup=false in this example and not use a lifecycle policy in B2. you would still use the object lock policy (7 days), though.
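
if you go that route and like the env var approach from step 4, the discourse side would look something like this in the container yml (assuming these two site settings can be shadowed by DISCOURSE_-prefixed env vars the same way the S3 ones above are – if not, just set them in admin > settings):

env:
  ## alternative retention: let discourse prune old backups instead of a B2 lifecycle rule
  ## (env var names are assumed from the usual DISCOURSE_<setting_name> shadowing convention)
  DISCOURSE_MAXIMUM_BACKUPS: '8'
  DISCOURSE_S3_DISABLE_CLEANUP: 'false'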

edit: actually, i think you do still need a B2 lifecycle policy because i think files only get ‘hidden’ and not deleted when an S3 client deletes them. i’m using the “Keep only the last version of the file” policy, which is equivalent to daysFromHidingToDeleting=1, daysFromUploadingToHiding=null.
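
in the same notation as before, that built-in policy is effectively:

# B2’s “Keep only the last version of the file” preset, same fields as the custom rule above
fileNamePrefix: ""                # applies to the whole bucket
daysFromUploadingToHiding: null   # never auto-hide; hiding happens when discourse deletes a backup
daysFromHidingToDeleting: 1       # purge hidden versions after a day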

i guess you’ll have to think it over and decide which approach is right for you.

btw, i realize there’s some back and forth in this post. i think it’s informative as-is, but if someone wants, i could make another slightly simpler post with my actual recommendations.
