Using Object Storage for Uploads (S3 & Clones)

I just went over some common Object Storage providers so I can attest if they work or not with Discourse.

Provider Service Name Works with Discourse?
Amazon AWS S3 Yes
Digital Ocean Spaces Yes
Linode Object Storage Yes
Google Cloud Storage Yes
Scaleway Object Storage Yes
Vultr Object Storage Yes
BackBlaze Cloud Storage Yes*
Self-hosted MinIO Yes

If you got a different service working, please add it to this wiki.

Configuration

In order to store Discourse static assets in your Object Storage add this configuration on your app.yml under the hooks section:

  after_assets_precompile:
    - exec:
        cd: $home
        cmd:
          - sudo -E -u discourse bundle exec rake s3:upload_assets

When using object storage, you also need a CDN to serve what gets stored in the bucket. I used StackPath CDN in my testing, and other than needing to set Dynamic Caching By Header: Accept-Encoding in their configuration it works ok.

DISCOURSE_CDN_URL is a CDN that points to you Discourse hostname and caches requests. It will be used mainly for pullable assets: CSS and other theme assets.

DISCOURSE_S3_CDN_URL is a CDN that points to your object storage bucket and caches requests. It will be mainly used for pushable assets: JS, images and user uploads.

We recommend those being different and for admins to set both.

In the following examples https://falcoland-files-cdn.falco.dev is a CDN configured to serve the files under the bucket. The bucket name was set to falcoland-files in my examples.

Choose your provider from the list below and add these settings to the env section of your app.yml file, adjusting the values accordingly:

AWS S3

What we officially support and use internally. Their CDN offering Cloudfront also works to front the bucket files. See Setting up file and image uploads to S3 for how to configure the permissions properly.

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: us-west-1
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backups
  DISCOURSE_BACKUP_LOCATION: s3

Digital Ocean Spaces

DO offering is good and works out of the box. Only problem is that their CDN offering is awfully broken, so you need to use a different CDN for the files.

Example configuration:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: whatever
  DISCOURSE_S3_ENDPOINT: https://nyc3.digitaloceanspaces.com
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backups
  DISCOURSE_BACKUP_LOCATION: s3

Linode Object Storage

An extra configuration parameter, HTTP_CONTINUE_TIMEOUT, is required for Linode.

Example configuration:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: us-east-1
  DISCOURSE_S3_HTTP_CONTINUE_TIMEOUT: 0
  DISCOURSE_S3_ENDPOINT: https://us-east-1.linodeobjects.com
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backup
  DISCOURSE_BACKUP_LOCATION: s3

GCP Storage

Listing files is broken, so you need an extra ENV to skip that so assets can work. Also skip CORS and configure it manually.

:warning: Since you can’t list files you won’t be able to list backups, and automatic backups will fail, we don’t recommend using it for backups. :warning:

Example configuration:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: us-east1
  DISCOURSE_S3_INSTALL_CORS_RULE: false
  FORCE_S3_UPLOADS: 1
  DISCOURSE_S3_ENDPOINT: https://storage.googleapis.com
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  #DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backup
  #DISCOURSE_BACKUP_LOCATION: s3

Scaleway Object Storage

Scaleway offering is also very good, and everything works fine.

Example configuration:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: fr-par
  DISCOURSE_S3_ENDPOINT: https://s3.fr-par.scw.cloud
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backups
  DISCOURSE_BACKUP_LOCATION: s3

Vultr Object Storage

An extra configuration parameter, HTTP_CONTINUE_TIMEOUT, is required for Vultr.

Example configuration:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: whatever
  DISCOURSE_S3_HTTP_CONTINUE_TIMEOUT: 0
  DISCOURSE_S3_ENDPOINT: https://ewr1.vultrobjects.com
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backup
  DISCOURSE_BACKUP_LOCATION: s3

Backblaze B2 Cloud Storage

You need to skip CORS and configure it manually.

There are reports of clean up orphan uploads not working correctly with BackBlaze.

Example configuration:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: "us-west-002"
  DISCOURSE_S3_INSTALL_CORS_RULE: false
  DISCOURSE_S3_CONFIGURE_TOMBSTONE_POLICY: false
  DISCOURSE_S3_ENDPOINT: https://s3.us-west-002.backblazeb2.com
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backup
  DISCOURSE_BACKUP_LOCATION: s3

MinIO Storage Server

There are a few caveats and requirements you need to ensure are met before you can use MinIO storage server as an alternative to S3:

  1. You have a fully configured MinIO server instance
  2. You have Domain Support enabled in the MinIO configuration
  3. You have DNS configuration properly set up for MinIO so that bucket subdomains properly resolve to the MinIO server and the MinIO server is configured with a base domain (in this case, minio.example.com)
  4. The bucket discourse-data exists on the MinIO server and has a “public” policy set on it
  5. Your S3 CDN URL points to a properly configured CDN pointing to the bucket and cache requests, as stated earlier in this document.
  6. Your CDNs are configured to actually use a “Host” header of the core S3 URL - for example, discourse-data.minio.example.com when it fetches data - otherwise it can cause CORB problems.

Assuming the caveats and prerequisites above are met, an example configuration would be something like this:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: anything
  DISCOURSE_S3_ENDPOINT: https://minio.example.com
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://discourse-data-cdn.example.com
  DISCOURSE_S3_BUCKET: discourse-data
  DISCOURSE_S3_BACKUP_BUCKET: discourse-backups
  DISCOURSE_BACKUP_LOCATION: s3
  DISCOURSE_S3_INSTALL_CORS_RULE: false

CORS is still going to be enabled on MinIO even if the rule is not installed by the app rebuilder - by default, it seems, CORS is enabled on all HTTP verbs in MinIO, and MinIO does not support BucketCORS (S3 API) as a result

45 Likes
Setting up file and image uploads to S3
Defining DISCOURSE_S3_CDN_URL links to assets in S3 CDN URL
Using Scaleway s3-compatible object storage
Setting up backup and image uploads to DigitalOcean Spaces
Migrate from AWS to Digital Ocean with 2 containers, spaces and 2 CDNs
How to Setup BackBlaze S3 with BunnyCDN
Extend S3 configuration for other s3 API compatible cloud storage solutions
Setting up backup and image uploads to Backblaze B2
What are the right settings to use S3 bucket (with non-Amazon URL)?
Can not access backup page and related error when restoring using GCP Object Storage
High Availability 3 Server setup
Upload assets to S3 after in-browser upgrade
How might we better structure #howto?
Migrating uploaded files from DO to S3
Discourse as a closed wiki
Using multiple containers - what needs to be shared?
Virus scanning of uploaded files
Imgur images broken
Admin role conflates server admin and board admin
Restore a backup from command line
Use WebTorrent to load media objects
Hosting Optimization with Digital Ocean
Hosting Optimization with Digital Ocean
Theme modifiers: A brief introduction
Configure automatic backups for Discourse
Move from BackBlaze B2 to Digital Ocean Spaces
Which free storage for many images? also to be used for thumbnails etc
Disk usage spike during backup, Discourse crashed hard :-(
Migrate from AWS to Digital Ocean with 2 containers, spaces and 2 CDNs
Restore Failure - S3 (compatible) backup
Restore Failure - S3 (compatible) backup
Digitalocean block storage VS amazon S3
Digitalocean block storage VS amazon S3
Setting up backup and image uploads to DigitalOcean Spaces
Custom emoji don't use CDN for S3 stored assets in a few pages
Admin upgrade page doesn't load with a CDN
Install Discourse for Production Environment on Windows Server
Running Discourse on Azure Web Sites vs. Azure VM?
How to turn off S3 storage?
Access Denied error message when trying to upload images
What are the right settings to use S3 bucket (with non-Amazon URL)?
REQ: Support S3 backup to a service like Backblaze
REQ: Support S3 backup to a service like Backblaze
Using Scaleway s3-compatible object storage
Topic List Previews
Overwrite meta og:image image source to use externally public loaded images on topics?
How to store uploads with multiple web_only servers?
Setting up backup and image uploads to DigitalOcean Spaces
Comparing Discourse for Teams with Discourse
Finding UI generated backup and restoring site
Looking for doc to connect discourse with digital ocean spaces
Looking for doc to connect discourse with digital ocean spaces
Looking for doc to connect discourse with digital ocean spaces
403 Error with digital ocean cdn
Link to headers (anchor links)
CSP Errors with S3 configuration when s3_cdn_url has trailing slash
NoMethodError downcase s3_bucket_name absolute_base_url
Setting up backup and image uploads to Backblaze B2
Backing up files in Object Storage
Minio: A header you provided implies S3 functionality that is not implemented
Moving uploads and backups to DigitalOcean Block Storage
Configure automatic backups for Discourse
S3 OVH Object Storage
What should I enter in the S3 CDN settings if I don't have a CDN?
SMF2 Conversion and Rake to S3 Help
What causes rake uploads:fix_relative_upload_links
Running 2 hosts behind haproxy fails with random 404s
Site Blank After Rebuild
Backblaze S3 issue: duplicated uploads after delete
Migrate_to_S3 Fails on Rebake
Downloads coming from S3 even with DISCOURSE_S3_CDN_URL set
Moving from one S3 bucket to another
S3 image bandwidth costs are getting annoying
Basic How-To for Using MinIO storage server run by you for your Discourse Instance
Digital Ocean Spaces don’t implement the AWS S3 API for the CORS rule
Extend S3 configuration for other S3 API compatible services
Decoupled Discourse Application - Managed Redis, Managed Postgres, and DIgital Ocean Volume with Discourse
Digital ocean CDN setup
Migrate_to_S3 Fails on Rebake
Cannot rebake after setting up CDN

The answer is yes.

To test this theory, before rebaking hundreds of thousands of posts, I did the following sanity checks:

  • Uploaded an image
  • Changed the S3 CDN URL setting
  • Rebuilt the HTML on my test post (via the UI)
  • Refreshed the page in the browser
  • Checked the Network tab of the browser console to confirm the image was being pulled via cloudfront
  • Uploaded a new test image to a new post
  • Checked the Network tab of the browser console to confirm the image was being pulled via cloudfront

I’m now rebaking all posts as we speak :+1:t2:

9 Likes

Thanks for the report Richie. I also have had AWS S3 image storage running for several years and came to this post via the console message. But the description at the top doesn’t say anything about the case that you already had S3 and just need a CDN.

For the record here is what I did:

  1. Went to AWS console, under Network and Content Delivery picked Cloudfront
  2. Clicked the Create distribution button
  3. Filled out the fairly obvious form, the only thing you really need to do on it is to pick your AWS S3 bucket where the images are from the drop-down menu.
  4. Waited a bit for the Cloudfront configuration to finish…
  5. A <gibberish>.cloudfront.net domain showed up in the “Domain Name” column of the Cloudfront Distributions list.
  6. I copied and pasted that domain into the s3 cdn url field in my site admin File settings.
  7. I did some tests:
    a. I made a new post with an image upload and indeed it was on cloudfront.
    b. I hit Rebuild HTML on some random existing image posts and saw they also rebuilt with cloudfront.net images.
  8. Since all looked good I went in and ran a rebake, which took several hours as I have around half a million posts now:
./launcher enter app
# rake posts:rebake
  1. All seems to be working fine. It put a ton of jobs in the sidekiq queue, one per post it looks like, which are going to take a few days to clear but it is chunking though them now.
8 Likes

Are you sure that is the case? This site here uses assets from CDN and we didn’t have to purge the cache. It’s also a EmberCli change that shouldn’t affect production :thinking:

5 Likes

I updated yesterday with rebuild and the forum was blank until I purge cache. I use BunnyCDN with Digital ocean Spaces. Before I Purge caches I checked the inspect elements there was some error scripts something (start.js, etc) sorry I didn’t screenshot it :confused:

1 Like

I just updated with a rebuild and I now have a blank page as well. Forgive me for asking, I am just beginning to learn this. How do I purge the cache? Is this via discourse or via the CDN? I use AWS Cloudfront.

1 Like

Can you check the inspect element console? Is there any error? Not sure this is the same problem.

2 Likes

So I think I figured out what I was doing wrong. I had set both CDN and S3 CDN. I did not realize that setting S3 CDN when using AWS will cause both uploads and assets to hit the same distribution. So, having both set is problematic. This was not clear to me when following the guide as it notes that both are separate and both should be set. I noticed the assets folder created in my uploads bucket however, and this is what helped me figure it out.

Can anyone confirm this is correct?

3 Likes

Can we improve the guide or the settings to alleviate the above confusion :point_up_2: @falco / @gerhard ?

4 Likes

I’ve just been setting up S3 for file uploads on our install and wanted to clarify how this works with files posted in private categories. From what I can see, the URLs used for uploaded assets in private categories are still world-readable, and that any security is through obscurity (although admittedly filenames appear to be given uuid based names). Is that right?

Is there a way to set discourse up to use AWS url-signing? I’m previously familiar with Django-storages which does this using pre-signed urls. These are documented here: Authenticating Requests: Using Query Parameters (AWS Signature Version 4) - Amazon Simple Storage Service

I have previously used this and set the X-Amz-Expires parameter to quite a low value — perhaps a 60 seconds, which means that it’s not possible to repost the url to private assets.

I’m aware this doesn’t prevent a logged-in user resharing a downloaded asset, but it does mitigate against certain modes of data leakage.

1 Like

What you want is Secure Media Uploads

4 Likes

Thank you. Sorry I missed this.

2 Likes

Hello Rafael,

I contacted with Bunny.net and this probably caused by Bunny Optimizer. This not respect the origin cache when the optimizer turned on. They will fix it! :slightly_smiling_face:

3 Likes

Oh those damn optimizers. I would tell you to disable that if possible as Discourse already ships optimal configurations for each asset. Those optimizers are great when you are hosting a black box web software from the 2000’s, but they fail hard on modern stuff. Also even the big name optimizer from Cloudflare often breaks Discourse, so I have no hopes for others. They may work one day, and break the next and all your visitors are left with a blank page. All that for zero benefits.

4 Likes

Thank you! :slightly_smiling_face: Yes, I disabled it for js and css. I only use it now for the uploaded images optimization on CDN. Which is my main reason to use this optimizer.

2 Likes

A follow-up from Using Scaleway s3-compatible object storage - #28 by Falco

In the admin UI, it is not possible to configure the region to anything else besides the AWS regions (i.e. a custom region - as Scaleway would require according to this guide, for instance) as of the latest version (2.7.0.beta5). Not sure if this is fixed in master, but the configuration here can’t be applied via the admin UI - only via env vars, rails c, direct db update or code change.
The error handling at the admin UI is also kind of broken - it just fails with 500, could provide a better error message.

2 Likes

I think that’s true. You should follow the instructions in the OP.

1 Like