Configure an S3 compatible object storage provider for uploads

:information_source: This topic covers how to configure some common S3 compatible Object Storage providers (S3 clones). See Set up file and image uploads to S3 for more details about Amazon AWS S3 configuration, which is officially supported and used internally by Discourse for our hosting services.

Provider Service Name Works with Discourse?
Amazon AWS S3 Yes
Digital Ocean Spaces Yes
Linode Object Storage Yes
Google Cloud Storage Yes
Scaleway Object Storage Yes
Vultr Object Storage Yes
BackBlaze Cloud Storage Yes*
Self-hosted MinIO Yes
Azure Blob Storage Flexify.IO Yes
Oracle Cloud Object Storage No [1]
Wasabi Object Storage Maybe
Cloudflare R2 No
Contabo Object Storage No

If you got a different service working, please add it to this wiki.

Configuration

In order to store Discourse static assets in your Object Storage add this configuration on your app.yml under the hooks section:

  after_assets_precompile:
    - exec:
        cd: $home
        cmd:
          - sudo -E -u discourse bundle exec rake s3:upload_assets
          - sudo -E -u discourse bundle exec rake s3:expire_missing_assets

When using object storage, you also need a CDN to serve what gets stored in the bucket. I used StackPath CDN in my testing, and other than needing to set Dynamic Caching By Header: Accept-Encoding in their configuration it works ok.

DISCOURSE_CDN_URL is a CDN that points to you Discourse hostname and caches requests. It will be used mainly for pullable assets: CSS and other theme assets.

DISCOURSE_S3_CDN_URL is a CDN that points to your object storage bucket and caches requests. It will be mainly used for pushable assets: JS, images and user uploads.

We recommend those being different and for admins to set both.

Not using a CDN (or entering the bucket URL as the CDN URL) is likely to cause problems and is not supported.

In the following examples https://falcoland-files-cdn.falco.dev is a CDN configured to serve the files under the bucket. The bucket name was set to falcoland-files in my examples.

Configuring these settings in environment variables in your app.yml is recommended because it’s how CDCK does it in their infrastructure, so it’s well-tested. Also, the task to upload assets happen after assets are compiled, which happens in a rebuild. If you want to spin a Discourse that works properly with Object Storage since the beginning you need to set the env vars so the assets are uploaded before the site starts.

Choose your provider from the list below and add these settings to the env section of your app.yml file, adjusting the values accordingly:

AWS S3

What we officially support and use internally. Their CDN offering Cloudfront also works to front the bucket files. See Set up file and image uploads to S3 for how to configure the permissions properly.

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: us-west-1
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backups
  DISCOURSE_BACKUP_LOCATION: s3

Digital Ocean Spaces

DO offering is good and works out of the box. It is fine to enable Restrict File Listing. Only problem is that their CDN offering is awfully broken, so you need to use a different CDN for the files. Also, you need not to install the CORS rule, as it re-installs it at every rebuild.

Example configuration:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: whatever
  DISCOURSE_S3_ENDPOINT: https://nyc3.digitaloceanspaces.com
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backups
  DISCOURSE_BACKUP_LOCATION: s3
  DISCOURSE_S3_INSTALL_CORS_RULE: false 

Linode Object Storage

An extra configuration parameter, HTTP_CONTINUE_TIMEOUT, is required for Linode.

Example configuration:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: us-east-1
  DISCOURSE_S3_HTTP_CONTINUE_TIMEOUT: 0
  DISCOURSE_S3_ENDPOINT: https://us-east-1.linodeobjects.com
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backup
  DISCOURSE_BACKUP_LOCATION: s3

Google Cloud Platform Storage

Listing files is broken, so you need an extra ENV to skip that so assets can work. Also skip CORS and configure it manually.

:warning: Since you can’t list files you won’t be able to list backups, and automatic backups will fail, we don’t recommend using it for backups. However, some suggest that if you change the role from Storage Legacy Object Owner to Storage Legacy Bucket Owner backups do work correctly. See this topic for Google Cloud specific discussion.

There is a third-party plugin to make the integration better at Discourse GCS Helper.

Example configuration:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: us-east1
  DISCOURSE_S3_INSTALL_CORS_RULE: false
  FORCE_S3_UPLOADS: 1
  DISCOURSE_S3_ENDPOINT: https://storage.googleapis.com
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  #DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backup
  #DISCOURSE_BACKUP_LOCATION: s3

Scaleway Object Storage

Scaleway offering is also very good, and everything works fine for the most part.

:warning: Scaleway multipart uploads only support a maximum of 1,000 parts. This does not match Amazon S3, which supports a maximum of 10,000 parts. For larger instances, this will cause Discourse backups to fail and the incomplete upload may need to be manually deleted before further attempts are made. For small instances this is no issue however. Scaleway seem quite open to feedback, so if you want this limit changed you should contact them.

Note that for the DISCOURSE_S3_ENDPOINT parameter, Discourse uses the endpoint of the whole region: https://s3.{region}.scw.cloud. The “Bucket endpoint” found in your Scaleway dashboard comes in the form https://{bucketName}.s3.{region}.scw.cloud. Omit the bucket name subdomain to prevent connection errors.

Example configuration:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: fr-par
  DISCOURSE_S3_ENDPOINT: https://s3.fr-par.scw.cloud
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backups
  DISCOURSE_BACKUP_LOCATION: s3

Vultr Object Storage

An extra configuration parameter, HTTP_CONTINUE_TIMEOUT, is required for Vultr.

Example configuration:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: whatever
  DISCOURSE_S3_HTTP_CONTINUE_TIMEOUT: 0
  DISCOURSE_S3_ENDPOINT: https://ewr1.vultrobjects.com
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backup
  DISCOURSE_BACKUP_LOCATION: s3

Backblaze B2 Cloud Storage

You need to skip CORS and configure it manually.

There are reports of clean up orphan uploads not working correctly with BackBlaze. You must change lifecycle rules for your bucket for orphan cleanup to work.

Example configuration:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: "us-west-002"
  DISCOURSE_S3_INSTALL_CORS_RULE: false
  DISCOURSE_S3_CONFIGURE_TOMBSTONE_POLICY: false
  DISCOURSE_S3_ENDPOINT: https://s3.us-west-002.backblazeb2.com
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://falcoland-files-cdn.falco.dev
  DISCOURSE_S3_BUCKET: falcoland-files
  DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backup
  DISCOURSE_BACKUP_LOCATION: s3

Note: During initial migration to B2, you may hit the 2500 free daily class C transactions limit. You will need to add a payment method to remove caps.

MinIO Storage Server

There are a few caveats and requirements you need to ensure are met before you can use MinIO storage server as an alternative to S3:

  1. You have a fully configured MinIO server instance
  2. You have Domain Support enabled in the MinIO configuration, for Domain driven bucket URLs. This is a mandatory setup requirement for MinIO and Discourse, as MinIO still supports the legacy S3 “path” styles which are no longer supported in Discourse.
  3. You have DNS configuration properly set up for MinIO so that bucket subdomains properly resolve to the MinIO server and the MinIO server is configured with a base domain (in this case, minio.example.com)
  4. The bucket discourse-data exists on the MinIO server and has a “public” policy set on it
  5. Your S3 CDN URL points to a properly configured CDN pointing to the bucket and cache requests, as stated earlier in this document.
  6. Your CDNs are configured to actually use a “Host” header of the core S3 URL - for example, discourse-data.minio.example.com when it fetches data - otherwise it can cause CORB problems.

Assuming the caveats and prerequisites above are met, an example configuration would be something like this:

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: anything
  DISCOURSE_S3_ENDPOINT: https://minio.example.com
  DISCOURSE_S3_ACCESS_KEY_ID: myaccesskey
  DISCOURSE_S3_SECRET_ACCESS_KEY: mysecretkey
  DISCOURSE_S3_CDN_URL: https://discourse-data-cdn.example.com
  DISCOURSE_S3_BUCKET: discourse-data
  DISCOURSE_S3_BACKUP_BUCKET: discourse-backups
  DISCOURSE_BACKUP_LOCATION: s3
  DISCOURSE_S3_INSTALL_CORS_RULE: false

CORS is still going to be enabled on MinIO even if the rule is not installed by the app rebuilder - by default, it seems, CORS is enabled on all HTTP verbs in MinIO, and MinIO does not support BucketCORS (S3 API) as a result.

Azure Blob Storage with Flexify.IO

Azure Blob Storage is not an S3-compatible service, so it cannot be used with Discourse. There is a plugin, but it is broken.

The easiest way to expose an S3-compatible interface for Azure Blob Storage is to add a Flexify.IO server which translates the Azure Storage protocol into S3.

As of this writing, the service is free on Azure, and you only need a very basic (cheap) VM tier to start running it. It does, however, require a bit of setup.

  1. In Azure portal, create a new resource of Flexify.IO - Amazon S3 API for Azure Blob Storage.
  2. For light usage, the minimum VM config seems to work just fine. You can accept most of the default config. Remember to save the PEM key file when you create the VM.
  3. Browse to the Flexify.IO VM link, and enter the system. Follow the instructions by setting up the Azure Blob Storage data provider and the generated S3 end-point. Make sure that the endpoint config setting Public read access to all objects in virtual buckets is true. Copy the S3 end-point URL and keys.
  4. Press New Virtual Bucket and create a virtual bucket. It can be the same name as your Azure Blob Storage container, or it can be a different name. Link any container(s) to merge into this virtual bucket. This virtual bucket is used to expose a publicly-readable bucket via S3.
  5. By default, Flexify.IO installs a self-signed SSL certificate, while an S3 endpoint requires HTTPS. SSH into the VM using the key file (the username is by default azureuser), and replace the following files with the correct files:
  • /etc/flexify/ssl/cert.pem - replace with certificate file (PEM encoding)

  • /etc/flexify/ssl/key.pem - replace with private key file (PKCS#8 PEM encoding, that’s the one starting with BEGIN PRIVATE KEY and not BEGIN RSA PRIVATE KEY which is PKCS#1)

    These files are root so you’d have to sudo to replace them. It is best to make sure that the replacement files have the same ownership and permissions as the original ones, which means root:root and 600 permission.

  1. By default, Flexify.IO creates a root-level S3 service with multiple buckets. Discourse requires sub-domain support for buckets. Go to: <your Flexify.IO VM IP>/flexify-io/manage/admin/engines/configs/1 which will open up a hidden config page!
  2. Specify the S3 base domain (say it is s3.mydomain.com) in the Endpoint hostname field, which should be blank by default. Press Save to save the setting.
  3. Restart the Flexify.IO VM in Azure portal.
  4. In your DNS, map s3.mydomain.com and *.s3.mydomain.com to the Flexify.IO VM IP.
  5. In Discourse, set the following in the admin page (yes, there is no need for the settings to be in app.yml):
use s3: true
s3 region: anything
s3 endpoint: https://s3.mydomain.com
s3 access key: myaccesskey
s3 secret assess key: mysecret key
s3 cdn url: https://<azure-blob-account>.blob.core.windows.net/<container>
s3 bucket: <virtual bucket>
s3 backup bucket: <backup bucket>  (any container will do, as it does not require public read access and Flexify.IO will expose them automatically)
backup location: s3

Using the same bucket for production and staging is not recommended. If you do it anyway, take measures to see that your staging site doesn’t delete your production assets (set s3 disable cleanup as a minimum, and look out for it deleting production’s backups).

Wasabi

@pfaffman tried wasabi for backups, but it seemed to fail intermittently and silently, leaving backups on the hard drive and eventually filling the disk. Neither wasabi nor meta had any clues, so I don’t recommend it, though your mileage may vary. @pfaffman is now fairly certain that this problem was due to backups and automatic reboots somehow being scheduled at the same time; it was used only for backups, but seemed to work fine. If someone wants to give it a try and report here, it should work, at least for backups.

Oracle Cloud

Oracle Cloud lacks support for virtual-host style access to buckets and will not work

Cloudflare

Cloudflare’s offering is incompatible. In testing, @fearlessfrog filed a ticket with Cloudflare and in December 2022 they said:

Contabo

@tuxed tried to get Contabo Object Storage to work for S3 Compatible uploads. It seems that when uploading it prefixes the repository name in the url and he wasn’t able to get it to work.

Secure Uploads

Secure uploads are supported only for AWS S3. If your rake uploads:migrate_to_s3 fails you should enter these commands to first count and then mark as un-secure those uploads, given that you know that they do not need to be secure, in which case, you’ll need to use AWS S3.

./launcher enter app
rails c
Upload.where(secure: true).count
Upload.where(secure: true).update_all(secure:false)

  1. Oracle Cloud lacks support for virtual-host style access to buckets and will not work ↩︎

65 Likes

Hi everyone,

I’ve been using S3 storage for a number of years now without a CDN.

Following the advice given to me in another thread I have today setup CloudFront CDN.

Before I add the CDN URL to my control panel and rebake 230,000+ posts only to find out I’ve got a CloudFront setting wrong and break everything, can someone confirm this is the expected behaviour for me please? :bowing_man:t2:

Currently, this is an example URL that for an image that a user has uploaded:

https://greyarrows.s3.dualstack.eu-west-2.amazonaws.com/original/3X/8/3/8335cab232f512f4a979c7f0c8562e149c01b212.png

Which displays:

My CloudFront “Domain Name” is: d1q8cepst0v8xp.cloudfront.net

If I manually edit my example URL above and replace the existing S3 part of the domain name with the domain name of my CloudFront domain name, I get:

https://d1q8cepst0v8xp.cloudfront.net/original/3X/8/3/8335cab232f512f4a979c7f0c8562e149c01b212.png

And sure enough, the image still loads correctly:

Therefore, am I correct in thinking I simply need to add a S3 CDN URL of d1q8cepst0v8xp.cloudfront.net` to my Discourse control panel, rebake all posts and just sit back and wait of the magic to happen?

Thanks in advance, CDN is all new to me and I don’t have a development environment in which to safely test this :grimacing:

4 Likes

I also have the s3 configure tombstone policy setting enabled:

Will this be an issue? As I’m now using a CDN instead? Or are things in the background still checking the original S3 bucket, rather than a CDN URL?

I’m naturally thinking the latter, but again I can’t afford to kill of hundreds of thousands of my users photo uploads :scream:

:blush:

2 Likes

The answer is yes.

To test this theory, before rebaking hundreds of thousands of posts, I did the following sanity checks:

  • Uploaded an image
  • Changed the S3 CDN URL setting
  • Rebuilt the HTML on my test post (via the UI)
  • Refreshed the page in the browser
  • Checked the Network tab of the browser console to confirm the image was being pulled via cloudfront
  • Uploaded a new test image to a new post
  • Checked the Network tab of the browser console to confirm the image was being pulled via cloudfront

I’m now rebaking all posts as we speak :+1:t2:

13 Likes

Thanks for the report Richie. I also have had AWS S3 image storage running for several years and came to this post via the console message. But the description at the top doesn’t say anything about the case that you already had S3 and just need a CDN.

For the record here is what I did:

  1. Went to AWS console, under Network and Content Delivery picked Cloudfront
  2. Clicked the Create distribution button
  3. Filled out the fairly obvious form, the only thing you really need to do on it is to pick your AWS S3 bucket where the images are from the drop-down menu.
  4. Waited a bit for the Cloudfront configuration to finish…
  5. A <gibberish>.cloudfront.net domain showed up in the “Domain Name” column of the Cloudfront Distributions list.
  6. I copied and pasted that domain into the s3 cdn url field in my site admin File settings.
  7. I did some tests:
    a. I made a new post with an image upload and indeed it was on cloudfront.
    b. I hit Rebuild HTML on some random existing image posts and saw they also rebuilt with cloudfront.net images.
  8. Since all looked good I went in and ran a rebake, which took several hours as I have around half a million posts now:
./launcher enter app
# rake posts:rebake
  1. All seems to be working fine. It put a ton of jobs in the sidekiq queue, one per post it looks like, which are going to take a few days to clear but it is chunking though them now.
18 Likes

Are you sure that is the case? This site here uses assets from CDN and we didn’t have to purge the cache. It’s also a EmberCli change that shouldn’t affect production :thinking:

5 Likes

Oh those damn optimizers. I would tell you to disable that if possible as Discourse already ships optimal configurations for each asset. Those optimizers are great when you are hosting a black box web software from the 2000’s, but they fail hard on modern stuff. Also even the big name optimizer from Cloudflare often breaks Discourse, so I have no hopes for others. They may work one day, and break the next and all your visitors are left with a blank page. All that for zero benefits.

6 Likes

Any chance you enabled secure_uploads in the site settings?

Also looks like this was reported and fixed today because of Discourse compatibility :

6 Likes

Is there a way to disable this warning from showing up on my admin dashboard?

The server is configured to upload files to S3, but there is no S3 CDN configured.

I had issues setting up an S3 CDN, but it doesn’t cost me an arm and a leg so I am fine with just using S3 directly. But what I’d love to see is for this notification to go away because I am fully aware of the consequences.

2 Likes

Hi.

I just want give update. We will able setup backup using GCS now. I post on other thread too. I hope it will help other people who painful search this solution.

How to do that?
Enable default config for backup(or you can setting form admin dashboard).

DISCOURSE_S3_BACKUP_BUCKET: falcoland-files/backup
DISCOURSE_BACKUP_LOCATION: s3

Then set bucket permission as Storage Legacy Object Owner:

  1. Go to your project in Google Cloud Console
  2. Select Storage
  3. Select your bucket
  4. Go to the permissions tab
  5. Add new permission, fill your service account email with your account. for roles, select Storage Legacy Object Owner
  6. Save and done.

Sorry for double post, I just want share this good update.
Thank you

5 Likes

It’d be great if you could add Wasabi as well…

1 Like

I used wasabi for backups for a while. As far as configuration goes, it “just worked”, so you could give it a try if you want.

But with some frequency the backups failed silently and backups stayed on the local machine, filling up the disk. I looked at wasabi and Discourse errors and never found an explanation that would make it possible for either end to “fix” anything.

I don’t recommend it, as I’m not sure that it’s “great.”

2 Likes

Thanks!
I will check how it goes this time.
The default backup frequency is 7 days between backups and up to 5 backups.
I’ll share how it goes.

1 Like

I was doing daily backups; I don’t remember how many I was trying to keep, but the local hard drive had room for only a couple.

2 Likes

Which storage would you personally recommend?
I don’t think AWS and Azure are affordable for personal projects.
Not sure about Azure, but AWS looks confusing and unpredictable…

1 Like

I hoped that wasabi would be good for backups at least. Backblaze s3 is affordable. I don’t think I’ve used it for uploads, but it works for backups (I think). I think the only issue with backblaze is that (when I last tested it) you had to use a global key (so I couldn’t use it for clients who could see the key). I think someone recently posted a fix for that (something about “legacy” something, somewhere). For a personal project, that’s what I’d try next, I think. (And if you’re on Digital Ocean, spaces is OK, I think.)

3 Likes

I’m on DO, but cost-wise Blackblaze/Wasabi is cheaper.
What did you use for uploads?

1 Like

Sorry, I can’t get.
Why do we have to specify settings in app.yml, as we can enter this information within from Discourse->Settings ?

1 Like

Because of the way that it handles building assets when the container is built, I think. It is quite confusing that the behavior is different when the settings are in the environment variables and the database, but that’s the way it works. It’s also a better way too handle them, as it means you can build and restore a new site from the command line.

4 Likes

Where should I set this? I’m not finding it in Discourse admin settings?

1 Like