Configure an S3 compatible object storage provider for uploads

Had to park this for now. It looked like it was going to work, but something odd is going on with R2 around content encoding of the assets: either the upload isn’t setting the header, or something else. The browser chokes with ‘Invalid or unexpected token’ on a gz asset such as browser-detect-7af298cd000a967d2bdc01b04807eda2924a388584ea38ad84919b726283c2ed.gz.js. rake s3:upload_assets seems to be working, but the files aren’t being read correctly on the browser side.

I don’t really get why with AWS S3 it is fine to use the local server URL for assets (they don’t exist on our existing S3 bucket for uploads), but with R2 it wants to use DISCOURSE_S3_CDN_URL for assets only. If I could force the assets to come from the server URL, this would probably all work.

EDIT: Chatting on the CF, this seems to be the issue, and as of today it’s why R2 can’t be used with Discourse without some changes. I could script something in the post-hook step to remove the gz assets, but I feel I’m already ‘off the path’ far enough for one day:

Files that you gzip are not currently handled correctly by R2; you have to upload uncompressed files. Cloudflare has transparent compression: it picks identity, gzip, or Brotli based on what the client can handle. This is a difference from S3.
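To see why a pre-gzipped asset served without the right header breaks, note that gzip output starts with the magic bytes 1f 8b, which are not valid JavaScript. A hedged local demonstration (file names here are made up):

```shell
# Create a tiny "asset" and gzip it, as asset precompilation does
printf 'console.log("hi");\n' > app.js
gzip -kf app.js

# The compressed file begins with the gzip magic number 1f 8b.
# A browser handed these bytes without a Content-Encoding: gzip header
# tries to parse them as JavaScript and throws
# "Invalid or unexpected token".
head -c 2 app.js.gz | od -An -tx1
```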

Nice work! And that’s a clear message from Cloudflare about why it won’t work. Thanks very much. I’ll copy that into the OP soon.

Thanks again! I updated the OP:

Thank you for putting together this guide! I have had some success using Minio.

For anyone else who is trying to set it up locally with Docker Compose, you can tell Docker to add a hostname alias so that it works as a subdomain, like this:

  minio:
    image: minio/minio
    command: server --console-address :9001 /data
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - ./data/minio:/data
    environment:
      MINIO_DOMAIN: minio.mydomain.com
    networks:
      default:
        aliases:
          - assets.minio.mydomain.com

In this case, you would set DISCOURSE_S3_ENDPOINT=http://minio.mydomain.com:9000 and DISCOURSE_S3_CDN_URL=//assets.minio.mydomain.com:9000, and point the subdomain to localhost in your local /etc/hosts file.
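For reference, the matching /etc/hosts entries (hostnames here are the example domains from the compose file above) would be something like:

```text
127.0.0.1   minio.mydomain.com
127.0.0.1   assets.minio.mydomain.com
```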

This works mostly fine, but I did notice that Discourse is unable to download files from an address that isn’t on port 80 or 443: uploading an image will work, but when Discourse then downloads it again to resize it, the request fails.

I was thinking it might be good to mention, in the Minio section or in the summary, that DISCOURSE_S3_CDN_URL must be on port 80 or 443.
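One hedged workaround, assuming nothing else on the machine is bound to port 80, is to publish Minio’s S3 port on the default HTTP port so the CDN URL needs no port suffix (a sketch based on the compose file above):

```yaml
  minio:
    image: minio/minio
    command: server --console-address :9001 /data
    ports:
      - "80:9000"    # S3 API reachable on the default HTTP port
      - "9001:9001"
```

DISCOURSE_S3_CDN_URL=//assets.minio.mydomain.com should then work without an explicit port.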

Hey @Falco - is this referring to the way the Content-Encoding: gzip header works with their Spaces CDN? That sounds similar to Cloudflare R2, in that the asset location is made the same as the uploads CDN, so the gzip handling breaks. Here’s what happens with R2 today.

It might be worth considering a toggle for that behavior, i.e. serving assets from the origin rather than always from DISCOURSE_S3_CDN_URL? I’ll happily go look at how to do this, if it would be considered as a potential config change.

That’s what should happen if you omit DISCOURSE_S3_CDN_URL, but since it’s a weird corner case, and a potentially expensive mistake, it’s not a common configuration.
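As a hedged sketch of that corner case (endpoint and bucket values are placeholders, not from this thread):

```yaml
  DISCOURSE_USE_S3: true
  DISCOURSE_S3_ENDPOINT: https://s3.example.com
  DISCOURSE_S3_BUCKET: my-bucket
  DISCOURSE_S3_ACCESS_KEY_ID: xxx
  DISCOURSE_S3_SECRET_ACCESS_KEY: xxx
  # DISCOURSE_S3_CDN_URL deliberately unset: assets are served from the
  # app origin and uploads directly from the bucket, which can be slow
  # and expensive without a CDN in front
```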

Yep, I can understand that. A new GlobalSetting bool, S3_ORIGIN_ASSETS (or S3_BROKEN_PROXY_FUDGE :slight_smile:), around about here, sort of like how the test scripts aren’t compressed, would allow Digital Ocean Spaces and Cloudflare R2 storage and CDN to work with Discourse out of the box, which seems like a nice feature for not much effort? Maybe for future consideration anyway. :heart_eyes_cat:

Oh, I saw in the 3.0.beta release notes that something was added. I’ll give it a go, unless I’ve misunderstood what it’s for? It might allow Cloudflare R2 and Digital Ocean Spaces to be used with their CDNs doing that weird stuff with gzip.

No, that’s unrelated.

The setting allowed me to specify the local site as the origin, working around the need for the JS assets to be on the S3 site (in this case Cloudflare or Digital Ocean Spaces with CDN enabled). Thanks to @david for the change, even if that wasn’t the intention.

Do you enter the site URL for the asset CDN? Clever!

Hi folks, does anybody know if this could be related to Discourse?

This is the XML error for the files that we tried to upload to our previously ‘working with Discourse’ S3 storage:

<Error>
  <Code>InvalidArgument</Code>
  <Message>Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.</Message>
  <ArgumentName>Authorization</ArgumentName>
  <ArgumentValue>null</ArgumentValue>
  <RequestId>ID</RequestId>
  <HostId>ID</HostId>
</Error>

Are you using AWS? Something else?

Is that bucket configured with server side encryption?

It could be that a library got updated and is behaving differently.
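The error message itself names the likely cause: server-side encryption with AWS KMS managed keys requires requests signed with AWS Signature Version 4. Outside Discourse, a hedged way to check this is with the AWS CLI, forcing SigV4 in ~/.aws/config (older CLI and SDK versions may default to an older signing scheme) and retrying an upload:

```ini
# ~/.aws/config — force Signature Version 4 for S3 requests
[default]
s3 =
    signature_version = s3v4
```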

Thanks, I double-checked and it seems to work with the auto configuration, but not when managing my own keys from the S3 management console.

Do you know if that would be possible within Discourse?

3 posts were split to a new topic: Why run UpdatePostUploadsSecureStatus even when secure uploads is disabled?

This seems to have been fixed recently: the 2023-03-16 changelog lists a bug fix for gzip file handling.

We are running our Discourse forum at discourse.aosus.org with R2 right now (we haven’t run migrate_to_s3 yet), and it seems to be OK! No noticeable issues so far.

  DISCOURSE_USE_S3: true
  DISCOURSE_S3_REGION: "us-east-1" #alias to auto
  #DISCOURSE_S3_INSTALL_CORS_RULE: true #it should be supported
  DISCOURSE_S3_ENDPOINT: S3_API_URL
  DISCOURSE_S3_ACCESS_KEY_ID: xxx
  DISCOURSE_S3_SECRET_ACCESS_KEY: xxxx
  DISCOURSE_S3_CDN_URL: your cdn url
  DISCOURSE_S3_BUCKET: BUCKET_NAME

Is there a way to specify a separate host for backups? It would be great if it were possible to use R2 just for CDN stuff.

There is not. It seems unlikely to me that this will change.

23 posts were split to a new topic: Troubles configuring Object Storage

It’s weird that the settings in ENV are not reflected in the admin UI. Does overriding happen? Will new S3 settings in the admin UI override those in the environment?

Yes. Env variables override values in the database and are hidden from the UI.
