What are the right settings to use an S3 bucket (with a non-Amazon URL)?

So I have an Amazon S3 bucket set up to store my forum assets, and I put it on a custom domain and set up CloudFlare CDN to cache the content.

My custom domain is called something like http://forum-storage.com, which points to https://forum-storage.com.s3-us-east-1.amazonaws.com. The S3 bucket itself is named forum-storage.com.

This is all working correctly. If I add an image to the main folder of the bucket, I can retrieve it on my custom URL, i.e. http://forum-storage.com/test.jpg returns the image, with the CloudFlare headers.

Three easy questions…

#1

Now I need to tell Discourse to use this new URL as my S3 bucket. What do I put in these 3 fields?

#2

I currently have images in my forum posts that are on another S3 bucket, and I also have images that are stored locally. (My image URLs are all over the place.)

Once I make the right changes (above), that means all NEW media added to my forum will go into the new bucket, but existing images will not be moved, and will continue to be accessed wherever they live now, correct?

#3

Once this is working for all new images, how can I tell Discourse to MOVE all the old images that aren’t in this new bucket into it (and rebake posts as needed)?

The goal is to get everything into one bucket, this new one behind the CDN.
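Before moving anything, it can help to see which posts still point where. Here is a minimal sketch that assumes you can get a post's rendered (cooked) HTML as a string. It groups every image source by host. Both hostnames are hypothetical placeholders. Treating relative /uploads/... paths as "local" is also an assumption about how local uploads show up. The sketch only reports; it does not move files or rebake posts.

```python
import re
from urllib.parse import urlparse

# Hypothetical hostnames for illustration only; substitute your own.
NEW_CDN_HOST = "forum-storage-cdn.example.com"
OLD_BUCKET_HOST = "old-bucket.s3.amazonaws.com"

# Simple regex for double-quoted src attributes; not a full HTML parser.
IMG_SRC = re.compile(r'<img[^>]+src="([^"]+)"')


def classify_image_urls(cooked_html: str) -> dict[str, list[str]]:
    """Group every <img src> in a post's HTML by where the file lives."""
    groups: dict[str, list[str]] = {"new_bucket": [], "old_bucket": [], "local": [], "other": []}
    for src in IMG_SRC.findall(cooked_html):
        host = urlparse(src).netloc  # "" for relative paths like /uploads/...
        if host == NEW_CDN_HOST:
            groups["new_bucket"].append(src)
        elif host == OLD_BUCKET_HOST:
            groups["old_bucket"].append(src)
        elif host == "":
            groups["local"].append(src)
        else:
            groups["other"].append(src)
    return groups


html = ('<p><img src="https://forum-storage-cdn.example.com/a.jpg">'
        '<img src="/uploads/default/original/1X/b.png"></p>')
print(classify_image_urls(html))
```

Anything that ends up in "old_bucket" or "local" is a candidate for the move asked about in #3.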


:rotating_light: STOP NOW :rotating_light:

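Create a new bucket that does not have a dot in its name. You will encounter endless suffering if you continue, due to how SSL wildcard certificates work: a wildcard matches only a single DNS label, so a dotted bucket name produces an S3 hostname that Amazon’s wildcard certificate does not cover.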

Ref: Discourse does not allow dot «.» symbol in bucket name while Amazon S3 allows it
Ref: amazon web services - SSL Certificate Issue with bucket name containing dots ('.') while trying to use Virtual-Hosted url instead of Path Style for AWS S3 Bucket - Stack Overflow
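To see why the dot matters, here is a simplified sketch of the hostname check a browser does against a wildcard certificate. It follows the RFC 6125 rule that "*" is allowed only as the whole leftmost label and stands for exactly one label. The certificate name *.s3.amazonaws.com is an assumption for illustration; check what your endpoint actually presents.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified wildcard check: '*' may only be the whole leftmost label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # a wildcard never spans more than one label
    for i, (p, h) in enumerate(zip(p_labels, h_labels)):
        if i == 0 and p == "*":
            continue  # leftmost wildcard covers exactly this one label
        if p != h:
            return False
    return True


CERT = "*.s3.amazonaws.com"  # assumed certificate name, for illustration
print(wildcard_matches(CERT, "forum-storage-com.s3.amazonaws.com"))  # True
print(wildcard_matches(CERT, "forum-storage.com.s3.amazonaws.com"))  # False
```

The dotted name adds an extra label, so the label counts no longer line up and the check fails.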


Hmmm ok. Momentary snag! :wink: Thank you for the heads-up @riking.

AWS S3 has a feature where, if you name the bucket the same as the domain, you can just add a CNAME record to the domain and it all just works.

So now I’m searching everywhere for info on how to attach a domain to a bucket that does not have the same name as the domain… hmm

@BryanV how did you point https://discourse-uploads.bokeh.org to your S3 bucket?


Hmm, that’s nice, but you want a CDN (e.g. CloudFront) in front of the bucket, and it is the CDN that receives the CNAME, so that feature isn’t useful right now.

Not having a CDN will get you the same exorbitant transfer bill.


Right. I think we’re on the same page? To be clearer, as I understand it, there are three ways to connect Discourse and S3:

  1. Have Discourse use S3 cloud directly. Pro: super easy to set up. Con: quickly becomes expensive.

  2. Have Discourse use S3 cloud through a custom domain (like forum-storage.com) so I can use a CDN. Pro: very easy to set up with S3 if the bucket name matches the custom domain exactly (i.e. forum-storage.com.s3.amazonaws.com). Con: SSL is unhappy.

  3. Have Discourse use S3 cloud through a custom domain (again, so I can use a CDN), but set up the S3 bucket so it doesn’t have an extra dot in the name (i.e. forum-storage-com.s3.amazonaws.com). Pro: SSL is happy. Con: not so easy to set up with Amazon S3.

So… I was using #1 until I got the bill :slight_smile: Then I learned #2 was an option, set it up, started this topic, and promptly learned #2 wasn’t really an option.

Now I’m working on #3. I think I have to use Amazon’s “Route 53” DNS service, or something. Still muddling through it. All the Googling I’ve done returns info on how to do #2, but nobody seems to have written clear instructions for #3.

Please correct me if I’m wrong or misunderstanding something…

There are tutorials for that, e.g. I just found this one for StackPath:


@BryanV how did you point https://discourse-uploads.bokeh.org to your S3 bucket?

I added a CNAME record to our DNS configuration for bokeh.org, pointing to the <long id>.cloudfront.net hostname of our CloudFront CDN distribution (our DNS is on Cloudflare, but that should not matter).

For reference, our S3 bucket does not have any dots in the name, but I also don’t recall any specific issues setting up the CDN because of that (or any issues creating the bucket; it just needs a unique name).

This is, without a doubt, the most frustrating thing ever. I cannot, for the life of me, figure out how to string together an Amazon S3 bucket (with no dot in the bucket name!), my custom domain, and CloudFlare so that they actually work. If I could put a dot in the bucket name, no problem. But for now, it’s all just too confusing. Ughhhhhh, can anyone help me or point me toward a simple way to set up CloudFlare with an S3 bucket whose name is not the same as the domain?

I tried the StackPath info above; I think I did something similar in CloudFlare, but I’m not sure. It didn’t work. I tried reading CloudFlare’s info about how to add the CDN to an Amazon bucket but, of course, they want the bucket name to be the same as the domain name, which I am told is a Very Bad Idea, so I can’t do it.

It really seems to come down to:

  • If the bucket name is the same as the domain name (with a period), Amazon S3 will handle it all for me, lovely, wonderful, except it screws with SSL so I shouldn’t do that.
  • If the bucket name does NOT match the domain name, everything gets massively more complicated and I can’t work it out.

Can anyone help/advise? And in the meantime I’m getting bills of $100+ every month for my S3 storage. This sucks so much. Can I pay someone $200 right now to just solve all this? Blargh.

Did you read this yet? I struggled with setting up S3 and Cloudflare as well but eventually figured it out. You can still use Cloudflare for its security benefits, but I’m pretty sure you need a separate CDN service as well. Cloudflare isn’t like a normal CDN; it works differently. You should move to a cheaper S3 service; Amazon is expensive.

Using Cloudflare to cache an S3 bucket means you need to rewrite the Host header that Cloudflare sends to the origin. That feature is available in Cloudflare’s Enterprise plan, so using any other CDN may be easier.
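Why the Host header matters: Cloudflare passes the visitor's hostname (your custom domain) through to the origin. S3 picks the bucket from that Host value. Below is a simplified model of S3's virtual-hosted routing. The two AWS suffixes are assumptions covering only the hostnames used in this thread, not every S3 endpoint. When the Host is not an AWS hostname, S3 treats the whole value as the bucket name. That is the CNAME trick, and it is why the trick needs bucket == domain unless the Host header is rewritten.

```python
# Assumed AWS hostname suffixes, for illustration only (not exhaustive).
AWS_SUFFIXES = (".s3.amazonaws.com", ".s3.us-east-1.amazonaws.com")


def bucket_for_host(host: str) -> str:
    """Simplified model of which bucket S3 looks up for a given Host header."""
    host = host.lower().split(":")[0]  # drop any port
    for suffix in AWS_SUFFIXES:
        if host.endswith(suffix):
            return host[: -len(suffix)]  # virtual-hosted: bucket is the prefix
    # Not an AWS hostname (CNAME case): the whole Host is taken as the bucket.
    return host


print(bucket_for_host("forum-storage-com.s3.amazonaws.com"))  # forum-storage-com
print(bucket_for_host("forum-storage.com"))  # forum-storage.com
```

So with a dot-free bucket behind a custom domain, the request only reaches the right bucket if the CDN sends the bucket's own AWS hostname as Host.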


Isn’t a dot in the bucket name irrelevant if he is going to cache the images via a CDN anyway? The only thing that matters is having a good cert that covers the Cloudflare-served images.

I think he needs to focus on the Cloudflare servers, DNS, and a cert to cover that stuff. I don’t think the user’s browser will ever know that the source of these images is an S3 bucket. Cloudflare will cache/proxy the image itself, right?

Discourse will generate direct bucket URLs and use those for internal operations, such as “uploading a file”. It still matters.

@riking All Discourse seems to require is a bucket name, correct? Upload and management can be done on AWS URLs with their certificates, if HTTPS is even required. So far, is there any reason we are talking about security certs?

OP can then separately look at what he needs to do to allow his CDN or caching solution to fetch the images from S3. Secure or not doesn’t matter unless OP or the CDN has requirements, right? Does Discourse care about his setup between S3 and the CDN?

Finally, OP needs to make sure the images are served from the CDN with a valid cert. Does this have anything to do with Discourse, other than OP supplying the base URL where the images will ultimately reside? Once his CDN or cache fetches the images from S3, AWS, buckets, blah blah blah are out of the picture completely.

I get that there are issues with a dot in S3 bucket names if you intend to serve your images from there, but the OP does NOT. So it just comes down to OP picking any bucket name that Discourse will accept, as long as it doesn’t interfere with whatever setup he has with the CDN.

Even though it’s possible to avoid the bucket-in-domain URLs, Discourse doesn’t actually avoid them, because of the way it uses the AWS S3 SDK and (I think) the difficulty of configuring it.

Again, these operations bypass the CDN; the only way to fix them is in the Discourse source. They can be fixed, but currently aren’t. Many of the issues also don’t happen on the critical path and only show up later. So until this changes, do not use bucket names with dots in them.
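The "bucket-in-domain" URLs are S3's virtual-hosted style. The alternative is path style, which moves the bucket name into the path. Here is a small sketch contrasting the hostnames each style produces, since the hostname is what the TLS certificate has to match. The us-east-1 region default is an assumption taken from the original bucket URL. This only builds strings; it says nothing about which style Discourse's SDK calls actually use.

```python
from urllib.parse import quote, urlparse


def virtual_hosted_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    # Bucket name becomes part of the hostname, so its dots become extra DNS labels.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def path_style_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    # Bucket name moves into the path; the hostname is the same for every bucket.
    return f"https://s3.{region}.amazonaws.com/{bucket}/{quote(key)}"


for bucket in ("forum-storage-com", "forum-storage.com"):
    print(urlparse(virtual_hosted_url(bucket, "test.jpg")).hostname)
    print(urlparse(path_style_url(bucket, "test.jpg")).hostname)
```

With the dotted name, only the virtual-hosted hostname gains an extra label. That is the URL form that runs into the wildcard-certificate problem whenever Discourse talks to the bucket directly.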


So, just to boil this down to drop-dead simple… OP’s question was what to fill in for three settings:

(1) BUCKET NAME – so it is being said that dots are not… recommended? not allowed? I suspect this may not be a problem for OP. (He just needs to separately figure out a way for his CDN to get images to cache and serve.) So we’re all on the same page here, right?

(2) S3 ENDPOINT – leave blank if he is using AWS; otherwise, fill in the endpoint for another provider?

(3) S3 CDN URL – is this simply the base URL that Discourse will prepend to the image path? If so, then that is simple and straightforward and the OP can configure his CDN and supply that base URL here.
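If the CDN URL works the way this post describes (a base URL swapped in for the bucket's base URL), the rewrite amounts to a prefix replacement. This is a minimal sketch of that idea, not Discourse's actual code. Both base URLs are hypothetical.

```python
def to_cdn_url(upload_url: str, bucket_base: str, cdn_base: str) -> str:
    """Swap the bucket's base URL for the CDN base URL, keeping the object path."""
    prefix = bucket_base.rstrip("/") + "/"
    if not upload_url.startswith(prefix):
        return upload_url  # not in this bucket; leave it alone
    return cdn_base.rstrip("/") + "/" + upload_url[len(prefix):]


print(to_cdn_url(
    "https://forum-storage-com.s3.amazonaws.com/original/1X/abc.jpg",
    "https://forum-storage-com.s3.amazonaws.com",  # hypothetical bucket base
    "https://cdn.example.com",                     # hypothetical CDN base
))
```

Requiring the trailing slash on the prefix keeps a lookalike host such as forum-storage-com.s3.amazonaws.com.evil from being rewritten by accident.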

I don’t see where SSL wildcard certificates factor into this. OP was told a dot in the Discourse config is bad because it will break his cert. But if he is using a CDN or cache, then the bucket name could very well be irrelevant to the cert, right? If it will break Discourse in some other way, that is useful to know.

I’m not sure I’m following all this closely, but to zoom out slightly, maybe this simple set of requirements helps:

The goals:

  1. Not store my Discourse images on my Discourse server
  2. Have an S3 bucket to store images (must be S3 since that’s what Discourse supports)
  3. Not have expensive S3 charges
  4. A CDN is not required, but is a nice bonus since it may help cut down on (or be the only way to erase) expensive S3 charges and also provides better availability worldwide, a backup in case the main server goes offline, etc.

Please correct me if any of this is wrong

Limitations/requirements:

  1. The external image store needs to support the S3 protocol (since that’s what Discourse works with) but, strictly speaking, doesn’t need to be Amazon’s S3.
  2. Discourse requires the S3 bucket name have no dots.
  3. The image source (S3 or CDN) must serve https:// since a browser will complain if the page is https but the images aren’t.
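The requirements above can be checked mechanically before saving settings. This is a minimal sketch, assuming the site settings are named s3_upload_bucket, s3_endpoint and s3_cdn_url (my understanding of Discourse's names; verify them in your admin panel). It checks only the dot rule and the https:// rule from the list. The endpoint is deliberately left unchecked because its value is provider-specific.

```python
from urllib.parse import urlparse


def check_s3_settings(settings: dict[str, str]) -> list[str]:
    """Return human-readable problems for the dot and https:// requirements."""
    problems = []
    bucket = settings.get("s3_upload_bucket", "")
    if not bucket:
        problems.append("s3_upload_bucket is empty")
    elif "." in bucket:
        problems.append("s3_upload_bucket contains a dot; use dashes instead")
    cdn_url = settings.get("s3_cdn_url", "")
    if cdn_url and urlparse(cdn_url).scheme != "https":
        problems.append("s3_cdn_url should start with https:// to avoid mixed content")
    # s3_endpoint is not checked: blank for AWS, provider-specific otherwise.
    return problems


print(check_s3_settings({
    "s3_upload_bucket": "forum-storage.com",
    "s3_cdn_url": "http://cdn.example.com",  # hypothetical CDN URL
}))
```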

Please correct me if any of this is wrong

Solutions:

Previously, I was serving directly from Amazon S3. It worked great, except the charges from Amazon for DataTransfer-Out-Bytes are super expensive. This led to high monthly bills from Amazon! So I moved it back to the main server. There are two possible fixes: park a CDN in front of the Amazon S3 bucket so the CDN handles all the data transfer, and/or switch to a different S3 provider.

I tried to put the CloudFlare CDN on top of the Amazon S3 bucket but very quickly ran into a lot of issues that I couldn’t solve.

Another option?

Was just looking at this S3-compatible storage offering from DigitalOcean. Built-in CDN (not sure exactly what that means, but it sounds promising), affordable pricing. Would this work with Discourse?

For reference, I served ~300 GB from S3 in the last 30 days. Some of that is site backups; a lot of it is static images. It’s VERY difficult for me to work out how to measure these things in Amazon: their billing reporting interface, like everything else about Amazon AWS, is really confusing to administer.
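A rough way to reason about how much a CDN would help: the bucket only pays egress for requests the CDN cannot answer from cache. This is a back-of-the-envelope sketch. The 90% cache hit ratio and the per-GB price are placeholder assumptions, not AWS's actual rates, so plug in numbers from your own bill. It also ignores any transfer charges between the bucket and the CDN.

```python
def origin_egress_gb(total_served_gb: float, cache_hit_ratio: float) -> float:
    """GB the bucket still has to send when a CDN answers the cached fraction."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_served_gb * (1.0 - cache_hit_ratio)


def monthly_transfer_cost(gb: float, price_per_gb: float) -> float:
    """Flat-rate estimate; real pricing is tiered and changes over time."""
    return gb * price_per_gb


PRICE_PER_GB = 0.09  # placeholder assumption; check current pricing
egress = origin_egress_gb(300, 0.9)
print(f"{egress:.1f} GB from the bucket, about ${monthly_transfer_cost(egress, PRICE_PER_GB):.2f}")
```

Because backups are not served to visitors, they would not benefit from a CDN at all, so the real hit ratio for the ~300 GB figure may be lower.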

I believe that the simplest solution is AWS and KeyCDN, following the guidelines at Using Object Storage for Uploads (S3 & Clones). If your users are not in South America, KeyCDN is pretty affordable and easy to configure.

A potentially less expensive solution might be How to Setup BackBlaze S3 with BunnyCDN. I have been pleased with Backblaze in my initial testing for backups, but haven’t yet tried it for uploads.


We got wildly distracted by a dot in a bucket name and browser certificates, but I think all of that discussion is completely irrelevant. ANY CDN is going to allow HTTPS configuration, so there is ZERO ISSUE with the “wildcard cert” problem, dot or no dot in the bucket name, WITH REGARD TO THE END USER’S BROWSER CERT. Because again, ANY CDN is certainly going to allow this.

So OP can simply…
(1) pick an S3 and CDN compatible storage solution and configure endpoint and bucket name in Discourse settings.
(2) Configure CDN to fetch images from S3. Secure or insecure. I don’t think OP cares as long as CDN serves to user via HTTPS.

Someone please correct me if I am missing something here. I think the end user browser cert + dot in bucket name problem is ONLY an issue if you are going to serve the images FROM the bucket. Irrelevant to serving from CDN.

P.S. This topic, which @pfaffman helpfully linked to above, points out that DigitalOcean’s S3 product (“Spaces”) has an “awfully broken” CDN.

And I see that for other S3 providers, there are various settings that have to be tweaked.

What this is telling me:

  • Settings will vary from provider to provider, even if they all claim to follow the “S3” protocol

Still

:warning: Do NOT put a dot (period) in your bucket name

I mean, unless you enjoy suffering.