Extend S3 configuration for other S3 API compatible services

Hi everyone :grinning:
This feature was requested by a lot of community members in this post.
I’m planning to add support for DigitalOcean Spaces because it’s fully S3-compatible, but if the community prefers another cloud option, suggestions are welcome!
Here’s a link to the DigitalOcean API.

I’m excited to work on this feature, but I’m new to Ruby and the codebase, so I would need a bit of guidance:

  • It would be super helpful if someone could share some documentation or resources about modifying site settings, as this would help me understand how Discourse integrations work.

  • What’s an ideal development environment for testing the existing AWS features? A VPS with a production Discourse instance? I ask because I might have to deal with nginx, etc.

To see if I’ve understood the requirements properly before I start working:

The goal here is to extend the S3 configuration and reuse the existing S3 code for another compatible backup service. The official aws-sdk-s3 gem supports custom HTTP endpoints, as shown here, so, for example, I would select “custom” in the S3 location and then add a custom_s3_url.
Is that correct?
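
A minimal sketch of what I have in mind, assuming a hypothetical custom endpoint value (the URL and credentials below are placeholders, not real values):

require 'aws-sdk-s3'

# The endpoint below stands in for whatever custom_s3_url would hold;
# it is a placeholder, not a real service.
client = Aws::S3::Client.new(
  access_key_id: 'KEY',
  secret_access_key: 'SECRET',
  region: 'us-east-1',
  endpoint: 'https://s3.custom-provider.example.com'
)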

5 Likes

Some insight here:

For development, start with a local development sandbox.

Discourse can be a good starting point as your first Ruby app as well.

4 Likes

Thanks, that post is actually my primary source of information! :grinning:

Perfect, I’ve already got a local sandbox up and running so I’ll get to work.
Thank you!

5 Likes

@sam I did the research and yes, this is well supported by aws-sdk-ruby, even though the documentation is lacking. Below is a demonstration of aws-sdk-ruby being used to upload an archive to DigitalOcean Spaces. It works really well, and I hope it helps give everyone an idea of how compatible these services are.

Steps:

  1. Create a DigitalOcean Space.
  2. Generate Access Keys for your DigitalOcean Account.
  3. Paste the Space name (bucket), key pair (access_key_id, secret_access_key), and the file name and path into the variables below. Run the code and the file should be visible in your Space.
require 'aws-sdk-s3'

name = 'test.tar.gz'
path = '/home/workspace/test.tar.gz'

# Before uploading, ensure that you've created a Space on DigitalOcean
bucket = 'discoursetest1'

# Configure an S3 Resource for use with Spaces
# Note: Generate Spaces Access Keys from cloud.digitalocean.com/settings/api/
s3 = Aws::S3::Resource.new(
  access_key_id: '',
  secret_access_key: '',
  endpoint: 'https://nyc3.digitaloceanspaces.com',
  region: 'nyc3'
)

# Add a file to a Space
obj = s3.bucket(bucket).object(name)
obj.upload_file(path)
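
To double-check the upload, the same resource can list everything in the Space. A quick sketch reusing the s3 and bucket variables from above:

# List every object in the Space to confirm the upload landed
s3.bucket(bucket).objects.each do |object|
  puts object.key
end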

Note:
Read more about DigitalOcean’s AWS S3 compatibility here and the AWS Ruby SDK here.

3 Likes

@sam I’ve made this work by adding one optional field named spaces_endpoint to site settings.
This enables Spaces support for all existing S3 features like upload, delete, etc.
Does this look fine to you?

spaces_endpoint:
    default: ''
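
Roughly, the optional setting just gets merged into the options passed to the SDK. A simplified sketch (the s3_options method here is illustrative, not the actual Discourse code):

# Sketch: merge the optional endpoint into the client options.
# s3_options is a hypothetical helper, not the real implementation.
def s3_options
  opts = {
    access_key_id: SiteSetting.s3_access_key_id,
    secret_access_key: SiteSetting.s3_secret_access_key,
    region: SiteSetting.s3_region
  }
  opts[:endpoint] = SiteSetting.spaces_endpoint if SiteSetting.spaces_endpoint.present?
  opts
end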

Here’s an image of the new Settings/Files page:

3 Likes

Can you put your code changes up as a PR on Github?

Also, that setting is being inserted as Aws::S3::Resource.new(... endpoint ...), right?
The setting should be named s3_endpoint, default of https://s3.amazonaws.com – as it’s not specific to DO Spaces.
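
Something along these lines (just a sketch, not the final implementation):

# Sketch: only pass a custom endpoint to the SDK when it differs
# from the AWS default, so plain S3 setups keep working unchanged
if SiteSetting.s3_endpoint != "https://s3.amazonaws.com"
  opts[:endpoint] = SiteSetting.s3_endpoint
end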

6 Likes

There should also be some changes to the nginx template. It’s better to serve files via nginx, not directly from Spaces. Also, it’s strongly recommended to add nginx caching, to keep the number of read requests to the Spaces API to a minimum. Also, for local development (and for some production cases) you can use https://www.minio.io/ – it’s S3-API-compatible storage software.
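
For example, pointing the same SDK at a local Minio server is just a different endpoint (a sketch; the address and credentials are placeholders):

require 'aws-sdk-s3'

# Sketch: the aws-sdk-s3 gem talking to a local Minio server.
# Address and credentials are placeholders.
s3 = Aws::S3::Resource.new(
  access_key_id: 'MINIO_KEY',
  secret_access_key: 'MINIO_SECRET',
  endpoint: 'http://127.0.0.1:9000',
  region: 'us-east-1',
  force_path_style: true # Minio addresses buckets by path, not subdomain
)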

You can view the PR here.

Yes, that’s much better. I’ve changed the name to s3_endpoint now.
I need to check the endpoint for DO to set DO-specific parameters, and that’s what I’ve done in s3_endpoint(). I wasn’t sure where to put the code for configuring different platforms. Thanks for the help!

Note: This PR is a Work in Progress.

1 Like

Shouldn’t everyone using S3-like services use a CDN? We even have a separate setting, DISCOURSE_S3_CDN, so people can have a CDN just for their S3-like storage.

Maybe after we merge the PR we can work out an optional template with nginx caching, and another one using CDNs (Cloudflare should fit here, since it will be caching only static assets).

5 Likes

Not everyone uses a CDN. There are many small installations that just don’t need an additional layer.
Also, it’s easy to add nginx caching for any kind of S3-compatible storage.

And in the case of Minio, for small installations it can be used to serve images directly.

I would like to avoid this if possible:

https://github.com/rishabhnambiar/discourse/blob/a9bdbf962946cdbb9516b21b92434cd68c90831a/lib/s3_helper.rb#L166-L168

Can we just ignore the region if opts[:endpoint] is different from the default? The objective is to allow any S3-compatible service without depending on Discourse to add service-specific code.
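
Something like this (just a sketch of the idea):

# Sketch: only honour s3_region when talking to AWS proper;
# for any custom endpoint, skip the region entirely
if opts[:endpoint].blank? || opts[:endpoint] == "https://s3.amazonaws.com"
  opts[:region] = SiteSetting.s3_region
end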

2 Likes

I agree, that would be much cleaner.
From the Spaces API docs, I read that region and endpoint were the required parameters.

But I just tested it again without the region parameter and it still works. So yes, we can ignore the region if the given endpoint is different from the default.
I’ve updated the PR. Thanks! :+1:

3 Likes

Maybe we should make s3_region a free-text field and rely on the admins to know the correct values?

4 Likes

My first choice was having something like the select-kit in the tags input, select+create. Not sure we have a widget like that handy, @joffreyjaffeux?

I’d like to remind everyone that DigitalOcean Spaces features work by just using an endpoint like https://sgp1.digitaloceanspaces.com or https://nyc3.digitaloceanspaces.com.
There is no need for a separate region field, at least for DigitalOcean Spaces support.
I will check if other services can also work in this manner, with a single endpoint.

If both Google and Minio work without the region then let’s ship it.

6 Likes

None of the three platforms requires an additional region field, and the current PR will add support for DigitalOcean Spaces and Minio.

  1. Minio: endpoint format: server_ip:9000 or your own domain
    Requires a region, but it works perfectly with the pre-existing s3_region options (drop-down menu) in site settings, so it does not need an extra region field.

  2. Google Cloud Platform: endpoint format: https://storage.googleapis.com
    Does not require a region parameter, but I haven’t got it to work yet because of an incorrect-header issue that I’m investigating. It should work without an extra region field once I debug the issue.

  3. DigitalOcean Spaces: endpoint format: https://nyc3.digitaloceanspaces.com
    Works perfectly without an extra region field.


@Falco Before we ship it, the last problem is that Minio requires { force_path_style: true } in s3_options, while AWS requires the setting to be false. This is because AWS and Minio use different addressing styles, so if we force one, it breaks the other.
I know we wanted to avoid this, but I don’t see how we can make both work without adding something specific like this:

if opts[:endpoint].include?("minio")
  opts[:force_path_style] = true
end

But even this won’t work, because a lot of Minio users might point to their servers by IP address instead of having “minio” in the endpoint. We would have to find a way to detect a Minio endpoint, and we might need Minio-specific code in Discourse if we want to support it.

We could easily support a lot of cloud options if we had a free-text field for entering key/value options, for greater flexibility and to avoid issues like this. Both the region and force_path_style issues would be solved. @riking, what do you think about a field like this?

5 Likes

Could you just ask what the service is? Or have a force_path_style setting, and in the description say it’s true for Minio and false for AWS?
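
That would just be a boolean site setting wired straight through (a sketch; the setting name is illustrative):

# Sketch: pass the admin-controlled flag straight to the SDK options;
# true for Minio, false (the default) for AWS
opts[:force_path_style] = SiteSetting.s3_force_path_style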

1 Like

@pfaffman’s solution, defaulting to false, sounds good.

2 Likes

Sure, I’ve added that, and the PR now works great with S3, DigitalOcean Spaces, and Minio :grinning:

@Falco, one question: when enable_s3_backups is disabled and enable_s3_uploads is enabled, do we expect a remote backup when clicking ‘Backup’?
On my installation, a remote backup only occurs when enable_s3_backups is enabled.