Extend S3 configuration for other S3 API compatible services


(Rishabh Nambiar) #1

Hi everyone :grinning:
This feature was requested by a lot of community members in this post.
I’m planning to add support for DigitalOcean Spaces because it’s fully S3 compatible, but if the community prefers another cloud option, suggestions are welcome!
Here’s a link to the DigitalOcean API.

I’m excited to work on this feature but I’m new to Ruby and the code base so I would need a bit of guidance:

  • It would be super helpful if someone could share some documentation or resources about modifying site settings and integrations as this would help me understand how Discourse integrations work.

  • What’s an ideal development environment for testing the existing AWS features? VPS with a production Discourse instance? I wanted to know because I might have to deal with nginx etc.

To see if I’ve understood the requirements properly before I start working:

The goal here is to extend the S3 configuration so that the existing S3 code can be reused with another compatible backup service. The official aws-sdk-s3 gem supports custom HTTP endpoints as shown here, so for example, I would have to select “custom” in the s3 location and then add a custom_s3_url.
Is that correct?


(Bhanu Sharma) #2

Some insight here:

For development, start with a local development sandbox.

Discourse can be a good starting point as your first Ruby app as well.


(Rishabh Nambiar) #3

Thanks, that post is actually my primary source of information! :grinning:

Perfect, I’ve already got a local sandbox up and running so I’ll get to work.
Thank you!


(Rishabh Nambiar) #4

@sam I did the research and yes, this is well supported by aws-sdk-ruby even though the documentation is lacking. This is a demonstration of aws-sdk-ruby being used to upload an archive to DigitalOcean Spaces. This works really well and I hope this helps and gives everyone an idea about how compatible these services are.

Steps:

  1. Create a DigitalOcean Space.
  2. Generate Access Keys for your DigitalOcean Account.
  3. Paste the Space name (bucket), key pair (access_key_id, secret_access_key), file name and path in the variables below. Run the code and the file should be visible in your Space.
require 'aws-sdk-s3'

name = 'test.tar.gz'
path = '/home/workspace/test.tar.gz'

# Before uploading, ensure that you've created a Space on DigitalOcean
bucket = 'discoursetest1'

# Configure an S3 Resource for use with Spaces
# Note: Generate Spaces Access Keys from cloud.digitalocean.com/settings/api/
s3 = Aws::S3::Resource.new(
  access_key_id: '',
  secret_access_key: '',
  endpoint: 'https://nyc3.digitaloceanspaces.com',
  region: 'nyc3'
)

# Add a file to a Space
obj = s3.bucket(bucket).object(name)
obj.upload_file(path)

Note:
Read more about DigitalOcean’s AWS S3 compatibility here and the AWS Ruby SDK here.


(Rishabh Nambiar) #5

@sam I’ve made this work by adding one optional field named spaces_endpoint to site settings.
This enables Spaces support for all existing S3 features like upload, delete etc.
Does this look fine to you?

spaces_endpoint:
    default: ''

Here’s an image of the new Settings/Files page:


(Kane York) #6

Can you put your code changes up as a PR on Github?

Also, that setting is being inserted as Aws::S3::Resource.new(... endpoint ...), right?
The setting should be named s3_endpoint, default of https://s3.amazonaws.com – as it’s not specific to DO Spaces.


(Oleg Bovykin) #7

There should be some changes in the nginx template. It’s better to serve files via nginx, not directly from Spaces. It’s also strongly recommended to add nginx caching to keep the number of read requests to the Spaces API to a minimum. For local development (and for some production cases) you can use https://www.minio.io/, which is S3 API compatible storage software.


(Rishabh Nambiar) #8

You can view the PR here.

Yes, that’s much better. I’ve changed the name to s3_endpoint now.
I need to check the endpoint for DO to set DO-specific parameters and that’s what I’ve done in s3_endpoint(). I was not sure where to put the code for configuring different platforms. Thanks for the help!

Note: This PR is a Work in Progress.


(Rafael dos Santos Silva) #9

Shouldn’t everyone using S3-like services use a CDN? We even have a separate setting, DISCOURSE_S3_CDN, so people can have a CDN for the S3-like storage only.

Maybe after we merge the PR we can work out an optional template with nginx caching and another one using CDNs (Cloudflare should fit here, since it will be caching only statics).


(Oleg Bovykin) #10

Not everyone uses a CDN. There are many small installations that just don’t need an additional layer.
Also, it’s easy to add caching to nginx for any kind of S3-compatible storage.

And in case of minio, for small installations it can be used to directly serve images.


(Rafael dos Santos Silva) #11

I would like to avoid this if possible:

Can we just ignore the region if opts[:endpoint] is different from the default? The objective is to allow any S3-compatible service to be used without depending on Discourse to add code for it.


(Rishabh Nambiar) #12

I agree, that would be much cleaner.
From the Spaces API docs, I read that region and endpoint were the required parameters.

But I just tested it again without the region parameter and it still works. So yes, we can ignore the region if the given endpoint is different from the default.
I’ve updated the PR. Thanks! :+1:
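
For readers following along, the branching agreed on above could be sketched roughly like this. It is only an illustration of the logic, not the code in the PR: the helper name build_s3_options and the constant DEFAULT_S3_ENDPOINT are hypothetical, and the sketch builds a plain options hash without calling the AWS SDK.

```ruby
# Hypothetical sketch of "ignore the region when a custom endpoint is set".
# DEFAULT_S3_ENDPOINT and build_s3_options are illustrative names only.
DEFAULT_S3_ENDPOINT = 'https://s3.amazonaws.com'

def build_s3_options(endpoint:, region:)
  opts = {}
  if endpoint.to_s.empty? || endpoint == DEFAULT_S3_ENDPOINT
    # Plain AWS S3: keep using the configured region.
    opts[:region] = region
  else
    # Any other S3-compatible service: pass the endpoint, drop the region.
    opts[:endpoint] = endpoint
  end
  opts
end

aws_opts = build_s3_options(endpoint: DEFAULT_S3_ENDPOINT, region: 'us-east-1')
spaces_opts = build_s3_options(endpoint: 'https://nyc3.digitaloceanspaces.com', region: 'us-east-1')
```

With this shape, no provider name appears in the code, which is the point of the suggestion: a new S3-compatible service should only need a different endpoint value.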


(Kane York) #13

Maybe we should make s3_region a free-text field and rely on the admins to know the correct values?


(Rafael dos Santos Silva) #14

My first choice was having something like select-kit in the tags input, select+create. Not sure we have a widget like that handy @joffreyjaffeux ?


(Rishabh Nambiar) #15

I’d like to remind everyone that DigitalOcean Spaces features work by just using an endpoint like https://sgp1.digitaloceanspaces.com or https://nyc3.digitaloceanspaces.com.
There is no need for a separate region field, at least for DigitalOcean Spaces support.
I will check if other services can also work in this manner, with a single endpoint.


(Rafael dos Santos Silva) #16

If both Google and Minio work without the region then let’s ship it.


(Rishabh Nambiar) #17

All 3 platforms will not require an additional region field and the current PR will add support for DigitalOcean Spaces and Minio.

  1. Minio: endpoint format: server_ip:9000 or user domain
    Requires region but it works perfectly with the pre-existing s3_region options (drop-down menu) in site settings so it does not need an extra region field.

  2. Google Cloud Platform: endpoint format: https://storage.googleapis.com
    Does not require a region parameter. I haven’t got it to work yet because of an incorrect header issue that I’m investigating, but it should work without an extra region field once that is debugged.

  3. DigitalOcean Spaces: endpoint format: https://nyc3.digitaloceanspaces.com
    Works perfectly without an extra region field.


@Falco Before we ship it, the last problem is that Minio requires { force_path_style: true } in s3_options and AWS requires the setting to be false. This is because AWS and Minio use different addressing styles so if we force one, it breaks the other.
I know we wanted to avoid this but I don’t see how we can make both work without adding something specific like this:

if opts[:endpoint].include?("minio")
  opts[:force_path_style] = true
end

But even this won’t work because a lot of Minio users might use IP addresses to their servers instead of having “minio” in their endpoint. We have to think of a way to detect a Minio endpoint and we might need Minio specific code in Discourse if we want to support it.
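
To make the two addressing styles concrete, here is a small sketch of the URL shapes involved. The helper object_url, the bucket name and the IP address are made up for illustration; the real SDK builds its request URLs internally.

```ruby
require 'uri'

# Hypothetical helper that only illustrates the two URL shapes.
def object_url(endpoint, bucket, key, force_path_style: false)
  uri = URI.parse(endpoint)
  # Keep the port only when it is not the default for the scheme.
  port = uri.port == uri.default_port ? '' : ":#{uri.port}"
  if force_path_style
    # Path style: the bucket is the first path segment.
    "#{uri.scheme}://#{uri.host}#{port}/#{bucket}/#{key}"
  else
    # Virtual-hosted style: the bucket becomes part of the hostname.
    "#{uri.scheme}://#{bucket}.#{uri.host}#{port}/#{key}"
  end
end

minio_url = object_url('http://192.168.1.10:9000', 'uploads', 'a.png', force_path_style: true)
aws_url = object_url('https://s3.amazonaws.com', 'uploads', 'a.png')
```

Here minio_url is http://192.168.1.10:9000/uploads/a.png and aws_url is https://uploads.s3.amazonaws.com/a.png. Prefixing a bucket name to a bare IP address does not give a usable hostname, which is why a server reached by IP needs path style.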

We could support a lot of cloud options easily if we had a free-text field for entering key:value options. That would give greater flexibility and avoid issues like this: both the region and force_path_style issues would be solved. @riking How do you feel about a field like this?


(Jay Pfaffman) #18

Could you just ask what the service was? Or have a force_path_style setting and in the description say it’s true for minio and false for AWS?


(Rafael dos Santos Silva) #19

@pfaffman’s solution, defaulting to false, sounds good.
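
A rough sketch of what that could look like, assuming two settings named s3_endpoint and s3_force_path_style (both names, and the helper s3_client_options, are assumptions for illustration, not taken from the PR). The keys endpoint and force_path_style are the options the aws-sdk-s3 client accepts.

```ruby
# Hypothetical settings and defaults, as discussed in the thread.
DEFAULT_SETTINGS = {
  s3_endpoint: 'https://s3.amazonaws.com',
  s3_force_path_style: false
}.freeze

# Turn admin-provided settings into client options, with no
# service-specific detection in the code.
def s3_client_options(overrides = {})
  settings = DEFAULT_SETTINGS.merge(overrides)
  {
    endpoint: settings[:s3_endpoint],
    force_path_style: settings[:s3_force_path_style]
  }
end

aws_opts = s3_client_options
minio_opts = s3_client_options(
  s3_endpoint: 'http://192.168.1.10:9000',
  s3_force_path_style: true
)
```

The admin decides the addressing style, so a Minio server reached by IP address works the same as one with “minio” in its hostname.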


(Rishabh Nambiar) #20

Sure, I’ve added that and the PR now works great with S3, DigitalOcean Spaces and Minio :grinning:

@Falco, one question: when enable_s3_backups is disabled and enable_s3_uploads is enabled, do we expect a remote backup on clicking ‘Backup’?
On my installation, a remote backup only occurs when enable_s3_backups is enabled.