Migrate_to_s3 for Digital Ocean Spaces woes

pfaffman · February 20, 2019, 3:54pm

Continuing the discussion from `rake uploads:migrate_to_s3` needs some love:

I still can’t get migrate_to_s3 to work with Digital Ocean Spaces.

root@shadrach-rbx8888:/var/www/discourse# rails c
[1] pry(main)> GlobalSetting.use_s3
=> true
[2] pry(main)> GlobalSetting.use_s3?
=> false
[3] pry(main)> 
root@shadrach-rbx8888:/var/www/discourse# rake uploads:migrate_to_s3
Migrating uploads to S3 for 'default'...
Please provide the following environment variables
  - DISCOURSE_S3_BUCKET
  - DISCOURSE_S3_REGION
  - DISCOURSE_S3_ACCESS_KEY_ID
  - DISCOURSE_S3_SECRET_ACCESS_KEY
root@shadrach-rbx8888:/var/www/discourse# echo $DISCOURSE_S3_BUCKET
lc-rbx
root@shadrach-rbx8888:/var/www/discourse# echo $DISCOURSE_S3_REGION
none
root@shadrach-rbx8888:/var/www/discourse# echo $DISCOURSE_S3_ACCESS_KEY_ID
5-some-other-lettersH
root@shadrach-rbx8888:/var/www/discourse# echo $DISCOURSE_S3_SECRET_ACCESS_KEY
E-some-other-letters-M
root@shadrach-rbx8888:/var/www/discourse#

But here’s where I’m really confused. The latest uploads.rb cranks up Aws::S3 with just a few environment variables (and not those required for Digital Ocean spaces).

github.com

discourse/discourse/blob/main/lib/tasks/uploads.rake#L249-L252


      
          
          # list all missing uploads and optimized images
          task "uploads:missing_files" => :environment do
            if ENV["RAILS_DB"]

But in blame, at line 248, I see

 s3 = Aws::S3::Client.new(S3Helper.s3_options(GlobalSetting))

I don’t understand why a commit from 2 months ago isn’t showing up in tests-passed. I’m at 04a63cfaaa445f66c2d3d5309191abe9d36c2371 and my uploads.rb still has the environment variables rather than the GlobalSetting that I see in blame.

I also can’t quite tell how calling S3Helper.s3_options(GlobalSetting) can work when s3_upload_bucket isn’t a global setting. There might be some magic in s3_helper.rb that i don’t understand that somehow finds the s3_upload bucket, as I see new uploads are going to S3/spaces.

justin · February 20, 2019, 4:12pm

I ended up getting this working myself by adding those environment variables to app.yml and rebuilding. Feels like a workaround and probably not the best way forward long-term.

tgxworld · February 21, 2019, 1:11am

@rishabh and @zogstrip worked on this task recently so they will probably be able to advise better.

rishabh · February 21, 2019, 10:41am

I’ve fixed this in:

For the GlobalSetting issue, can you try running the command like this to test if your config is alright?

DISCOURSE_S3_ACCESS_KEY_ID="" \
DISCOURSE_S3_SECRET_ACCESS_KEY="" \
DISCOURSE_S3_REGION="us-east-1" \
DISCOURSE_S3_BUCKET="uploadbucket" \
DISCOURSE_S3_CDN_URL="https://xyz.com" \
DRY_RUN=1 rake uploads:migrate_to_s3

Please make sure that you’ve set the correct DigitalOcean endpoint in SiteSetting.s3_endpoint and that you’ve set DRY_RUN=0 to make the actual migration happen.

hosna · February 21, 2019, 11:36am

Any news for minio?

pfaffman · February 27, 2019, 8:00pm

I still don’t see how GlobalSetting.use_s3? is ever true. I have GlobalSetting.use_s3, but not with a ? and I don’t see how seeting those ENV vars sets GlobalSetting.use_s3?. Maybe that’s supposed to happen somewhere that I don’t know about or understand, but I removed the ? in order to be able to run the script at all.

So I removed the ? and the dry run looked promising and I ran it for real. It took a couple days (70GB files and a badly configured network adapter) and then it failed like:

Updating the URLs in the database...                                                       [14/9360]
rake aborted!
PG::StringDataRightTruncation: ERROR:  value too long for type character varying(500)
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/rack-mini-profiler-1.0.2/lib/patches/db/pg.rb:110:i
n `async_exec'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/rack-mini-profiler-1.0.2/lib/patches/db/pg.rb:110:i
n `async_exec'
(eval):24:in `async_exec'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/mini_sql-0.2.1/lib/mini_sql/postgres/connection.rb:
118:in `run'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/mini_sql-0.2.1/lib/mini_sql/postgres/connection.rb:
90:in `exec'
/var/www/discourse/lib/db_helper.rb:31:in `block in remap'
/var/www/discourse/lib/db_helper.rb:20:in `each'
/var/www/discourse/lib/db_helper.rb:20:in `remap'
/var/www/discourse/lib/tasks/uploads.rake:364:in `migrate_to_s3'
/var/www/discourse/lib/tasks/uploads.rake:210:in `block in migrate_to_s3_all_sites'
...

It’s probably coincidence that it’s the same error as Trouble with `discourse remap` remapping `topic_links url` (it is a different site).

gerhard · February 27, 2019, 8:50pm

It’s right here:

github.com

discourse/discourse/blob/6b006c383bf468b9c888466de1eb34c56076fc73/app/models/global_setting.rb#L78-L86


      
          def self.use_s3?
            (@use_s3 ||=
              begin
                s3_bucket &&
                s3_region && (
                  s3_use_iam_profile || (s3_access_key_id && s3_secret_access_key)
                ) ? :true : :false
              end) == :true
          end

pfaffman · February 27, 2019, 9:07pm

Ah! I figured it must be in some place I didn’t understand. Thanks for that.

Now that I understand, I think that it’s because I don’t have GlobalSetting.s3_region, even if I do pass the ENV variable.

Any guess on what table is causing the value too long?

gerhard · February 27, 2019, 9:31pm

rishabh:

DISCOURSE_S3_ACCESS_KEY_ID="" \
DISCOURSE_S3_SECRET_ACCESS_KEY="" \
DISCOURSE_S3_REGION="us-east-1" \
DISCOURSE_S3_BUCKET="uploadbucket" \
DISCOURSE_S3_CDN_URL="https://xyz.com" \
DRY_RUN=1 rake uploads:migrate_to_s3

That works for me. You are setting a value for for the access keys, right?

What makes you think that this is the case? You can replace DRY_RUN=1 rake uploads:migrate_to_s3 with rails c in the above command and inspect the values of the settings if you want to see what’s happening…

It must be the url column in the topic_links table. It’s the only column with a restriction of 500 chars.

pfaffman · February 27, 2019, 9:43pm

Thanks for that. I’ve got 30 on this site (different from the site discussed here) with more than 490 characters. I’m not clear why the remap has this problem. Oh! Is it because this table is getting populated when the posts are rebaked? If that’s it, isn’t it a bug if sticking a URL with >500 characters in a post causes this error? Should there be some error checking that catches it?

Tagging @falco since he was involved in the other topic about this.

pfaffman · February 27, 2019, 10:05pm

I think the problem was that the S3_REGION wasn’t set. I just added it to the ENV in app.yml and tried again and now.

DRY_RUN=1 rake uploads:migrate_to_s3

works as expected.

Now I’d like to figure out what to do about the topic_links table, as even now that the files are uploaded, it doesn’t look like it’s any (much?) faster and there are >200K files.

pfaffman · February 28, 2019, 3:01pm

I’ll see if I can track down the topic link thing. I guess it’s in app/models/topic_link?

The other thing that I don’t understand is how the assets get pushed to S3. I have the S3 cdn defined and now all of the css files are 403 because it’s looking for them in the S3 cnd bucket. I tried bootstrapping with the S3 cdn bucket set, but that didn’t fix it. (and worse, my monitor only looks for ‘discourse’ in the output and didn’t notice they all the assets were broken).

EDIT: Perhaps the URL should be truncated here:

https://github.com/discourse/discourse/blob/master/app/models/topic_link.rb#L120

I’ve been working on this (off and on) for weeks and would love to get this sorted.

gerhard · February 28, 2019, 3:58pm

Take a very good look at those 30 links. Why are they so long? Do the links even make sense? And why are they affected by the remap? Because, no, this isn’t caused by a rebake but by the remapping of upload URLs.

pfaffman · February 28, 2019, 4:14pm

No. They are a bunch of stupid links, created by children. The other community where there are 3 such links, one was a create-a-topic link and the other two were something else stupid.

Maybe what to do is just skip URIs that are longer than 500 chars, assuming that they are nonsense?

EDIT: That seems to work, but now I’m getting a non-uniq violation. I’m guessing it’s unrelated?

pfaffman · February 28, 2019, 5:17pm

This is my showstopper on this S3 migration right now. (I think I’ll just truncate the topic_links urls in the DB as suggested here).

How do the assets get pushed to S3?

pfaffman · March 1, 2019, 1:23am

Well, now it’s even stranger:

ActionController::RoutingError (No route matches [GET] "/t/why-was-the-forum-down-for-so-long/254643/https:/lc-rbx.ams3.cdn.digitaloceanspaces.com/uploads/original/3X/6/1/6191c280e518330ba11197299834f074fe49eb33.gif")

See how something is contatenating an image to the route?

I’m also unable to replace any of the site logos.

I can upload to S3 when creating a new post; I can turn off S3 uploads and upload to local storage.

I’m increasingly stumped.

tgxworld · March 11, 2019, 6:32am

This was a sneaky one…

https://github.com/discourse/discourse/commit/d82876896eaf2ccd150ea9202003ed11742d2136

If you’re on a docker based install, GlobalSetting uses the FileProvider which reads from /var/www/discourse/config/discourse.conf. That file is only populated from the ENV once when the container is first started. As a result, running DISCOURSE_S3_REGION rails console will not have any effect on GlobalSetting.s3_region since it has not been added to config/discourse.conf.

pfaffman · March 11, 2019, 10:24am

Thanks, @tgxworld. Does this fix the concatenated path I referenced above?

tgxworld · March 12, 2019, 2:37am

Do you have more information about the error here?

How are you running into this route?

pfaffman · March 13, 2019, 4:57pm

Problem with site logos had to do with the image processing. I changed the png to a gif and the error went away. I couldn’t figure out what was up with the image that caused the image processing to fail.

Not sure if it’s related, but I think my database is hosed. I got this trying to do a restore:

[2019-03-13 15:44:16] ERROR:  invalid input syntax for integer: "{"toplose 6 forums are closing","origi lash meugina93fep93fep9titZ quipaWin _tit,"or aM3:46:1130"
[2019-03-13 15:44:16] CONTEXT:  COPY notifications, line 135659, column post_action_id: "{"toplose 6 forums are closing","origi lash meugina93fep93fep9titZ quipaWin _tit,"or aM3:46:1130"
[2019-03-13 15:44:16] EXCEPTION: psql failed

Topic		Replies	Views
Rake uploads:migrate_from_s3 fails Installation s3	31	2896	April 24, 2021
Uploads migration to S3 fails Support	26	3077	May 7, 2021
S3 uploads migrations to DigitalOcean S3 Spaces fails Support	1	534	May 20, 2024
1 of 16551 uploads are not migrated to S3. S3 migration failed for db 'default' Installation s3 , cdn	22	1715	August 10, 2023
Rake uploads:migrate_to_s3 error Installation s3	8	788	January 6, 2020

Migrate_to_s3 for Digital Ocean Spaces woes

Related topics