migrate_to_s3 for DigitalOcean Spaces woes

Ah! I figured it must be in some place I didn’t understand. Thanks for that.

Now that I understand, I think it’s because I don’t have GlobalSetting.s3_region set, even though I do pass the ENV variable.

Any guess on what table is causing the value too long?

That works for me. You are setting a value for the access keys, right?

What makes you think that this is the case? You can replace `DRY_RUN=1 rake uploads:migrate_to_s3` with `rails c` in the above command and inspect the values of the settings if you want to see what’s happening…

It must be the `url` column in the `topic_links` table. It’s the only column with a restriction of 500 chars.


Thanks for that. I’ve got 30 on this site (a different site from the one discussed here) with more than 490 characters. I’m not clear why the remap has this problem. Oh! Is it because this table gets populated when the posts are rebaked? If that’s it, isn’t it a bug that sticking a URL with >500 characters in a post causes this error? Shouldn’t there be some error checking that catches it?


Tagging @falco since he was involved in the other topic about this.

I think the problem was that `S3_REGION` wasn’t set. I just added it to the ENV in app.yml and tried again, and now

DRY_RUN=1 rake uploads:migrate_to_s3

works as expected.
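
For reference, here’s what that env block might look like. This is a hypothetical app.yml fragment: the DISCOURSE_S3_* names follow Discourse’s convention for these settings, but every value below is a placeholder rather than anything taken from this thread:

```yaml
env:
  # Placeholder values -- substitute your own Spaces region, keys, and CDN host.
  DISCOURSE_S3_REGION: ams3
  DISCOURSE_S3_ENDPOINT: https://ams3.digitaloceanspaces.com
  DISCOURSE_S3_ACCESS_KEY_ID: your-spaces-key
  DISCOURSE_S3_SECRET_ACCESS_KEY: your-spaces-secret
  DISCOURSE_S3_CDN_URL: https://your-bucket.ams3.cdn.digitaloceanspaces.com
```

Note that changes here only take effect after a container rebuild.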

Now I’d like to figure out what to do about the `topic_links` table: even now that the files are uploaded, it doesn’t look like it’s any (much?) faster, and there are >200K files.

I’ll see if I can track down the topic link thing. I guess it’s in `app/models/topic_link`?

The other thing that I don’t understand is how the assets get pushed to S3. I have the S3 CDN defined, and now all of the CSS files return 403 because the site is looking for them in the S3 CDN bucket. I tried bootstrapping with the S3 CDN bucket set, but that didn’t fix it. (And worse, my monitor only looks for ‘discourse’ in the output and didn’t notice that all the assets were broken.)

EDIT: Perhaps the URL should be truncated here:

I’ve been working on this (off and on) for weeks and would love to get this sorted.

Take a very good look at those 30 links. Why are they so long? Do the links even make sense? And why are they affected by the remap? Because, no, this isn’t caused by a rebake but by the remapping of upload URLs.


No. They are a bunch of stupid links, created by children. On the other community, where there are 3 such links, one was a create-a-topic link and the other two were something else stupid.

Maybe the fix is just to skip URIs that are longer than 500 chars, on the assumption that they’re nonsense?
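
A minimal plain-Ruby sketch of that filter (the 500-char limit comes from the `topic_links` column discussed above; the helper name is made up):

```ruby
# Skip any URL that exceeds the topic_links column limit (500 chars),
# on the assumption that anything that long is nonsense.
MAX_URL_LENGTH = 500

def keep_url?(url)
  url.length <= MAX_URL_LENGTH
end

urls = ["https://example.com/ok", "https://example.com/" + "a" * 600]
kept = urls.select { |u| keep_url?(u) }
# kept contains only the first, short URL
```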

EDIT: That seems to work, but now I’m getting a uniqueness violation. I’m guessing it’s unrelated?

This is my showstopper on this S3 migration right now. (I think I’ll just truncate the topic_links urls in the DB as suggested here).

How do the assets get pushed to S3?

Well, now it’s even stranger:

ActionController::RoutingError (No route matches [GET] "/t/why-was-the-forum-down-for-so-long/254643/https:/lc-rbx.ams3.cdn.digitaloceanspaces.com/uploads/original/3X/6/1/6191c280e518330ba11197299834f074fe49eb33.gif")

See how something is concatenating an image URL onto the route?

I’m also unable to replace any of the site logos.

I can upload to S3 when creating a new post, and I can turn off S3 uploads and upload to local storage.

I’m increasingly stumped.

This was a sneaky one…

If you’re on a Docker-based install, GlobalSetting uses the FileProvider, which reads from /var/www/discourse/config/discourse.conf. That file is only populated from the ENV once, when the container is first started. As a result, running `DISCOURSE_S3_REGION=… rails console` will not have any effect on GlobalSetting.s3_region, since the value has not been added to config/discourse.conf.
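
A tiny plain-Ruby sketch of that behavior (the parsing here is simplified and the function name is invented; it is not Discourse’s actual FileProvider code):

```ruby
require "tempfile"

# A FileProvider-style lookup reads only the conf file written at
# container bootstrap; runtime ENV is never consulted.
def file_provider_lookup(conf_path, key)
  File.readlines(conf_path).each do |line|
    k, v = line.split("=", 2)
    return v.strip.delete("'") if k&.strip == key
  end
  nil
end

Tempfile.create("discourse.conf") do |f|
  f.puts "s3_region = 'ams3'"         # written when the container started
  f.flush
  ENV["DISCOURSE_S3_REGION"] = "nyc3" # set at runtime: ignored
  file_provider_lookup(f.path, "s3_region") # => "ams3"
end
```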


Thanks, @tgxworld. Does this fix the concatenated path I referenced above?

Do you have more information about the error here?

How are you running into this route?


The problem with the site logos had to do with the image processing. I changed the PNG to a GIF and the error went away. I couldn’t figure out what it was about the image that caused the processing to fail.

Not sure if it’s related, but I think my database is hosed. I got this trying to do a restore:

[2019-03-13 15:44:16] ERROR:  invalid input syntax for integer: "{"toplose 6 forums are closing","origi lash meugina93fep93fep9titZ quipaWin _tit,"or aM3:46:1130"
[2019-03-13 15:44:16] CONTEXT:  COPY notifications, line 135659, column post_action_id: "{"toplose 6 forums are closing","origi lash meugina93fep93fep9titZ quipaWin _tit,"or aM3:46:1130"
[2019-03-13 15:44:16] EXCEPTION: psql failed

Wait. So now I restored the site to a staging server and ran:

rake uploads:migrate_to_s3                                                              
Migrating uploads to S3 for 'default'...
Some uploads were not migrated to the new scheme. Please run these commands in the rails console

SiteSetting.migrate_to_new_scheme = true
Jobs::MigrateUploadScheme.new.execute(nil)

So does that mean that it’s going to just rebake all the posts and upload them in the background?

OOH! It looks like that’s what’s happening. Maybe I’m finally going to be able to make this transition!

EDIT: This is further confused because I previously had a CDN configured. I guess I need to rebake the posts without the CDN and then do the migrate_to_s3?

Also, I think that there’s a bug: if you have a CDN configured and then enable the S3 CDN, Discourse thinks that assets are on the S3 CDN. No, that’s not it. It looks like having DISCOURSE_S3_CDN_URL defined makes Discourse point to the CDN URL for assets, but the assets aren’t there. I thought that a `rake assets:precompile` might get it, but no joy. This might need to be another topic.

Well, no, it seems like nothing is happening. Nothing happens when running MigrateUploadScheme and there are no sidekiq jobs queued.

root@shadrack-rbx:/var/www/discourse# rake uploads:migrate_to_s3
Migrating uploads to S3 for 'default'...
Some uploads were not migrated to the new scheme. Please run these commands in the rails console

SiteSetting.migrate_to_new_scheme = true
Jobs::MigrateUploadScheme.new.execute(nil)
root@shadrack-rbx:/var/www/discourse# rails c
[1] pry(main)> SiteSetting.migrate_to_new_scheme = true
=> true
[2] pry(main)> Jobs::MigrateUploadScheme.new.execute(nil)
=> []

So, you can hack around this problem by deleting the stuff that’s not in the new scheme, like this:

bad = Upload.where("url NOT LIKE '//%' AND url NOT LIKE '/uploads/default/original/_X/%'")
bad.delete_all
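
As a cross-check, here’s a plain-Ruby equivalent of those two LIKE patterns (in SQL LIKE, `_` matches any single character, so `_X` covers `1X`, `2X`, `3X`, and so on):

```ruby
# An upload URL is in the "new scheme" if it is protocol-relative (//...)
# or lives under /uploads/default/original/<char>X/.
NEW_SCHEME = %r{\A(//|/uploads/default/original/.X/)}

def new_scheme?(url)
  url.match?(NEW_SCHEME)
end

new_scheme?("/uploads/default/original/3X/6/1/abc.gif")   # => true
new_scheme?("//bucket.ams3.digitaloceanspaces.com/x.png") # => true
new_scheme?("/uploads/default/abc.gif")                   # => false
```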

This seems like a :bug:. If you rebuild, you’ll need to do it again, so there’s something in db/fixtures that needs to be fixed, I guess. Oh, or maybe just change:

to

  if Upload.where("id>0 and url NOT LIKE '//%' AND url NOT LIKE '/uploads/#{db}/original/_X/%'").exists?

I don’t know which is more correct, or I’d submit a PR.

But when I delete all of those uploads (before I figured out the above), I’m still getting this:

root@shadrack-rbx:/var/www/discourse# DISCOURSE_S3_CDN_URL="https://lc-rbx.ams3.cdn.digitaloceanspaces.com" rake uploads:migrate_to_s3
Migrating uploads to S3 for 'default'...
Please provide the 'DISCOURSE_S3_CDN_URL' environment variable

And this is in discourse.conf as well:

s3_cdn_url = 'https://lc-rbx.ams3.cdn.digitaloceanspaces.com'

I guess that’s because of

I changed it to `GlobalSetting.s3_cdn_url.blank?` and it’s running. @zogstrip, it looks like this is yours, but it’s 3 months old, so I guess there’s something else here I don’t understand.

I’ve got 165087, so I guess I’ll see what happens when that finishes in a day or three.

Thanks for your help on this @tgxworld. The problem that I’m having now (uploading 73GB) is that at some point I hit rate limits and the job craps out. And it seems that it starts over from the beginning each time, always reporting that there are something like 187123 images to upload.
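
Not a fix for the start-from-zero behavior, but one way to soften rate-limit failures is to wrap each upload in an exponential-backoff retry. A hedged sketch (the helper and its limits are invented here; this is not what the rake task actually does):

```ruby
# Retry a block with exponential backoff, e.g. around a single S3 put
# that may be rate-limited by the provider.
def with_backoff(max_attempts: 5, base_delay: 1)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError
    raise if attempts >= max_attempts
    sleep(base_delay * 2**(attempts - 1)) # 1s, 2s, 4s, ...
    retry
  end
end

calls = 0
with_backoff(base_delay: 0) do
  calls += 1
  raise "rate limited" if calls < 3 # fail twice, then succeed
end
calls # => 3
```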

It’s not your problem, but I’d really like to get this move to Spaces complete before I move this site to a new server Real Soon Now. Would you recommend that I just give in and go for AWS S3, or do you think this should work?

If it’s urgent, I’d recommend going with AWS S3 first, since the rake task is what we use internally to migrate uploads to the S3 store. In theory, the rake task should work with Spaces as well, but it isn’t our top priority at the moment.


Indeed. It’s lots faster, too. It’s an insane hassle to set up, but I think Spaces isn’t really ready for prime time.

