Rake uploads:migrate_from_s3 fails

I followed the steps here: I backed up my entire site, cloned my AWS S3 bucket, changed the bucket name in Discourse settings from the original bucket to the backup, and turned off the “uploads to S3” checkbox in settings.

So now I’m finally ready to start the migration off S3… and it fails. :frowning:

The error message

root@ubuntu:/var/www/discourse# rake uploads:migrate_from_s3
Migrating uploads from S3 to local storage for 'default'...
rake aborted!
NoMethodError: undefined method `downcase' for nil:NilClass
/var/www/discourse/app/models/global_setting.rb:107:in `s3_bucket_name'
/var/www/discourse/app/models/site_setting.rb:157:in `absolute_base_url'
/var/www/discourse/lib/tasks/uploads.rake:138:in `migrate_from_s3'
/var/www/discourse/lib/tasks/uploads.rake:118:in `block in migrate_all_from_s3'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rails_multisite-2.2.2/lib/rails_multisite/connection_management.rb:68:in `with_connection'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rails_multisite-2.2.2/lib/rails_multisite/connection_management.rb:78:in `each_connection'
/var/www/discourse/lib/tasks/uploads.rake:118:in `migrate_all_from_s3'
/var/www/discourse/lib/tasks/uploads.rake:93:in `block in <top (required)>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'
Tasks: TOP => uploads:migrate_from_s3
(See full trace by running task with --trace)

(Here’s the line on GitHub where it fails. I guess it can’t get the value of s3_bucket?)
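
For anyone reading the trace later, here is a minimal Ruby sketch of how this kind of NoMethodError comes about when a setting is unset. The `settings` hash and the two helper methods are hypothetical stand-ins for illustration, not Discourse's actual GlobalSetting code.

```ruby
# Hypothetical stand-in for a settings store; not Discourse's GlobalSetting.
settings = { "s3_bucket" => nil }

# Mirrors the failing pattern: calling a String method on an unset (nil) value.
def bucket_name(settings)
  settings["s3_bucket"].downcase
end

# A nil-safe variant that returns nil instead of raising.
def safe_bucket_name(settings)
  settings["s3_bucket"]&.downcase
end

begin
  bucket_name(settings)
rescue NoMethodError => e
  puts "Failed as in the trace: #{e.class}"
end

puts safe_bucket_name(settings).inspect
puts safe_bucket_name({ "s3_bucket" => "My-Bucket" })
```

The point is only that the task is reading a value that was never set wherever it looks for it, so the fix is likely to be about where the bucket name is configured rather than about the rake task arguments.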

Other things I tried

  • I tried adding the credentials to the command line, but that didn’t make a difference. i.e.
    DISCOURSE_S3_BUCKET="dn-forum-storage-backup" DISCOURSE_S3_REGION="us-east-1" DISCOURSE_S3_ACCESS_KEY_ID="xxxxxxxxxxxxxxxxxxxx" DISCOURSE_S3_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxx" DISCOURSE_S3_CDN_URL="https://dn-forum-storage-backup.s3.us-east-1.amazonaws.com" rake uploads:migrate_from_s3

  • I also tried changing the S3 bucket name in my settings back to the original bucket name, still no luck, same result.

  • I also tried rebuilding the app. Same result.

@vinothkannans do you know what’s going on?

Please help, Discourse friends!

p.s. small side note: rake --tasks doesn’t list this task or any tasks that start with uploads, not sure if that means anything.

Possible related issue? cc @mcdanlj

@pnoeric yes that looks like exactly the same thing. I haven’t heard back in that issue on precisely what the intent is for SiteSettings vs. GlobalSettings for S3, so I don’t have any more help to give right now than to add it to SiteSettings via configuration (point 1 in my post).

Hey, thanks for this reply… I’m not even sure what SiteSettings vs GlobalSettings means – I’m not that good a RoR coder and don’t understand the full setup well enough. I’m just following the basic instructions. :wink:

But hopefully @vinothkannans will jump in too; I think he wrote the code for the migrate tasks. Or anyone else from the @team that might know…

Let’s keep our eyes on this topic…

s3_bucket is in GlobalSettings which is set from the config/discourse.conf file normally created from environment variables in the app.yml file. SiteSettings are things you change from the Admin Settings in the app.

It looks like when this was first created, you could only change S3 by rebuilding the app, and more recently it became possible to fill in the data in the Admin Settings. I can’t tell what was intended by not just doing a wholesale migration when setting S3 in SiteSettings was added.

[Edit: I inadvertently reversed the two when first posting this response]
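
To illustrate the relationship between the env section of app.yml and config/discourse.conf, here is a rough sketch. It assumes that DISCOURSE_-prefixed environment variables become lowercase keys without the prefix, and that the file uses a simple `key = value` form; `conf_lines` and `example_env` are made up for the example and this is not the launcher's real code.

```ruby
# Assumption: DISCOURSE_-prefixed env vars map to lowercase discourse.conf keys.
def conf_lines(env)
  env.select { |name, _| name.start_with?("DISCOURSE_") }
     .map { |name, value| "#{name.delete_prefix('DISCOURSE_').downcase} = #{value}" }
end

# Hypothetical environment, using the bucket and region named earlier in this topic.
example_env = {
  "DISCOURSE_S3_BUCKET" => "dn-forum-storage-backup",
  "DISCOURSE_S3_REGION" => "us-east-1",
  "PATH" => "/usr/bin",
}

puts conf_lines(example_env)
```

Under that assumption, DISCOURSE_S3_BUCKET in app.yml and s3_bucket in discourse.conf are the same GlobalSetting seen from two places, while the values in Admin Settings are stored separately as SiteSettings.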

Hmmm @mcdanlj just to confirm, though, you haven’t been able to work out how to make the migrate_from_s3 actually work, correct?

I’m fine with editing whatever settings or low-level files or whatnot that is necessary… I just need to move off S3 ASAP since it’s costing me an arm and a leg.

Hi @pnoeric and @mcdanlj,

One debugging approach might be to enter the rails console and then take a peek at the S3 site settings?

For example, in a plain ole OOTB discourse docker single container standalone app:

root@localhost:~# docker exec -it app rails c
[1] pry(main)> SiteSetting.s3_upload_bucket
=> ""
[2] pry(main)> SiteSetting.enable_s3_uploads
=> false
[3] pry(main)> 

Then compare the SiteSettings (via the rails console) against the defaults, listed here:

https://github.com/discourse/discourse/blob/master/config/site_settings.yml

Perhaps debugging in this manner might be helpful (no idea really, as we don’t use AWS or S3)? Maybe the rails console might help a bit?

rake --tasks only shows tasks that have a description. You can view all available tasks with rake -AT.

I don’t think it’ll help, because I ran these tasks on a test site very recently. Both seem to depend on S3 variables being defined in the env. However, it was a couple of months ago, and migrate_from_s3 didn’t really work for me.

Tricky question. I did set s3_bucket in config/discourse.conf as mentioned in the post you linked to, which did resolve this particular error, as I noted there.

This file is inside the container (./launcher enter app). Note that for this to survive ./launcher rebuild app, you also need to add DISCOURSE_S3_BUCKET to the env section of your containers/app.yml file.

The fact that I fixed it was why it was a dev post, not a support request; I was asking what developers think is the right solution as I continue to hack at this.

I have about 100GB of files in S3, so I’m moving very carefully. I implemented a limit for posts to look at, and next I need to implement a limit for posts to modify. I’ve been trying one thing at a time. This appears to be rarely-used code, and I’ve seen this error repeatedly, which makes me concerned about code rot. I don’t want to suddenly deface my entire site due to a bug, and this looks like it could be a good way to make that mistake.

  • For upload:// (for me, this means non-video) uploads, so far, it appears to be working. I’m doing one at a time and then reviewing the affected post to make sure everything works.

  • For uploads that don’t use the upload:// syntax (for me, this means video uploads as far as I can tell), where there is a literal reference to the URL in S3, it is mangling the URLs. That’s not a hard bug to fix as soon as I figure out for sure what I’m supposed to change them to but I haven’t done that yet. So that’s likely to be one of the PRs I post soon.

This is a spare time project for me, so no promises on timing.

Aha, thank you! I’ll give it a crack.

Ugh so no luck, @neounix @mcdanlj @vinothkannans. Still failing. But at least there’s a new/different error message…

Here’s what I tried today:

  1. Upgrade to latest Discourse, just to be sure.

  2. Add my s3_bucket in config/discourse.conf

  3. ./launcher enter app

  4. Edit containers/app.yml and add the DISCOURSE_S3_BUCKET var

  5. Tried rake uploads:migrate_from_s3 and now it fails with a new error message (before it was downcase causing the problem, now it appears to be start_with?):

/var/www/discourse# rake uploads:migrate_from_s3
Migrating uploads from S3 to local storage for 'default'...
rake aborted!
NoMethodError: undefined method `start_with?' for nil:NilClass
/var/www/discourse/app/models/site_setting.rb:161:in `absolute_base_url'
/var/www/discourse/lib/tasks/uploads.rake:138:in `migrate_from_s3'
/var/www/discourse/lib/tasks/uploads.rake:118:in `block in migrate_all_from_s3'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rails_multisite-2.3.0/lib/rails_multisite/connection_management.rb:68:in `with_connection'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rails_multisite-2.3.0/lib/rails_multisite/connection_management.rb:78:in `each_connection'
/var/www/discourse/lib/tasks/uploads.rake:118:in `migrate_all_from_s3'
/var/www/discourse/lib/tasks/uploads.rake:93:in `block in <main>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'
Tasks: TOP => uploads:migrate_from_s3
(See full trace by running task with --trace)

  6. So then I tried ./launcher rebuild app

  7. And again ./launcher enter app, rake uploads:migrate_from_s3

Same problem exactly:

/var/www/discourse# rake uploads:migrate_from_s3
Migrating uploads from S3 to local storage for 'default'...
rake aborted!
NoMethodError: undefined method `start_with?' for nil:NilClass
/var/www/discourse/app/models/site_setting.rb:161:in `absolute_base_url'
/var/www/discourse/lib/tasks/uploads.rake:138:in `migrate_from_s3'
/var/www/discourse/lib/tasks/uploads.rake:118:in `block in migrate_all_from_s3'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rails_multisite-2.3.0/lib/rails_multisite/connection_management.rb:68:in `with_connection'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rails_multisite-2.3.0/lib/rails_multisite/connection_management.rb:78:in `each_connection'
/var/www/discourse/lib/tasks/uploads.rake:118:in `migrate_all_from_s3'
/var/www/discourse/lib/tasks/uploads.rake:93:in `block in <main>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'
Tasks: TOP => uploads:migrate_from_s3
(See full trace by running task with --trace)

Any other ideas?

Doing this process is a real drag, btw: I have to pre-schedule and announce the forum outage days in advance, then on the day of, change the main site so folks can’t get into the forum, and then I have to shut down the forum server on Digital Ocean and take a snapshot before proceeding. That’s ~30 minutes right there. Then I start it up again and then I can try the steps above. I am SO regretting setting up Amazon S3 for media storage! I’ve burned hours trying to undo that choice and still no luck (and still a big $$$ bill from Amazon each month). I’d love to get to the bottom of this. How can I help?

That line is:

        if SiteSetting.Upload.s3_region.start_with?("cn-")

Looks like it wants s3_region also; not clear to me why I didn’t run into that.
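
One way to catch this class of problem before running the task is to check for unset values up front. This is only a sketch: the list of required keys is an assumption based on the two errors seen in this topic, and `missing_s3_keys` is a made-up helper, not part of Discourse.

```ruby
# Assumed from the two traces in this topic; the real task may need more keys.
REQUIRED_S3_KEYS = %w[s3_bucket s3_region].freeze

# Returns the keys that are nil or blank in the given settings hash.
def missing_s3_keys(settings, required = REQUIRED_S3_KEYS)
  required.select { |key| settings[key].to_s.strip.empty? }
end

# Hypothetical state after setting only the bucket.
settings = { "s3_bucket" => "dn-forum-storage-backup", "s3_region" => nil }
missing = missing_s3_keys(settings)
puts missing.empty? ? "All set" : "Missing: #{missing.join(', ')}"
```

With only the bucket set, this reports s3_region as missing, which matches the start_with? failure on a nil region.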

I’m not sure I follow your logic; my own migration of ~100GB of content I plan to do live, after a normal site backup. But I’m starting small, which is why I’ve been working on limiting the amount migrated at once. One warning: the code seems wrong for literal URL translations, as I see for video uploads, so if you allowed video uploads you might have a problem there with the code in its current state.

So maybe I should repeat all the steps I did above, but I’ll put s3_bucket, s3_region, s3_cdn_url, s3_secret_access_key etc. (basically, every variable I have) into the conf and yml files? I’d rather give it more than it (maybe) needs, just so the thing will actually work.

I saw where someone on the Discourse team had suggested to back up the entire local site before starting this transition. Which requires me to take my Digital Ocean server offline. :frowning:

Right. I’m starting small too… every time I try I am migrating 0 files. :grin:

Luckily members are only allowed to upload JPG, GIF and PNG in my forum so it should be ok.

Fingers crossed.

Backup and snapshot are not the same. A snapshot is the crudest form of backup. The Admin console has a backup facility. Make sure that you configure it to back up thumbnails in the configuration first.

Now that you know that you don’t have to take your site down, you should be able to relax. You can use batch_migrate_from_s3 to migrate at most a certain number of uploads. Right now it limits the posts that are considered rather than the migrations done, a bug for me to resolve in a future PR. But I need to also resolve the video upload bug, and I’d like to consider printing feedback because one of the points of the limit is to be able to confirm in affected posts that the migration was successful.

I’m likely to do this all over the next 1-2 months, so if you want to wait on that, it might be worth paying a few more months of S3. It’s up to you; I’m not making promises, just stating intent.

@pnoeric since you are concerned about site uptime, I thought I’d pass on to you what I’ve learned so far.

I did my migration live, as I mentioned. If I don’t rate-limit the migration, the queues that do things like notify users of each others’ activity get clogged up and the user experience of the site is diminished.

I migrated about 500 posts with videos and about 30K posts with images, which took about two weeks to complete.

If you want to try the code I used, it’s currently at
https://github.com/johnsonm/discourse/blob/mkj-fix-more-urls/lib/tasks/uploads.rake
You can download it and copy it into your app to replace the current contents of lib/tasks/uploads.rake.

With this code, you can do something like this:

bin/rake uploads:batch_migrate_from_s3[100,1000]

That will consider only 1000 total posts with uploads, and migrate files from a maximum of 100, before stopping; every time it actually modifies a post after migrating its uploads it will wait until the queue is empty before starting the next one.
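
To make the two limits concrete, here is a rough simulation of the behavior described above. It is not the code from the linked uploads.rake: the post hashes, the `needs_migration` flag, and the `wait_for_empty_queue` callable are all hypothetical stand-ins.

```ruby
# Simulates: consider at most max_considered posts, modify at most max_modified.
def batch_migrate(posts, max_modified, max_considered, wait_for_empty_queue: -> {})
  modified = 0
  considered = 0
  posts.each do |post|
    break if considered >= max_considered || modified >= max_modified
    considered += 1
    next unless post[:needs_migration]
    post[:needs_migration] = false # stand-in for migrating the post's uploads
    modified += 1
    wait_for_empty_queue.call      # pause until background jobs drain
  end
  { modified: modified, considered: considered }
end

# Ten fake posts; every even-numbered one still points at S3.
posts = Array.new(10) { |i| { id: i, needs_migration: i.even? } }
p batch_migrate(posts, 2, 8)
```

In this sketch the loop stops as soon as either limit is reached, so a small first number keeps each run short enough to review the affected posts by hand.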

If you copy the file in, it will break future site updates until you undo the change. The easiest way to undo it after you are satisfied is just ./launcher rebuild app (although as a developer I use git checkout HEAD lib/tasks/uploads.rake to undo my changes…)

I have noticed that, at least with Digital Ocean Spaces, sometimes I have to retry a few times before a migration succeeds. The script as it stands now doesn’t give you any warning when that happens, and you just have to keep running it and waiting to see. I do have a PR waiting for review that prints out errors in that case so that you at least know that something went wrong.

I’ve added a simple short retry loop, as well as the error message, and it appears that the retry loop resolves the problem. Also, validation against current rules was being done on past post raw content which could break the migration and silently leave posts that needed to be rebaked; I have also fixed that. You will definitely not want to do a migration without getting at least the validation fix, which is one of the commits in my PR currently up for review.
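
For illustration only, a short retry loop with an error message could look something like the sketch below. This is not the code from the PR; `with_retries` and its `attempts` and `delay` parameters are hypothetical, and the "download" is faked.

```ruby
# Hypothetical helper: retry a block a few times, reporting each failure.
def with_retries(attempts: 3, delay: 0.1)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError => e
    warn "Attempt #{tries} failed: #{e.message}"
    raise if tries >= attempts # give up and surface the last error
    sleep delay
    retry
  end
end

# Usage: a fake download that fails twice, then succeeds.
result = with_retries(attempts: 3, delay: 0) do |try|
  raise IOError, "transient download error" if try < 3
  "downloaded on attempt #{try}"
end
puts result
```

Printing each failure means a run that silently skipped files before would at least leave a trace of which attempts went wrong.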

I have finished my migration, to the best of my knowledge. My PR has all the code that I used to complete my migration. It hasn’t been reviewed. I’d suggest following along at Migrate_from_s3 problems if you want.

Thank you! I’m going to give this a whirl in the next few days.

I just added a note in that post about one remaining bug that we discovered today: Profile Pictures have gone missing for some users, and I don’t know why. We shrugged and have been asking affected users to restore them, with apologies for the problem.

I definitely did frequent backups during this process! :slight_smile:

Best of luck!

Seriously, could this make me more crazy? :crazy_face:

Here’s what I did:

  1. Copied over your new lib/tasks/uploads.rake code into my Discourse
  2. Added ALL of my Amazon s3_ variables to config/discourse.conf
  3. Also added them to app.yml (unclear if that does anything, but why not)
  4. Ran this command and got…
root@:/var/www/discourse/config# rake uploads:batch_migrate_from_s3[100,1000]
You must disable S3 uploads before running that task.

And confirmed that the setting was off (screenshot of the S3 uploads checkbox in settings).

So, ok. I edited the uploads.rake file and just removed that check.

Now I get:

root@:/var/www/discourse/lib/tasks# rake uploads:batch_migrate_from_s3[100,1000]
Migrating uploads from S3 to local storage for 'default'...
Migrating up to 100 of 1000 posts...
... (lots of output here) ...
Modified 91/100: 28795: 28486/1 - https://example.com/t/topic-title-here/28486/1
... (lots of output here) ...

So it appeared to be working! Yay!

After it did that first batch of 100, I checked sidekiq and I saw my test post was queued so I waited for that to finish…

…then went back and checked… and that post is still pulling its image from Amazon S3. :frowning: I tried “Rebuild HTML” on the post and that didn’t change it.

So then I tried the whole process again, from the rake all the way through, and got the same results: the same 100 posts were processed, the same things were queued in sidekiq, and after letting it run, the image in that test post is still coming from S3.

Hmmm, I’m not sure what to try next. :man_shrugging:t2:

@mcdanlj appreciate any suggestions or advice you might have :wink:

That’s exactly what I would expect if you remove that check. I’m not sure why you decided to remove it. It’s on purpose. Turn off uploads to S3 before starting the migration.

They were off, completely off. (The picture of the checkbox in my post is the right setting, correct?) I even turned them on and back off. No go.