Rake uploads:migrate_from_s3 fails

Aha, thank you! I’ll give it a crack.

Ugh so no luck, @neounix @mcdanlj @vinothkannans. Still failing. But at least there’s a new/different error message…

Here’s what I tried today:

  1. Upgrade to latest Discourse, just to be sure.

  2. Add my s3_bucket in config/discourse.conf

  3. ./launcher enter app

  4. Edited containers/app.yml and added the DISCOURSE_S3_BUCKET var

  5. Tried rake uploads:migrate_from_s3, and now it fails with a new error message (before, downcase was causing the problem; now it appears to be start_with?):

/var/www/discourse# rake uploads:migrate_from_s3
Migrating uploads from S3 to local storage for 'default'...
rake aborted!
NoMethodError: undefined method `start_with?' for nil:NilClass
/var/www/discourse/app/models/site_setting.rb:161:in `absolute_base_url'
/var/www/discourse/lib/tasks/uploads.rake:138:in `migrate_from_s3'
/var/www/discourse/lib/tasks/uploads.rake:118:in `block in migrate_all_from_s3'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rails_multisite-2.3.0/lib/rails_multisite/connection_management.rb:68:in `with_connection'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rails_multisite-2.3.0/lib/rails_multisite/connection_management.rb:78:in `each_connection'
/var/www/discourse/lib/tasks/uploads.rake:118:in `migrate_all_from_s3'
/var/www/discourse/lib/tasks/uploads.rake:93:in `block in <main>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'
Tasks: TOP => uploads:migrate_from_s3
(See full trace by running task with --trace)
  6. So then I tried ./launcher rebuild app

  7. And again ./launcher enter app, rake uploads:migrate_from_s3

Same problem exactly:

/var/www/discourse# rake uploads:migrate_from_s3
Migrating uploads from S3 to local storage for 'default'...
rake aborted!
NoMethodError: undefined method `start_with?' for nil:NilClass
/var/www/discourse/app/models/site_setting.rb:161:in `absolute_base_url'
/var/www/discourse/lib/tasks/uploads.rake:138:in `migrate_from_s3'
/var/www/discourse/lib/tasks/uploads.rake:118:in `block in migrate_all_from_s3'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rails_multisite-2.3.0/lib/rails_multisite/connection_management.rb:68:in `with_connection'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rails_multisite-2.3.0/lib/rails_multisite/connection_management.rb:78:in `each_connection'
/var/www/discourse/lib/tasks/uploads.rake:118:in `migrate_all_from_s3'
/var/www/discourse/lib/tasks/uploads.rake:93:in `block in <main>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'
Tasks: TOP => uploads:migrate_from_s3
(See full trace by running task with --trace)

Any other ideas?

Doing this process is a real drag, btw -- I have to pre-schedule and announce the forum outage days in advance, then on the day of, change the main site so folks can’t get into the forum, and then I have to shut down the forum server on Digital Ocean and take a snapshot before proceeding. That’s ~30 minutes right there. Then I start it up again, and then I can try the steps above. I am SO regretting setting up Amazon S3 for media storage! I’ve burned hours trying to undo that choice and still no luck (and still a big $$$ bill from Amazon each month). I’d love to get to the bottom of this. How can I help?

That line is:

        if SiteSetting.Upload.s3_region.start_with?("cn-")

Looks like it wants s3_region also; it’s not clear to me why I didn’t run into that.
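For anyone else hitting this trace: the error just means Ruby called start_with? on nil because the s3_region setting was never set. A minimal sketch of a nil-safe version of that check (the helper name is mine, not Discourse’s actual code):

```ruby
# Reproduces the failure mode in miniature: calling start_with? on nil
# raises NoMethodError, so guard against an unset region first.
def china_region?(s3_region)
  return false if s3_region.nil? # unset setting -> treat as "not cn-"
  s3_region.start_with?("cn-")
end
```

With a nil region this returns false instead of aborting, though the real fix is just to make sure s3_region is set (via DISCOURSE_S3_REGION or discourse.conf) before running the task.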

I’m not sure I follow your logic; my own migration of ~100GB of content I plan to do live, after a normal site backup. But I’m starting small, which is why I’ve been working on limiting the amount migrated at once. One warning: the code seems wrong for literal URL translations, as I’ve seen with video uploads, so if you allow video uploads you might have a problem with the code in its current state.

2 Likes

So maybe I should repeat all the steps I did above, but put s3_bucket, s3_region, s3_cdn_url, s3_secret_access_key, etc. (basically, every variable I have) into the conf and yml files? I’d rather give it more than it (maybe) needs, just so the thing will actually work.

I saw where someone on the Discourse team had suggested to back up the entire local site before starting this transition. Which requires me to take my Digital Ocean server offline. :frowning:

Right. I’m starting small too… every time I try I am migrating 0 files. :grin:

Luckily members are only allowed to upload JPG, GIF and PNG in my forum so it should be ok.

Fingers crossed.

Backup and snapshot are not the same. A snapshot is the crudest form of backup. The Admin console has a backup facility. Make sure that you configure it to back up thumbnails in the configuration first.

Now that you know that you don’t have to take your site down, you should be able to relax. You can use batch_migrate_from_s3 to migrate at most a certain number of uploads. Right now it limits the posts that are considered rather than the migrations done, a bug for me to resolve in a future PR. But I need to also resolve the video upload bug, and I’d like to consider printing feedback because one of the points of the limit is to be able to confirm in affected posts that the migration was successful.

I’m likely to do this all over the next 1-2 months, so if you want to wait on that, it might be worth paying for a few more months of S3. Up to you; I’m not making promises, just stating intent.

2 Likes

@pnoeric since you are concerned about site uptime, I thought I’d pass on to you what I’ve learned so far.

I did my migration live, as I mentioned. If I don’t rate-limit the migration, the queues that do things like notify users of each others’ activity get clogged up and the user experience of the site is diminished.

I migrated about 500 posts with videos and about 30K posts with images, which took about two weeks to complete.

If you want to try the code I used, it’s currently at


You can download it and copy it into your app to replace the current contents of lib/tasks/uploads.rake.

With this code, you can do something like this:

bin/rake uploads:batch_migrate_from_s3[100,1000]

That will consider at most 1000 posts with uploads, and migrate files from at most 100 of them, before stopping; every time it actually modifies a post after migrating its uploads, it will wait until the queue is empty before starting the next one.
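The flow described above can be sketched roughly like this; every name here is illustrative, not the actual internals of the uploads.rake task:

```ruby
# Sketch of the batch behavior: walk at most `max_considered` candidate
# posts, migrate at most `max_migrations` of them, and after each post
# that was actually modified, wait for the background queue to drain.
def batch_migrate(posts, max_migrations:, max_considered:, queue_empty:, migrate:)
  migrated = 0
  posts.first(max_considered).each do |post|
    break if migrated >= max_migrations
    next unless migrate.call(post)    # true only if the post was modified
    migrated += 1
    sleep 0.1 until queue_empty.call  # e.g. poll Sidekiq's queue size
  end
  migrated
end
```

Called with [100,1000]-style limits, a loop like this stops after 100 modified posts even if it examined all 1000 candidates.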

If you copy the file in, it will break future site updates until you undo the change. The easiest way to undo it after you are satisfied is just ./launcher rebuild app (although as a developer I use git checkout HEAD lib/tasks/uploads.rake to undo my changes…)

I have noticed that at least with digital ocean spaces, sometimes I have to retry a few times before a migration succeeds. The script as it stands now doesn’t give you any warning when that happens, and you just have to keep running it and waiting to see. I do have a PR waiting for review that prints out errors in that case so that you at least know that something went wrong.

I’ve added a simple short retry loop, as well as the error message, and it appears that the retry loop resolves the problem. Also, validation against current rules was being done on past post raw content which could break the migration and silently leave posts that needed to be rebaked; I have also fixed that. You will definitely not want to do a migration without getting at least the validation fix, which is one of the commits in my PR currently up for review.
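The retry loop in question is short and simple; here is a hedged sketch of the general shape (not the exact code in the PR):

```ruby
# Retry a flaky operation (e.g. a download from object storage) a few
# times, printing the error each time instead of failing silently.
def with_retries(attempts: 3)
  tries = 0
  begin
    tries += 1
    yield
  rescue => e
    if tries < attempts
      warn "retrying after error: #{e.message} (attempt #{tries}/#{attempts})"
      retry
    end
    warn "giving up after #{attempts} attempts: #{e.message}"
    raise
  end
end
```

The important part is the warn: without it, a failed download just leaves the post un-migrated and you have to notice on your own.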

I have finished my migration, to the best of my knowledge. My PR has all the code that I used to complete my migration. It hasn’t been reviewed. I’d suggest following along at Migrate_from_s3 problems if you want.

2 Likes

Thank you! I’m going to give this a whirl in the next few days.

I just added a note in that post about one remaining bug we discovered today: Profile Pictures have gone missing for some users, and I don’t know why. We shrugged and have been asking affected users to restore them, with apologies for the problem.

I definitely did frequent backups during this process! :slight_smile:

Best of luck!

1 Like

Seriously, could this make me more crazy? :crazy_face:

Here’s what I did:

  1. Copied over your new lib/tasks/uploads.rake code into my Discourse
  2. Added ALL of my Amazon s3_ variables to config/discourse.conf
  3. Also added them to app.yml (unclear if that does anything, but why not)
  4. Ran this command and got…
root@:/var/www/discourse/config# rake uploads:batch_migrate_from_s3[100,1000]
You must disable S3 uploads before running that task.

And confirmed:

So, ok. I edited the uploads.rake file and just removed that check.

Now I get:

root@:/var/www/discourse/lib/tasks# rake uploads:batch_migrate_from_s3[100,1000]
Migrating uploads from S3 to local storage for 'default'...
Migrating up to 100 of 1000 posts...
... (lots of output here) ...
Modified 91/100: 28795: 28486/1 - https://example.com/t/topic-title-here/28486/1
... (lots of output here) ...

So it appeared to be working! Yay!

After it did that first batch of 100, I checked sidekiq and I saw my test post was queued so I waited for that to finish…

…then went back and checked… and that post is still pulling its image from Amazon S3. :frowning: I tried “Rebuild HTML” on the post and that didn’t change it.

So then I tried the whole process again, from the rake all the way through, and got the same results-- the same 100 posts were processed, the same things queued in sidekiq, and after letting it run, the image in that test post still coming from S3.

Hmmm, I’m not sure what to try next. :man_shrugging:t2:

@mcdanlj appreciate any suggestions or advice you might have :wink:

That’s exactly what I would expect if you remove that check. I’m not sure why you decided to remove it. It’s on purpose. Turn off uploads to S3 before starting the migration.

1 Like

They were off-- completely off. (The picture of the checkbox in my post is the right setting, correct?) I even turned them on and back off. No go.

Yes, but you overrode it in your step 2, so by doing that you are back to putting files in S3.

2 Likes

The app.yml file settings are used only when rebuilding the container and are used to populate the config/discourse.conf in the container being built.
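Concretely, the mapping looks something like this (the bucket name is a placeholder, and this is my reading of how the launcher templates work): a DISCOURSE_-prefixed env var in app.yml becomes a lowercased setting in the generated discourse.conf at rebuild time.

```yaml
# containers/app.yml on the host -- only read at ./launcher rebuild app:
env:
  DISCOURSE_S3_BUCKET: my-forum-storage

# After the rebuild, inside the container, the templates emit the
# corresponding line in /var/www/discourse/config/discourse.conf:
#   s3_bucket = my-forum-storage
```

This is also why edits made directly to discourse.conf inside a running container seem to vanish: the next rebuild regenerates the file from app.yml.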

1 Like

Ahh! So if I have any s3_ vars in my config/discourse.conf file, Discourse thinks that I have S3 uploads enabled? Ok… got it… I’ll remove them and give it another crack.

Probably there’s only one that matters… I’m not at my computer to check what all could cause that to trip, but if that check trips, I would expect your site to be uploading all new uploads to S3.

1 Like

It works, it works it works itworksitworks! Yay!!! Thank you so much for all your work and for patiently explaining what I needed to do. It’s great to get this underway.

So a few quick questions just so I can understand what’s going on better…

I’ve been running a batch of 100 at a time. The first few cycles, the tool was migrating posts (?) but now it’s just copying images. I have thousands and thousands of posts and each one usually only has one image attached… so I’m not sure why/how it “ran out” of posts to migrate? What does that mean, to migrate a post? So that’s question 1, what is happening exactly? :wink:

Question 2 is more simple. Now when I run it just copies images, but I notice that 2X images are being copied into 3X locally:

Downloaded 27/100: //my-forum-storage.s3.dualstack.us-east-1.amazonaws.com/original/2X/1/14c56ef9f1dddb7b7a6f14e920234e0f714ea699.jpeg to /uploads/default/original/3X/1/4/14c56ef9f1dddb7b7a6f14e920234e0f714ea699.jpeg

Note the URL is changing from 2X/1/xyz.jpeg to 3X/1/4/xyz.jpeg (an extra folder level in there)

Is that all ok?

Finally, I’ve been spot checking the images and they seem fine, but since I have no idea what post that image is associated with, I can’t really do a “live” check on the forum and be 100% sure that my users are seeing the right thing. How can I map the jpeg image filename to a forum post?

1 Like

Exactly what commands were you typing? You’ll need to expand the limit to cover all your posts.

So I ended up specifying a large limit once I was sure I was going on to the next phase, something like this:

bin/rake uploads:batch_migrate_from_s3[100,100000]

It migrates first the uploads that it finds in posts; there isn’t referential integrity between post.raw references to an upload and the upload object in the database. It looks for posts with any references to plain URLs or upload:// pseudo-protocol URLs that represent remote content, where it needs to at least re-cook the post after migrating, if not save a modification, as for base URLs pointing directly at S3. When it can’t find any content in a post that needs to be migrated, it goes on to the next thing.
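A rough sketch of the kind of scan described here; the patterns are illustrative, not Discourse’s actual regexes, and the base URL is just the one from the log line earlier in this thread:

```ruby
# A post needs attention if its raw markdown still references the S3
# base URL directly, or uses an upload:// pseudo-protocol short URL.
S3_BASE = "//my-forum-storage.s3.dualstack.us-east-1.amazonaws.com"

def needs_migration?(raw, s3_base: S3_BASE)
  raw.include?(s3_base) || raw.match?(%r{upload://\w+})
end
```

Posts matching neither pattern are skipped, which is why the posts phase eventually stops doing any work.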

The path changes aren’t driven directly by the migration script; they’re just changes to where Discourse puts the files. One more directory level helps with performance when you have a lot of files, and lots of other systems use two and occasionally three levels of directories to spread the files out.
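The layout change can be sketched as hash-prefix sharding; the mapping below just mirrors the example paths earlier in the thread (2X with one shard directory, 3X with two) and is not necessarily Discourse’s exact scheme:

```ruby
# Spread files across subdirectories by using the leading characters of
# the file's hash as nested directory names.
def sharded_path(sha1, ext, levels: 2)
  shards = sha1.chars.first(levels).join("/")
  "#{levels + 1}X/#{shards}/#{sha1}.#{ext}"
end
```

For the hash in that log line, one level gives the old 2X/1/… path and two levels give the new 3X/1/4/… path.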

You may run into the same problem I did where migrating the other uploads caused Discourse to lose track of users’ Profile Pictures. I didn’t notice that until it was way too late to restore from backup, so I just messaged affected users, apologized, and asked them to fix their avatars. I still don’t know what went wrong there.

Ah ok, cool. I see this in the progress output:

All Post uploads migrated.  Migrating profile uploads...
All profiles uploads migrated.  Migrating other non-post uploads...

What are “non-post uploads”? This is where all the remaining work seems to be -- the posts and profile phases don’t do anything anymore.

Thank you for the reassurance about paths for the raw uploads!

I’m not sure if that’s happening or not but it’s ok-- my site uses SSO so I pass the avatar URL every time the user logs in (or changes their picture on the main site).

Honestly, I don’t know what the rest of them were. I rather hoped that if referential integrity were a concern, there would be a foreign key constraint, and that otherwise it would just look up the upload and use it from wherever it was. I certainly did run into foreign key constraints as an indicator of needing to do something special for two uploads attached to user profiles.

1 Like

Hmmm it does seem to be working… though I ran into a serious problem this morning… suddenly my disk usage spiked 25% and filled the drive, crashing the forums completely.

Right now, when I’m doing the rake, it seems like it’s downloading/uploading the images right during the batch run. I enter the rake command, it processes 300 images exactly (I’m running them in batches of 300) and then it ends.

So the important question is: could just moving those downloads from S3 to the local disk be queued? Could that have built up any kind of batch that then happened at 5am and brought down my forums? :frowning: