Rake uploads:migrate_from_s3 fails

@pnoeric since you are concerned about site uptime, I thought I’d pass on to you what I’ve learned so far.

I did my migration live, as I mentioned. If I don’t rate-limit the migration, the queues that do things like notify users of each other’s activity get clogged up, and the user experience of the site suffers.

I migrated about 500 posts with videos and about 30K posts with images, which took about two weeks to complete.

If you want to try the code I used, it’s currently at
https://github.com/johnsonm/discourse/blob/mkj-fix-more-urls/lib/tasks/uploads.rake
You can download it and copy it into your app, replacing the current contents of lib/tasks/uploads.rake.

With this code, you can do something like this:

bin/rake uploads:batch_migrate_from_s3[100,1000]

That will consider at most 1000 posts with uploads and migrate files from at most 100 of them before stopping. Every time it actually modifies a post after migrating its uploads, it waits until the queue is empty before starting the next one.
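For anyone who wants to see the shape of that rate limiting without reading the whole rake file, here is a minimal sketch of the idea. It is not the code from my branch: batch_migrate, migrate, queue_size and poll_interval are all made-up names for illustration, and the two lambdas stand in for whatever actually migrates a post and reports the size of the background job queue.

```ruby
# Hypothetical sketch of the batching behaviour described above.
# None of these names come from Discourse or from the linked rake task.
#
# posts         - candidate posts with uploads, in the order to consider them
# max_migrated  - stop after this many posts were actually modified
# limit         - consider at most this many posts in total
# migrate       - callable; returns true only if the post was modified
# queue_size    - callable; returns the number of jobs still waiting
def batch_migrate(posts, max_migrated:, limit:, migrate:, queue_size:, poll_interval: 1)
  migrated = 0

  posts.first(limit).each do |post|
    break if migrated >= max_migrated

    # Posts that needed no changes do not count and do not trigger a wait.
    next unless migrate.call(post)

    migrated += 1

    # Let the background jobs drain before touching the next post.
    sleep(poll_interval) while queue_size.call.positive?
  end

  migrated
end
```

With the example command above, max_migrated would be 100 and limit would be 1000. Waiting only after a post was actually modified keeps the run fast through posts that need no work.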

If you copy the file in, it will break future site updates until you undo the change. The easiest way to undo it after you are satisfied is just ./launcher rebuild app (although as a developer I use git checkout HEAD lib/tasks/uploads.rake to undo my changes…)

I have noticed that, at least with DigitalOcean Spaces, sometimes I have to retry a few times before a migration succeeds. The script as it stands now doesn’t give you any warning when that happens, so you just have to keep running it and waiting to see. I do have a PR waiting for review that prints out errors in that case, so that you at least know that something went wrong.

I’ve added a simple short retry loop, as well as the error message, and it appears that the retry loop resolves the problem. Also, validation against current rules was being done on past post raw content, which could break the migration and silently leave posts that needed to be rebaked; I have also fixed that. You will definitely not want to do a migration without getting at least the validation fix, which is one of the commits in my PR currently up for review.
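To give a rough idea of what a short retry loop with an error message looks like, here is a generic sketch. It is not the code from my PR: with_retries, attempts and delay are hypothetical names, and rescuing StandardError is an assumption made for illustration; the real task may rescue something narrower.

```ruby
# Hypothetical helper: run the block, retrying a few times on failure and
# printing each error so a flaky download is no longer silent.
def with_retries(attempts: 3, delay: 1)
  tries = 0
  begin
    tries += 1
    yield
  rescue StandardError => e
    warn "attempt #{tries}/#{attempts} failed: #{e.class}: #{e.message}"
    # Give up and re-raise the original error once attempts are used up.
    raise if tries >= attempts
    sleep(delay)
    retry
  end
end
```

You would wrap just the download step in it, something like with_retries { download_the_file }, where download_the_file is whatever fetches one upload from the bucket. If every attempt fails, the last error still propagates, so the caller knows the post was not migrated.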

I have finished my migration, to the best of my knowledge. My PR has all the code that I used to complete my migration. It hasn’t been reviewed. I’d suggest following along at Migrate_from_s3 problems if you want.
