Migrate_from_s3 problems

@zogstrip Do you mind reviewing this PR since you have context from recent review on my last one in this area? https://github.com/discourse/discourse/pull/10093

I’ve included in it the fixes I’ve been doing while running this relatively large migration. I haven’t tried to add tests for every fix; I’m not sure how to inject every form of error. But at least the new functionality is tested.

@RGJ I think that my PR as it stands now might deal with all but your first two points, except that I’m not sure about CDN. My site used CDN and migrated videos that had CDN urls but that might have been a side effect of the naming with discourse spaces. If you have additional cases, hopefully my PR will give you an easy scaffolding for adding to the regex and adding test cases for the additional variations.

I think that it is right to first migrate by post, because after migrating the uploads in a post the post needs to be rebaked so that the cooked post has the right URLs in it. After I finish migrating my posts (which might take less than two weeks now that I’ve changing my rate-limiting to check the queue length directly), I’ll be looking at any remaining work to do to clean up.

Because multiple posts can share references to the same content if more than one person upload the same file, there needs to be a second pass that checks the cooked data for old URLs and rebakes those posts to pick up new location. It can use the same rate-limiting to avoid soaking queues.

I probably won’t see any broken logos at makerforums because we tweaked branding after we stopped putting new content in “s3” (for us, digital ocean spaces) but I probably will see a bunch of uploads still in S3 for avatars at least. Migrating an uploads not associated with a post should be started only after all posts have been migrated, and I’ll probably have to write that up after I finish migrating uploads in posts.

@pfaffman I see Bizarre Problems with migrate_from_s3 - #5 which describes errors that didn’t repeat. Without my fixes in the current PR, errors are silently eaten, including validation failures. I think work here will address at least some of the problems you saw then.

@hosna the problems you raise in What does rake uploads:migrate_from_s3 exactly do? are either partially or completely solved so far in this PR. If they aren’t completely solved, I have added tests that will make it easier to add further tests to validate additional fixes.

@sam since you put the 2.6 label on the PR, I’m assuming it won’t get merged for at least a few days; should I pull my rate-limit feature work into the PR along with the fixes? Or do you have a preference to keep fixes and feature work in separate PRs? I can do either way. The rate limiting feature is working really well; I’m migrating about three times as fast, without site availability impact, now that I’m waiting for the sidekiq queue to drain, so it makes sense to pull it in, I think, if it’s something that is normally accepted in PRs. Otherwise, I need to wait the PR on merging the work it is based on, so either way it would be good to hear.

I DRYed out my migration rate limit patch and pulled it into the PR. It’s working in practice, and sar tells me that I’m seeing nearly zero idle time continuously, while the site continues to function, during the live migration. One benefit of the batch mode is that I can check for new Discourse versions after each full batch of migrations; I upgraded my site to 2.6.0beta1 at the first opportunity after it was cut, and have been running the migrations successfully on 2.6.0beta1 with my migration PR on top since the update.

I think that the PR is ready for review now; I’ll be planning to submit another PR for the last few stages, but getting this in place will improve the general migration experience for everyone even before I finish out the last pieces.

5 Likes