@RGJ likewise, I am sincere that if anyone knows with confidence how to avoid the boondoggle of rebuilding all the images, I’m all for it, and if I were an experienced Discourse developer and knew how to do it easily I’d be likely to have done that in the first place.
It’s my guess that migrating away from S3 is not a common case for those with lots of images, so it’s hard for it to be worth the time, relative to doing something to add more generally useful features to Discourse. I had approximately 100GB of uploads from importing from Google+ into MakerForums and we chose S3 with the idea that it was possible that lots of people would move from those communities to MakerForums, but in the end only a few communities moved actively, so continued growth didn’t support staying in S3 after all. I expect that’s a fairly extreme outlier, and reprocessing the images is a one-time cost.
Finally, I’m very grateful for you opening this topic in the first place, because otherwise I could very easily have missed non-post uploads.
…
@RGJ You are clearly right that destroy-and-recreate isn’t good. I added a check, raise "Error: upload url #{url} changed to #{new_upload.short_url}, should be unchanged." if url != new_upload.short_url
which at least reports problems with corrupted source. Today’s Digital Ocean Spaces outage cause it to send me corrupt data, and it tripped this error for many uploads — I don’t know how many. But because it had already destroyed the old upload, I now have some corrupted pages, and I didn’t realize it until I had lost the scrollback with the logs telling me which pages are corrupted, so I can’t even go in and fix them by hand from a backup.
So while my work is an improvement over the old way in that it at least reports some errors, it would be much better to download the files, check the SHA1s of the downloaded files, and write them to local files. In the meantime, I have modified my PR to stop the migration on any unhandled error to at least limit data loss.
I still think that my work should be merged as a pareto improvement, since it doesn’t make it worse and does make it better, but it would be rather better to just do it the right way. I think that would be to write lib/file_store/from_s3_migration.rb to parallel lib/file_store/to_s3_migration.rb but I’m not up to it.
Since my PR hasn’t been reviewed yet, I’ve piled more onto it. I added an uploads:report_missing_uploads
task that looks through raw
for instances of upload://...
where the upload object isn’t present. At least in my case, with backups, I’ll be able to look through my backups and find the orphaned files and restore them to the site, then rebake those posts to restore the images that went missing. I have found 678 of them to search for in backups, of which I have so far found all but 10, so I’m glad I wrote the test! I’ll need to deal with that before I can go into the phase, in which I rebake remaining posts.
…
I have completed my first phase of migration, not including rebaking affected posts with shared uploads, and not including non-post uploads. After I complete another remote backup, I plan to test out those features too and add them to my PR.
…
I have now started testing the next phases of migration, rebaking and non-post-upload migration. My first lesson was that rebaking as the next step fails due to non-post-upload avatars in quoted posts, so rebaking posts needs to be the last step.
Next discovery: Thank you foreign key constraints! I discovered while migrating non-post uploads that I need to migrate at least profiles before migrating non-post uploads generally.
ActiveRecord::InvalidForeignKey: PG::ForeignKeyViolation: ERROR: update or delete on table "uploads" violates foreign key constraint "fk_rails_1d362f2e97" on table "user_profiles"
Profile migration was tricky because I had to migrate two uploads attached to the profile, but in some cases people used the same image for both, so I had to do that work in three stages. I have successfully migrated all the profiles on my site with s3 urls.
I have migrated all of my non-post, non-profile uploads. @RGJ if you want to try out the branch I posted, it is up to date with my work. Everything in it has now been used for a completed site migration.
…
@CvX I don’t know when you’ll have a chance to review, but without my PR, migrate_from_s3
definitely is full of silent data destruction bugs. What I’ve done isn’t perfect, but it does protect against a lot of bugs that I’ve run into in practice.
I’ve done all the work I currently intend to do here. I have migrated my site, which means that I can’t even effectively evaluate any requested changes. If you reject this PR, I strongly suggest that you instead delete the existing migration because it will silently destroy data for users.
With respect to the PR, I sometimes get test failures in CI (as the current version) something like this:
7867 examples, 0 failures, 11 pending, 1 error occurred outside of examples
##[error]Process completed with exit code 1.
This doesn’t reproduce for me locally (so far) and I see no information in the CI logs to say what is wrong in CI, so I’m not going to waste any more of my time trying to screw around with exploring CI output for hidden information.
One last note: After completing the migration, we discovered that some users’ Profile Pictures were lost in the migration, and they are restoring them manually. I’m guessing that additional code would have been required to have succeeded in that, but since I didn’t figure that out until it was too late, I don’t have a test case to validate that. So that’s a remaining bug that might bite in some configurations, but I’m no longer in a position to write that additional fix.