ToS3MigrationError: uploads not migrated to S3, leading to broken attachments. How to fix this error?

FileStore::ToS3MigrationError: 182 of 21512 uploads are not migrated to S3. S3 migration failed for db ‘default’.

then some stack trace from ‘raise_or_log’, ‘migration_successful?’, ‘migrate_to_s3’, ‘migrate’, ‘block in migrate_to_s3_all_sites’

I ran into this error twice when running the uploads:migrate_to_s3 task. About 1% of the files were not successfully migrated, leading to broken attachments on the site after a rebake. These all seem to be older files from the first few months after I created this Discourse instance.

Digging into the S3 bucket, the files do seem to have been uploaded to S3 successfully; they are just not linked correctly after rebaking.

Rerunning the migration (prior to another rebake) seems to have fixed the broken links for some reason, although the same error still occurs during the migrate_to_s3 task.

However, if I run rebake again, it re-breaks the attachment links.

I don’t think these are errors, but I’ll include the output from the rebake:

/var/www/discourse/lib/file_store/base_store.rb:6: warning: already initialized constant FileStore::BaseStore::UPLOAD_PATH_REGEX
/var/www/discourse/lib/file_store/base_store.rb:6: warning: previous definition of UPLOAD_PATH_REGEX was here
/var/www/discourse/lib/file_store/base_store.rb:7: warning: already initialized constant FileStore::BaseStore::OPTIMIZED_IMAGE_PATH_REGEX
/var/www/discourse/lib/file_store/base_store.rb:7: warning: previous definition of OPTIMIZED_IMAGE_PATH_REGEX was here

Running rake posts:missing_uploads or PostCustomField.where(name: Post::MISSING_UPLOADS) does not surface any issues, so this seems unrelated.

My problem might be related to this issue: Migrate_to_S3 Fails on Rebake - #12 by evenif

I will see what I learn and might close this and move convo over there.

The old broken attachments/uploads show up as
[{file name}|attachment](/uploads/default/original/2X/6/{sha1 hash + extension}) in the post’s raw text. This gets cooked as href="/uploads/default/original/2X/6/{sha1 hash + extension}", which breaks file attachments, but not images.

Later working attachments appear as [{file name}|attachment](upload://{sha1 to base62 + extension}) and get cooked as href="/uploads/short-url/{sha1 to base62 + extension}".

I wrote a little Ruby code to go through all the posts in the known problematic time period and replace all the old upload URLs with the newer version. I used the Upload model’s base62_sha1 function to convert the sha1 hash to the file name that Discourse short URLs expect.

This made the attachments work. Then I ran another rebake just to confirm the fix, and it seems to still be working. It seems like the problem just resided in Post.raw and didn’t have anything to do with Upload.
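
For anyone hitting the same thing, here is a rough sketch of that kind of rewrite. It is not the exact script that was run. The regex, the method names, and the stand-in base62 encoder (including its alphabet order) are all assumptions for illustration; inside a Discourse Rails console you would pass the Upload model’s own base62_sha1 as the encoder instead of the stand-in.

```ruby
# Matches old-style link targets such as
#   (/uploads/default/original/2X/6/<40-char sha1>.<ext>)
# The "default" db name and path layout come from the examples above;
# the regex itself is an assumption and may need adjusting for your site.
OLD_UPLOAD_LINK = %r{\(/uploads/default/original/\d+X/(?:[0-9a-f]/)+([0-9a-f]{40})(\.\w+)?\)}

# Hypothetical stand-in for Upload's base62_sha1, so the sketch runs
# outside Rails. The alphabet order is an assumption; prefer the real
# method when running inside Discourse.
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def standin_base62(sha1)
  n = sha1.hex
  return "0" if n.zero?
  out = +""
  while n > 0
    n, r = n.divmod(62)
    out.prepend(BASE62_ALPHABET[r])
  end
  out
end

# Returns a copy of raw with every old-style upload link target replaced
# by an upload:// short URL. encoder is any callable mapping sha1 -> base62.
def rewrite_old_upload_links(raw, encoder = method(:standin_base62))
  raw.gsub(OLD_UPLOAD_LINK) do
    sha1 = Regexp.last_match(1)
    ext  = Regexp.last_match(2).to_s
    "(upload://#{encoder.call(sha1)}#{ext})"
  end
end
```

In a Rails console, the idea would be to loop over the posts from the affected period, call rewrite_old_upload_links on each post’s raw, and save only the posts whose raw actually changed. Take a backup first, and try it on a single post before running it across the whole range.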