(Not sure if this is specific to the phpBB importer, or something that could affect all of them. I only have knowledge/experience of the phpBB importer, so I thought I'd mention it here first.)
- If the site-wide setting "clean orphan uploads grace period hours" is shorter than the time the import takes, you can end up losing a lot of attachments and avatar images along the way.
That's my theory for what happened, at least. I reduced the grace period from the default 48 hours to just 4 hours (figuring it'd be enough, oops!). Then I ran the import, which took about 10 hours with our hardware and amount of data.
The next day we found a huge number of posts with missing attachments and users with broken avatars, and the uploads/tombstone directory held thousands of files.
I think what happens is that the import script creates all the attachments at the start (very quick) and then builds the posts (which can take many hours). There's also the long Sidekiq processing that runs once the import script finishes and the forum restarts, which may be the real culprit. (If so, this probably affects other importers too.)
If the background task that looks for orphaned attachments kicks off before all the posts are in the database, every attachment belonging to a not-yet-imported post looks orphaned, and it gets deleted once it's older than the grace period. When those posts are added later, their attachments are broken.
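To make the timing argument concrete, here's a toy Python simulation of the ordering I suspect. It's a hypothetical model, not Discourse's actual cleanup code: the function name `simulate_orphan_cleanup`, the evenly spaced post creation, and the hourly job interval are all assumptions for illustration. It assumes every upload is created at t = 0, the post that references each upload is written at some point during the import, and a periodic job deletes any upload that is older than the grace period and not yet referenced by a post.

```python
def simulate_orphan_cleanup(grace_hours, import_hours, n_uploads=100, job_interval_hours=1.0):
    """Hypothetical model: return the ids of uploads swept as 'orphans' mid-import."""
    # Assumption: all uploads are created right at the start of the import.
    created_at = {i: 0.0 for i in range(n_uploads)}
    # Assumption: the post referencing upload i is written at an evenly spaced time.
    referenced_at = {i: import_hours * (i + 1) / n_uploads for i in range(n_uploads)}

    deleted = set()
    t = job_interval_hours
    while t <= import_hours:
        # One run of the (hypothetical) periodic orphan-cleanup job at time t.
        for upload_id, created in created_at.items():
            if upload_id in deleted:
                continue
            orphaned = referenced_at[upload_id] > t  # its post doesn't exist yet
            if orphaned and (t - created) > grace_hours:
                deleted.add(upload_id)
        t += job_interval_hours
    return deleted


print(len(simulate_orphan_cleanup(grace_hours=4, import_hours=10)))   # 50
print(len(simulate_orphan_cleanup(grace_hours=48, import_hours=10)))  # 0
```

In this toy model, a 4-hour grace period with a 10-hour import sweeps half the uploads before their posts exist, while the default 48 hours sweeps none. The real job schedule and import ordering will differ, but the condition is the same: trouble starts whenever the grace period is shorter than the gap between an upload being created and a post referencing it.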
Mea culpa for changing a setting before the import instead of afterwards. That was silly of me; imports are complex, and it's best to run them under default conditions and only then start changing things. I've repaired everything now (and learned a lot more SQL and Ruby in the process!). But I wanted to share my theory in case it helps someone else avoid the same problem.