Imported images missing thumbnails, getting tombstoned

(Jay Pfaffman) #1

Continuing the discussion from My journey into a massive posts rebake job:

So I’ve been working on this import for a community that really likes lots of really huge images. There are about 160GB of images.

After the import, images were getting moved to tombstone. Also, even when the images are un-tombstoned and the posts rebaked, they still aren’t getting thumbnails generated.

One thing that I’m wondering about is that embedded_image_html (in script/import_scripts/base/uploader.rb) generates

<img src="#{upload.url}" width="#{image_width}" height="#{image_height}"><br/>

rather than the “new style”


@bartv has noticed that there are no entries in (I think) post_uploads or optimized_images tables.

I’ve looked at the upload model and lib/upload_creator.rb. I can’t find what moves things to tombstone.

@zogstrip or @gerhard. . . Is it possible that including images with the HTML style image links is causing them to get tombstoned and/or not get thumbnails generated?

(Bart) #2

Correct; in addition, the attachments aren’t in the uploads table either; I only find avatar images there.

I haven’t spent too much time with the untombstone process as it’s very slow for our site - well over 30 seconds per attachment, and we have over 400k images in the tombstone directory. Instead of running untombstone, we moved all the files in the tombstone directory back to the original directory.
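For anyone wanting to do the same, moving the files back could be sketched roughly as below. This is only an illustration: restore_from_tombstone is a made-up helper, and the two directory arguments are assumptions about where the tombstone and live upload trees sit on disk. It only touches the filesystem and does nothing about the missing database rows.

```ruby
require "fileutils"

# Hypothetical helper: move every file under tombstone_root back to the
# same relative path under uploads_root. Both directories are assumptions
# about the local layout, not paths taken from Discourse itself.
def restore_from_tombstone(tombstone_root, uploads_root)
  moved = []
  # Collect relative paths first so nothing is moved while still listing.
  Dir.glob("**/*", base: tombstone_root).sort.each do |relative|
    source = File.join(tombstone_root, relative)
    next unless File.file?(source)

    target = File.join(uploads_root, relative)
    next if File.exist?(target) # never overwrite a live upload

    FileUtils.mkdir_p(File.dirname(target))
    FileUtils.mv(source, target)
    moved << relative
  end
  moved
end
```

The return value is the list of relative paths that were moved, which is handy for logging. Files that already exist in the live tree are left in the tombstone directory rather than overwritten.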

Rather than relying on such a fix, we’re aiming to get the data into the database correctly in the first place.

(Jay Pfaffman) #4

I’ve added a post.rebake! after the post.update in the import script. I don’t know if that’ll help.

Also, Bart’s copy of the database was taken before sidekiq had finished. It was meant to be a test that there wouldn’t be a problem with the restore. But my copy did leave sidekiq running. Could it be that images got tombstoned before sidekiq could process them?

(and if so, turning off having discobot send a message to every user imported is probably a good default.)

The real problem with the recover_from_tombstone rake task is that every image gets a linear search of the database.
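A minimal sketch of the alternative, assuming the cost really does come from scanning post text once per file: read all posts a single time, collect every upload path they reference into a set, and then check each tombstoned file with a set lookup. The UPLOAD_PATH pattern and both method names are hypothetical, and plain strings stand in for post bodies here; this is not how the rake task is actually written.

```ruby
require "set"

# Hypothetical pattern for upload paths inside post text; adjust it to
# the real URL layout of the site.
UPLOAD_PATH = %r{/uploads/default/original/[^"'\s)]+}

# One pass over all posts: collect every upload path that is referenced.
def referenced_upload_paths(raw_posts)
  raw_posts.each_with_object(Set.new) do |raw, paths|
    raw.scan(UPLOAD_PATH) { |match| paths << match }
  end
end

# Each tombstoned path is then a set lookup instead of a fresh search.
def files_to_recover(tombstoned_paths, raw_posts)
  referenced = referenced_upload_paths(raw_posts)
  tombstoned_paths.select { |path| referenced.include?(path) }
end
```

With 400k tombstoned files, this turns 400k searches over the posts into one scan of the posts plus 400k lookups.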