How to avoid uploading attachments multiple times in a migration

(Jay Pfaffman) #1

The vBulletin importer currently imports only attachments that are referenced by [attach] tags in the text of the message. But sometimes a post has attachments that are not referenced in the text, and those don’t get imported.

I’ve fixed this with a function that runs after the posts are processed. It reads all the attachments, uploads them, and replaces the [attach] codes. If an attachment was not mentioned in the text, it adds a link to it at the end of the post.

The problem is how to avoid re-uploading the attachments and adding them again on subsequent runs of the importer. My current solution is to put <div class="vbulletin-attachments-imported"></div> at the end of the post and to check for that string before handling attachments.

It works, but is there a more elegant way? (And, I suppose if I were more clever, I’d have a solution that did this on a per-attachment rather than per-post level, but it’s hard to imagine that such a feature would ever be used.)
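For reference, the per-post check described above might look roughly like this. It is a minimal sketch, not the importer's actual code: the constant and both helper names are made up for illustration, and `raw` is assumed to be the post body as a plain string.

```ruby
# Hypothetical helpers; the names are assumptions, not part of the real importer.
MARKER = '<div class="vbulletin-attachments-imported"></div>'

# True if a previous run already handled this post's attachments.
def attachments_imported?(raw)
  raw.include?(MARKER)
end

# Appends the marker once; calling it again leaves the text unchanged.
def mark_attachments_imported(raw)
  return raw if attachments_imported?(raw)

  "#{raw.rstrip}\n\n#{MARKER}"
end
```

The importer would then skip any post for which `attachments_imported?` returns true before touching its attachments.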

(Felix Freiberger) #2

Why don’t you just re-upload the files and let Discourse figure this out? Since storage is based on an SHA checksum, this won’t duplicate the files :slight_smile:
Is this about making subsequent runs faster?
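To illustrate the idea, here is a toy model of checksum-keyed storage. It is not Discourse's code: the `ToyUploadStore` class is invented for this example, and SHA-1 is used only as an assumption about which SHA variant is meant.

```ruby
require "digest"

# Toy content-addressed store (hypothetical, for illustration only).
class ToyUploadStore
  def initialize
    @by_sha = {}
  end

  # Returns the existing record when the same bytes were stored before,
  # otherwise creates a new record keyed by the checksum.
  def store(filename, bytes)
    sha = Digest::SHA1.hexdigest(bytes)
    @by_sha[sha] ||= { sha: sha, filename: filename, size: bytes.bytesize }
  end

  # Number of distinct files actually kept.
  def count
    @by_sha.size
  end
end
```

Storing the same bytes twice, even under a different filename, gives back the first record, so a second importer run would not create a second copy in this model.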

(Jay Pfaffman) #3

DOH! So that’s why those checksums are in the model?!?!?! :blush:

In that case, my problem is only making sure that I don’t add links to the un-linked attachments multiple times.
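One way that could work, per attachment rather than per post, is to add a link only when the post does not already contain it. This is only a sketch under assumptions: the helper name and the `:url` / `:filename` hash shape are hypothetical, and it assumes that re-uploading the same file yields the same URL on every run.

```ruby
# Hypothetical helper; attachments is an array of hashes with :url and :filename.
# A plain substring check is used, so a URL that is a prefix of another URL
# already in the post would be treated as present.
def append_missing_attachment_links(raw, attachments)
  missing = attachments.reject { |a| raw.include?(a[:url]) }
  return raw if missing.empty?

  links = missing.map { |a| "[#{a[:filename]}](#{a[:url]})" }
  "#{raw.rstrip}\n\n#{links.join("\n")}"
end
```

Running it a second time on its own output changes nothing, because every link is found in the text by then.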

Thanks very much.