phpBB import problem

It’s not uncommon for duplicate filenames to exist in uploads; every board has to account for that. But in phpBB, it’s possible to upload two attachments of different pictures, both with the same name, in the same post. phpBB manages this by storing the original filename as “real_filename” in the attachment table, and the hashed on-disk name it generates as “physical_filename”.

I have 46 posts converted from my phpBB forum that contain duplicate images because the filenames are the same. In a few cases, it’s because the user inadvertently posted the same file twice. But in many, it’s not. I’m not sure how the user did it, but I have an example where they uploaded 11 different pictures in phpBB, each named “image.jpg” (a default filename when dragging a picture from an email to the desktop). The same picture appears 11 times in Discourse. It appears the converter identifies a phpBB attachment to upload by real_filename, which can repeat, rather than by physical_filename, which is unique per file.
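
To make the collision concrete, here is a minimal Ruby sketch that groups attachment rows by post and real_filename and reports names used more than once within a single post. The rows are made up for illustration, and the Struct and the colliding_attachments helper are hypothetical; only the column names are taken from the phpBB attachment table.

```ruby
# Hypothetical rows shaped like the phpBB attachment table (sample data is invented).
Attachment = Struct.new(:attach_id, :post_msg_id, :real_filename, :physical_filename)

# Returns { [post_id, real_filename] => [attachments...] } for every filename
# that occurs more than once within the same post.
def colliding_attachments(rows)
  rows.group_by { |a| [a.post_msg_id, a.real_filename] }
      .select { |_key, group| group.size > 1 }
end

rows = [
  Attachment.new(101, 7, "image.jpg",  "2_aaa111"),
  Attachment.new(102, 7, "image.jpg",  "2_bbb222"), # same name, different file
  Attachment.new(103, 7, "garden.jpg", "2_ccc333"),
  Attachment.new(104, 8, "image.jpg",  "2_ddd444"), # same name, but another post
]

colliding_attachments(rows).each do |(post_id, name), group|
  ids = group.map(&:attach_id).join(", ")
  puts "post #{post_id}: #{name} used by attach_id #{ids}"
end
```

Each group it reports is a set of distinct physical files that would collapse into one upload if the importer keys on real_filename alone.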

Fortunately, I have a Ruby script that identifies posts in Postgres that contain the same Discourse filename twice. Fixing them will be painful (particularly the post with 11 duplicates!), but I can do it manually since my phpBB board is still up. I’m flagging this as an important fix for the importer, since I anticipate many phpBB webmasters will be jumping ship, as I plan to do.
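
For anyone who wants to run a similar check, here is a rough sketch of the idea (not my actual script). It assumes the post’s raw text embeds images as upload:// short URLs, and it only counts repeated references in a string; fetching the raw text from the database is left out. The repeated_uploads helper and the sample text are hypothetical.

```ruby
# Matches Discourse-style short upload URLs such as upload://abc123.jpg
# (assumption: imported images are embedded this way in the post's raw text).
UPLOAD_REF = %r{upload://[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)?}

# Returns { "upload://..." => count } for every reference that appears
# more than once in the given raw post text. Requires Ruby 2.7+ for tally.
def repeated_uploads(raw)
  raw.scan(UPLOAD_REF).tally.select { |_ref, count| count > 1 }
end

raw = <<~POST
  ![image|690x388](upload://abc123.jpg)
  ![image|690x388](upload://abc123.jpg)
  ![other|100x100](upload://def456.png)
POST

repeated_uploads(raw).each do |ref, count|
  puts "#{ref} appears #{count} times"
end
```

A post that comes back non-empty is a candidate for manual review; a genuine double post of the same file will show up too, so each hit still needs a look.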

Thanks
Dan

FWIW, Claude says the issue is at line 24 in /var/www/discourse/script/import_scripts/phpbb3/importers/attachment_importer.rb.

The bug: When multiple phpBB attachments have the same real_filename (e.g., “IMG_1234.jpg”), the uploader sees the filename already exists and returns the existing upload instead of creating a new one, even though the physical_filename (actual file) is different.

The fix would be: Use physical_filename as the filename, or append attach_id to make it unique.

Instead of:
filename = CGI.unescapeHTML(row[:real_filename])

Use:
filename = "#{row[:attach_id]}_#{CGI.unescapeHTML(row[:real_filename])}"
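
As a quick sanity check of that idea outside the importer, the sketch below applies the same expression to a few made-up rows. The unique_filename helper and the sample data are hypothetical; whether the uploader then treats these names as distinct uploads is an assumption that would need testing against a real import.

```ruby
require "cgi"

# Builds a per-attachment filename by prefixing the attach_id,
# mirroring the one-line change proposed above.
def unique_filename(row)
  "#{row[:attach_id]}_#{CGI.unescapeHTML(row[:real_filename])}"
end

# Made-up rows: two attachments share a real_filename, one has an HTML entity.
rows = [
  { attach_id: 101, real_filename: "image.jpg" },
  { attach_id: 102, real_filename: "image.jpg" },
  { attach_id: 103, real_filename: "Tom &amp; Jerry.jpg" },
]

names = rows.map { |row| unique_filename(row) }
puts names
puts "all unique: #{names.uniq.size == names.size}"
```

Because attach_id is unique per attachment, the generated names cannot collide even when real_filename repeats. The trade-off is that users would see the prefixed name in place of the one they uploaded.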

That doesn’t sound like a bad idea.