Migrating vBulletin 5 database - Import attachments errors about attachment extension

I’ve successfully imported my vBulletin 5 forum into Discourse. While the overall import script works fine, I’m encountering errors when importing attachments. The attachments are stored in my database and include the following extensions: java, html, jpg, png, txt, rtf, zip, js, and xml.

Upon debugging the import_attachment action, I discovered that only attachments with the java extension are being imported correctly. The script fails for attachments with other extensions.

Has anyone else in the community faced issues when importing attachments with these file extensions? Does anyone have insights on why the script might be failing with these particular file types?

Here’s a brief overview of the issue:

  • The first three files in my database have the java extension and are imported without problems.
  • The script fails when it encounters a file with the jpg extension.


@pfaffman Any advice or solutions would be greatly appreciated!

You’ll need to add some debugging puts statements to see what’s happening.

I added a debugging statement on upl_obj:

      begin
        upl_obj = create_upload(post.user.id, filename, real_filename)

        if upl_obj&.persisted?
          html = html_for_upload(upl_obj, real_filename)
          # Only append the upload markup if the post does not already contain it
          if !post.raw[html]
            post.raw += "\n\n#{html}\n\n"
            post.save!
            UploadReference.ensure_exist!(upload_ids: [upl_obj.id], target: post)
          end
        else
          # upl_obj can be nil here, so guard before reading its errors
          errors = upl_obj ? upl_obj.errors.full_messages.join(", ") : "create_upload returned nil"
          puts "Failed to create upload for #{filename}: #{errors}"
          next
        end
      rescue => e
        puts "Error processing file #{filename}: #{e.message}"
        next
      end

and I see these errors:

These errors occur for the extensions jpg, jpeg, png, PNG, and gif.

Any idea about these errors? @pfaffman

The only files that worked are ASCII. My guess is an encoding error.

Is this an issue with the files in the database, or an encoding issue in the import script? @pfaffman

Look at these files: if the others are encoded correctly, why aren’t the ones with extensions such as jpg and png?

My guess, which could be wrong, is that a problem with newline encoding makes the data in the binary files wrong because a newline character is encoded as data. If the only files that work are ASCII, it’s a good bet.

So it’s not a Discourse issue, but a MySQL issue.


Almost - not newline encoding, but treated as text, and therefore corrupted.

EF BF BD is the UTF-8 byte sequence for ‘REPLACEMENT CHARACTER’ (U+FFFD). This is indicative of a file being treated as text instead of binary.

A JPEG image starts with ff d8 ff e0 xx xx 4a 46 49 46 00

You can see that the first four bytes have each been replaced with EF BF BD.
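To make the corruption pattern concrete, here is a minimal Ruby sketch that simulates decoding JPEG bytes as UTF-8 text and then checks for the result. The helper names (treat_as_text, jpeg?, looks_text_corrupted?) are made up for illustration and are not part of the import script; the sample bytes are just a typical JFIF header, not data from the forum in question.

```ruby
# Byte signatures. All names in this sketch are hypothetical.
JPEG_MAGIC  = "\xFF\xD8\xFF".b   # start of every JPEG file
REPLACEMENT = "\xEF\xBF\xBD".b   # UTF-8 encoding of U+FFFD

# Simulate binary data being decoded as UTF-8 text:
# each invalid byte is replaced with U+FFFD.
def treat_as_text(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).scrub("\uFFFD").b
end

# True if the data still starts with the JPEG signature.
def jpeg?(bytes)
  bytes.b.start_with?(JPEG_MAGIC)
end

# True if the data starts with a replacement character,
# which suggests it went through a text conversion.
def looks_text_corrupted?(bytes)
  bytes.b.start_with?(REPLACEMENT)
end

original  = "\xFF\xD8\xFF\xE0\x00\x10JFIF\x00".b
corrupted = treat_as_text(original)

puts jpeg?(original)                  # true
puts jpeg?(corrupted)                 # false
puts looks_text_corrupted?(corrupted) # true
puts corrupted.bytes.first(12).map { |b| format("%02X", b) }.join(" ")
```

A check like looks_text_corrupted? could be run on the raw attachment data before calling create_upload, to count how many rows are affected instead of discovering them one failed upload at a time.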

So your images are indeed corrupted. This is not a problem with the importer, this is a problem with the database, as @pfaffman already said. If you have copied this database from another server, you might want to check if this is already an issue in the original database. This could also only be happening on the oldest images (if this happened a long time ago). Just remove the exit line and see what happens.


@RGJ Thanks for the help. I tried importing a new database with the correct images, and while it imported, not all attachments were fully imported. I’m encountering errors like this:

Any idea why this is happening? @RGJ @pfaffman

Posts are appearing like this, and there are many instances:

I don’t think the importer can handle those.

If I recall correctly, all [ATTACH] tags are removed from the posts since they are superfluous. That probably doesn’t work here because it does not expect JSON data in them. It would be a matter of looking up the place where they are being removed and modifying that code to account for the JSON data inside the tag.

Before importing attachments, I notice that posts with images contain [ATTACH] tags. After the import, some of these tags are correctly filled while others are left empty. Why is that?

Like I said, those tags are superfluous, but the tag removal logic does not expect that JSON code, so it is not removing them correctly.

None of those images are showing, right?

I think some vBulletin versions attach them by adding them to the database and some include bbcode like that. I think that I’ve modified the importer to handle those before.

Yeah, these are not showing.
I got errors on 1k attachments out of 12.5k.
How can I upload those?

Oh, I didn’t notice the JSON before. Do you expect these JSON files to be attachments embedded into the posts? What do those posts look like in vBulletin?

I believe the other errors are because those posts were not imported for some reason (like the parent topic was deleted or otherwise not imported).

I don’t think these are JSON files; it’s JSON metadata.
It looks like vBulletin changed their encoding of attachment locations from

[attach]123[/attach]

to

[attach=json]{"data-attachmentid":123}[/attach]

and the importer cannot handle that. It should attach the attachments anyway, these tags are only for positioning them within the post. But the deletion of the tag only happens when they contain a numeric id.
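As a rough sketch of what handling both tag forms could look like, the Ruby below strips [attach] tags whether they hold a numeric id or JSON metadata, and also pulls the ids out in case they are needed. The constant and method names (ATTACH_TAG, strip_attach_tags, attachment_ids) are hypothetical and do not come from the actual import script; the only assumption about the data is the "data-attachmentid" key shown above.

```ruby
require "json"

# Matches [attach]123[/attach] and [attach=json]{...}[/attach],
# case-insensitively. Hypothetical name, not from the importer.
ATTACH_TAG = %r{\[attach(?:=[^\]]*)?\](.*?)\[/attach\]}im

# Remove every attach tag, regardless of what it contains.
def strip_attach_tags(raw)
  raw.gsub(ATTACH_TAG, "")
end

# Turn the tag body into an attachment id, or nil if it is unusable.
def parse_attachment_id(body)
  body = body.strip
  return body.to_i if body.match?(/\A\d+\z/)

  data = JSON.parse(body)
  data.is_a?(Hash) ? data["data-attachmentid"]&.to_i : nil
rescue JSON::ParserError
  nil
end

# Collect the attachment ids referenced in a post, in order.
def attachment_ids(raw)
  raw.scan(ATTACH_TAG).map { |(body)| parse_attachment_id(body) }.compact
end

raw = 'Before [ATTACH=JSON]{"data-attachmentid":123}[/ATTACH] middle [attach]456[/attach] after'

p attachment_ids(raw)    # [123, 456]
p strip_attach_tags(raw) # "Before  middle  after"
```

Since the tags only position attachments within the post, stripping them should be enough for the attachments to show up at the end of the post; the id extraction is only there if you want to keep the original placement.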

A lot of other errors in the screenshot above are independent of this issue.


I thought that I’d seen that sometimes the database linked them to the post and sometimes the bbcode did, and I guess sometimes they both do? And sometimes they live in the database and sometimes they are external files (but I might be remembering some other system on that).

Yes, that’s about correct. But AFAIK in vBulletin 5 it’s always via the database.
