Migrate a vBulletin 4 forum to Discourse

Yes, they get converted. I will email you tomorrow!

3 Likes

You have to reach trust level 1 to unlock PMs, and you haven't spent enough time reading topics here to achieve that yet.

4 Likes

Hi, I'm on a Docker container (DigitalOcean) and I'm trying to import MyBB (I know this tutorial is for vBulletin, but the Gemfile step is similar). I'm stuck at bundle install:

Hello,
We tried to import vBulletin 3.8 into Discourse. The script works fine with a 300 MB database, around 40k users and 60k posts. But at the end of the import process we ran into a charset problem.

  • Our vBulletin 3.8 database is encoded with charset latin1
    ----> when running the import script, MySQL 5.6 in the Discourse Docker container is configured with charset UTF-8.
  • The import script forces the data to be converted to UTF-8,
    so after the import the Discourse forum displays the text with UTF-8 encoding errors. It looks like the pictures below.
  1. before import, vB 3.8

  2. after import to Discourse

We tried:

  • Converting the vB 3.8 charset to UTF-8 before running the import script
  • Testing the vB 3.8 database on a fresh MySQL server, where the text displays normally with no encoding errors.
    So, do you have any advice for this case?

I'd appreciate any support with this (and sorry about my English if it is hard to understand).

1 Like

Here's a piece of how I fixed a similar problem:

    ### WIN1252 encoding
    win_encoded = ''
    begin
      win_encoded = raw.force_encoding('utf-8').encode("Windows-1252",
                            invalid: :replace, undef: :replace, replace: ""
                           ).force_encoding('utf-8').scrub
    rescue => e
      puts "\n#{'-'*50}\nWin1252 failed for \n\n#{raw}\n\n"
      win_encoded = ''
    end
    raw = win_encoded
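
To see what that snippet does in isolation, here is a minimal sketch. It assumes the text is mojibake of the usual kind (UTF-8 bytes that were read as Windows-1252 and then stored as UTF-8 again). `fix_double_encoding` is a hypothetical name, not something in the importer.

```ruby
# Hypothetical helper wrapping the same call chain as the snippet above.
def fix_double_encoding(raw)
  raw.dup.force_encoding('utf-8')
     .encode('Windows-1252', invalid: :replace, undef: :replace, replace: '')
     .force_encoding('utf-8')
     .scrub
end

# Build a broken sample: the UTF-8 bytes of "haven\u2019t" misread as Windows-1252.
good   = "haven\u2019t"
broken = good.dup.force_encoding('Windows-1252').encode('utf-8')

puts broken                      # the garbled form
puts fix_double_encoding(broken) # the repaired form
```

Text that is already correct UTF-8 is not safe to pass through this: an accented word such as "café" comes out with a replacement character, so it only makes sense when all the text in the database is double-encoded.
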
5 Likes

You saved my life.
For the easy way, I tried the conversion script from the post Importing from phpBB3, which helped me fix my database charset problem quickly, and it works like a charm now.
Thank you so much for the advice.

3 Likes

Did anyone migrate using the vbulletin5 importer? I may use it in the future, so I'd like to know whether it has already been used without problems.

2 Likes

I just did an import of vBulletin5 and added some features (permalinks, some formatting, and maybe some other things I don't remember). I intend to submit a PR, but it hasn't happened yet.

3 Likes

I have a vb5 database dump that contains attachments in it. Can I import them in Discourse or do I need to have all the attachments as files?

Also confused about this. Where should I copy the attachment files in the discourse folder exactly? :thinking:

2 Likes

Hi again,
From what I understand, attachments from the database will work as they seem to be handled the same way as avatars which are in the database as well.

My import is going well, but I ran into an error at 91% of the imported posts :weary:

importing posts...
  1425149 / 1564573 ( 91.1%)  [1040 items/min]  Traceback (most recent call last):
        14: from script/import_scripts/vbulletin5.rb:726:in `<main>'
        13: from /home/canapin/discourse/script/import_scripts/base.rb:47:in `perform'
        12: from script/import_scripts/vbulletin5.rb:49:in `execute'
        11: from script/import_scripts/vbulletin5.rb:300:in `import_posts'
        10: from /home/canapin/discourse/script/import_scripts/base.rb:862:in `batches'
         9: from /home/canapin/discourse/script/import_scripts/base.rb:862:in `loop'
         8: from /home/canapin/discourse/script/import_scripts/base.rb:863:in `block in batches'
         7: from script/import_scripts/vbulletin5.rb:320:in `block in import_posts'
         6: from /home/canapin/discourse/script/import_scripts/base.rb:508:in `create_posts'
         5: from /usr/local/rvm/gems/ruby-2.6.5/gems/rack-mini-profiler-2.0.4/lib/patches/db/mysql2.rb:8:in `each'
         4: from /usr/local/rvm/gems/ruby-2.6.5/gems/rack-mini-profiler-2.0.4/lib/patches/db/mysql2.rb:8:in `each'
         3: from /home/canapin/discourse/script/import_scripts/base.rb:509:in `block in create_posts'
         2: from script/import_scripts/vbulletin5.rb:321:in `block (2 levels) in import_posts'
         1: from script/import_scripts/vbulletin5.rb:450:in `preprocess_post_raw'
script/import_scripts/vbulletin5.rb:450:in `gsub': invalid byte sequence in UTF-8 (ArgumentError)

How can I properly identify the post to see what the content looks like in the vbulletin database?

1 Like

Someone suggested ways to use rescue to solve those, so you might go back and find that (I can't remember if it was in this topic or another one). You could put a puts in the rescue to print out the id and/or the text that caused the problem.
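
A minimal sketch of that idea, assuming the failing call is the regex gsub in preprocess_post_raw. The names `fix_whitespace` and `safe_preprocess` and the `post_id` argument are hypothetical; in the real importer you would pass whatever id the row carries.

```ruby
# Stand-in for the whitespace fix-up that raised in the traceback.
def fix_whitespace(raw)
  raw.gsub(/(\\r)?\\n/, "\n").gsub("\\t", "\t")
end

# Hypothetical wrapper: log the offending post, then retry with invalid bytes removed.
def safe_preprocess(post_id, raw)
  fix_whitespace(raw)
rescue ArgumentError => e
  warn "post #{post_id}: #{e.message}"
  warn raw.inspect # shows the bad bytes as \x.. escapes
  fix_whitespace(raw.scrub(''))
end

bad = "abc\xFFdef\\n" # contains a byte that is not valid UTF-8
result = safe_preprocess(42, bad)
puts result.inspect
```

With the id printed, the row can be looked up directly in the vBulletin database instead of guessing from the progress counter.
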

You have an encoding problem.

I used this in a similar import (I think you'd put it in preprocess_post_raw)

    begin
      win_encoded = raw.force_encoding('utf-8').encode("Windows-1252",
                            invalid: :replace, undef: :replace, replace: ""
                           ).force_encoding('utf-8').scrub
    rescue => e
      puts "\n#{'-'*50}\nWin1252 failed for \n\n#{raw}\n\n"
      win_encoded = ''
    end
3 Likes

Hi,
I modified the importer and added your script as follows:

  def preprocess_post_raw(raw)
    return "" if raw.blank?
    begin
      win_encoded = raw.force_encoding('utf-8').encode("Windows-1252",
                            invalid: :replace, undef: :replace, replace: ""
                           ).force_encoding('utf-8').scrub
    rescue => e
      puts "\n#{'-'*50}\nWin1252 failed for \n\n#{raw}\n\n"
      win_encoded = ''
    end
    # decode HTML entities
    raw = @htmlentities.decode(raw)

    # fix whitespaces
    raw = raw.gsub(/(\\r)?\\n/, "\n")
      .gsub("\\t", "\t")

The invalid byte sequence in UTF-8 error happens in this part: raw = raw.gsub(/(\\r)?\\n/, "\n") .gsub("\\t", "\t").

Then I started the importer again. Though it skips already imported data, it took about 6 hours to get to the post that generates the error, and it didn't print the expected information showing the post content. :confounded: Any idea why?

edit:

This is probably the post raw content that leads to the error:

I wonder if Billy is enjoying the parade.

Qwertyuiopasdfghjklzxcvbnm&#55356;&#57174;
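
For what it's worth, 55356 and 57174 are the two halves of a UTF-16 surrogate pair (0xD83C and 0xDF56), which together stand for the single code point U+1F356. One plausible explanation, not verified against the importer, is that decoding each entity on its own produces bytes that are not valid UTF-8, and the gsub that follows then raises. A sketch of merging such pairs before the entities are decoded; `merge_surrogate_entities` is a hypothetical helper, not something the import script ships with:

```ruby
HIGH_SURROGATES = (0xD800..0xDBFF)
LOW_SURROGATES  = (0xDC00..0xDFFF)

# Hypothetical helper: rewrite "&#hi;&#lo;" surrogate pairs as one real character.
def merge_surrogate_entities(text)
  text.gsub(/(?:&#\d+;)+/) do |run|
    tokens = run.scan(/\d+/)      # the digits of each entity in this run
    codes  = tokens.map(&:to_i)
    out = +''
    i = 0
    while i < codes.length
      hi = codes[i]
      lo = codes[i + 1]
      if HIGH_SURROGATES.cover?(hi) && lo && LOW_SURROGATES.cover?(lo)
        # Standard UTF-16 surrogate arithmetic
        out << [0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)].pack('U')
        i += 2
      else
        out << '&#' << tokens[i] << ';' # leave any other entity as it was
        i += 1
      end
    end
    out
  end
end

fixed = merge_surrogate_entities('Qwertyuiopasdfghjklzxcvbnm&#55356;&#57174;')
puts fixed.valid_encoding?
```
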

I'll try to modify the importer script to make it skip (for real) the previous 1.4M posts. Wish me luck. :crossed_fingers:

2 Likes

I modified many other importers to include an import_after setting to allow importing only recent data. You can look at some others to see how I did that.
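
A rough sketch of the general idea, not the actual code from those importers: a cutoff date read from the environment and fed into the query as a filter. `IMPORT_AFTER` as an environment variable, `posts_query`, and the `node` table with its `publishdate` column are all assumptions for illustration and should be checked against the importer and schema you are really using.

```ruby
require 'time'

# Assumed convention: cutoff date from the environment, defaulting to "import everything".
IMPORT_AFTER = ENV.fetch('IMPORT_AFTER', '1970-01-01')

# Hypothetical query builder; table and column names are placeholders.
def posts_query(import_after, batch_size, offset)
  cutoff = Time.parse("#{import_after} 00:00:00 UTC").to_i # Unix timestamp
  <<~SQL
    SELECT nodeid, userid, publishdate
      FROM node
     WHERE publishdate > #{cutoff}
     ORDER BY nodeid
     LIMIT #{batch_size} OFFSET #{offset}
  SQL
end

puts posts_query(IMPORT_AFTER, 1000, 0)
```
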

2 Likes

Hi,
I've been able to import almost all my posts! I fixed a few dozen by hand and restarted the import each time it came across a new UTF-8 error... :sweat_smile:

Now I need to import the attachments (which are stored in the vBulletin database), but it doesn't work.
When the process starts, my RAM consumption increases sharply within 10 to 20 seconds and this error happens:

importing attachments...
Failed to create upload: Cannot allocate memory - grep
Fail

My RAM usage: [image]

I use a Discourse development version on an Ubuntu 18 subsystem on Windows 10 and I have 16 GB of RAM.

The attachments take up 7 GB of the 13 GB vBulletin database.
Note that I use the vbulletin5 importer.

The issue comes from this query:

    SELECT n.parentid nodeid, a.filename, fd.userid, LENGTH(fd.filedata) AS dbsize, filedata, fd.filedataid
      FROM #{DB_PREFIX}attach a
      LEFT JOIN #{DB_PREFIX}filedata fd ON fd.filedataid = a.filedataid
      LEFT JOIN #{DB_PREFIX}node n on n.nodeid = a.nodeid

If I run this query in MySQL, my remaining RAM fills up within seconds.


(editing my post to remove useless info and questions since I'm figuring things out and providing a workaround)

Workaround:

I added a LIMIT and an OFFSET to the importer's SQL query. I imported the attachments 20000 at a time:

    uploads = mysql_query <<-SQL
    SELECT n.parentid nodeid, a.filename, fd.userid, LENGTH(fd.filedata) AS dbsize, filedata, fd.filedataid
      FROM #{DB_PREFIX}attach a
      LEFT JOIN #{DB_PREFIX}filedata fd ON fd.filedataid = a.filedataid
      LEFT JOIN #{DB_PREFIX}node n on n.nodeid = a.nodeid
      LIMIT 20000 OFFSET 0
    SQL

I also added an exit at the end of the uploads.each do |upload| loop to prevent the import script from continuing after importing my 20000 uploads.

When my 10000 uploads are imported, I edit the script (thanks to nano +353 ./script/import_scripts/vbulletin5.rb, which opens the file at the right line) to increase the SQL query OFFSET by 10000, and start the importer again... and so on for my 65000 attachments.

During the attachment imports, I faced several errors and warnings including:

  • W, [2020-08-20T12:05:37.402860 #31042] WARN -- : Bad date/time value "0000:00:00 00:00:00": mon out of range
  • Post for 490451 not found (dangling old attachments I guess?)
  • some EXIF data error it seems
  • Fail: this one puzzled me and stopped the import script. I checked the first "Fail" I got and the vBulletin attachment was sort of broken (no filename), so I commented out the exit instruction to let the importer continue when it "fails", hoping that wouldn't break anything.
       puts "Fail"
       #exit

I also had a more annoying error that interrupted the import:

1: from /usr/local/rvm/gems/ruby-2.6.5/gems/activerecord-6.0.3.2/lib/active_record/validations.rb:53:in `save!'
/usr/local/rvm/gems/ruby-2.6.5/gems/activerecord-6.0.3.2/lib/active_record/validations.rb:80:in `raise_validation_error':
Validation failed: Body is limited to 32000 characters; you entered 32323. (ActiveRecord::RecordInvalid)

Fortunately, it was a rare error, and each time I just skipped the offending attachment and carried on until the next identical error. It happened maybe a dozen times out of a total of 65000 attachments. I just restarted the import script with a different SQL query offset.

1 Like

Hi,
I noticed that the custom field import_pass was absent for about 400 users of my remaining 27000 users (I cleaned up 154000 inactive users).

Any idea why?

The forum was migrated from phpBB to vBulletin in May. Could it have something to do with that?

I won't try to "fix" this and import passwords for these 400 users (unless there's an easy way to do it...?) and that's not a big issue, so I'm just curious more than anything else.

1 Like

Hey guys,
Imported images have the wrong width/height ratio unless I rebake the posts. I'd like to find a way to have the correct ratio (during the import for example) without rebaking.

More verbose description of the issue:

From what I understand, imported posts aren't "baked" when Discourse creates the corresponding post (though the cooked field is generated somehow), so that's why importing posts is way faster than baking existing Discourse posts.

My issue is that my imported images have the wrong width/height ratio.

Example of the raw Discourse text related to an imported image:

![SH-MUniFrame.JPG|600x800](upload://6Li1nnjbA8zDz6YJ3FqeYHV5zXK.jpeg)

The content of the "cooked" field:
<img src="https://d11a6trkgmumsb.cloudfront.net/original/3X/0/3/0379f53ed8221730ccb31807238e9c46e9fe1d37.jpeg" alt="SH-MUniFrame.JPG" data-base62-sha1="6Li1nnjbA8zDz6YJ3FqeYHV5zXK" width="517" height="500" class="d-lazyload">

How the image appears in the post:


Here is the original image: https://d11a6trkgmumsb.cloudfront.net/original/3X/f/7/f73a0ae8594219dd5a1620e59b3c17f9b02b1583.jpeg

The original image size from the vBulletin database is:

select width, height from filedata where filedataid = 76237
+-------+--------+
| width | height |
+-------+--------+
|   600 |    800 |
+-------+--------+

My understanding is that the height attribute is constrained by Discourse's setting, which sets a max height of 500px, hence the same value in the <img> height attribute. The <img> width is somehow changed from 600 to 517, though I can't figure out how or why.
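
For reference, scaling 600x800 down to a 500px maximum height while keeping the ratio would give 375x500, not 517x500. A small sketch of that arithmetic; `fit_within` is a hypothetical helper, and the 690px maximum width is an assumption about the site settings rather than something taken from this forum:

```ruby
# Hypothetical helper: shrink (never enlarge) to fit inside a bounding box, keeping the ratio.
def fit_within(width, height, max_width, max_height)
  scale = [max_width.to_f / width, max_height.to_f / height, 1.0].min
  [(width * scale).round, (height * scale).round]
end

p fit_within(600, 800, 690, 500) # => [375, 500]
```
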

The issue is the same for older images that have 0 in both the width and height vBulletin attachment fields. They also have the wrong height/width ratio. I don't know if these values are really used during the import.

The issue is resolved by rebaking (rebuild HTML) the post. The image is then properly resized and the image viewer is added. But I have 1.6M posts and I'd prefer to avoid rebaking all of them.

A quick fix would be to use this CSS on my Discourse:

.cooked img:not(.emoji) {
    height: auto;
    width: auto;
}

But it implies that no one will be able to choose an arbitrary size when uploading an image, and there may be side effects I'm not aware of.

Any idea how I could get the proper width/height ratio on imported attachments?

I suspect that's because you didn't let them cook after the import. I cannot imagine a way to solve the problem without rebaking the posts. Perhaps you want to just rebake the posts that are broken rather than all of them?

1 Like

Are they supposed to be cooked automatically over time after the import? Starting from the last or the first created post?

That's not a big issue though, and if they are not cooked automatically, I'd probably start a rebake of all posts and be patient, although I admit that I read this post a few days ago and it scared me a little bit: My journey into a massive posts rebake job. I also have questions about that, but I'll ask them in the proper topic. :blush:

Hmm yes, that looks like it's my code. Sorry for that. :confused:

It should follow this pattern:

    batches(BATCH_SIZE) do |offset|
      (SQL code)
        LIMIT #{BATCH_SIZE}
        OFFSET #{offset}
      (other code)
    end
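
A self-contained sketch of that pattern, with an array standing in for the MySQL table. The `batches` method here is a simplified stand-in written for this example, so treat it as an assumption about how the helper in base.rb behaves rather than a copy of it; `fake_query` is likewise hypothetical.

```ruby
BATCH_SIZE = 3

# Simplified stand-in: yield increasing offsets until the block breaks out.
def batches(batch_size)
  offset = 0
  loop do
    yield offset
    offset += batch_size
  end
end

# Pretend table of 8 attachment ids; plays the role of "LIMIT n OFFSET m".
ROWS = (1..8).to_a
def fake_query(limit, offset)
  ROWS[offset, limit] || []
end

imported = []
batches(BATCH_SIZE) do |offset|
  rows = fake_query(BATCH_SIZE, offset)
  break if rows.empty? # no more rows: leave the batches loop
  imported.concat(rows)
end

puts imported.inspect
```

Only one batch of rows is held in memory at a time, which is the point for a 7 GB attachment table. With a real query it is worth adding an ORDER BY, since LIMIT/OFFSET without one does not guarantee a stable row order between batches.
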
1 Like

Just raise the max post length site setting prior to the import.

2 Likes