Importers for large forums

How much RAM do you have? You’ll need enough RAM to hold the whole table. My guess is that you’re starting to swap and that’s slowing you down. Is swap on a hard drive or an SSD?

Well, is a 16-core Intel Xeon CPU @ 2.30GHz with 64GB of RAM not enough?


Oh. Darn. So much for that explanation. :exploding_head:

Could it work “row by row” instead of “table by table”, and start from where it left off (after the last imported ID)?
That would be very helpful for large forums.

Then it wouldn’t be a bulk importer, would it? :wink:


Well, can we call it a chunk importer? :roll_eyes:

That’s what the regular importer does. Perhaps that’s what you want.

What I mean is that when I stop the bulk importer and rerun it (e.g. because of a speed issue), it should resume at the last imported row, not start over from the first row.
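For what it’s worth, that kind of resuming can be sketched as keyset pagination on the source table’s primary key. This is purely illustrative plain Ruby, not the importer’s actual code: `fetch_chunk` stands in for the real MySQL query (`SELECT ... WHERE userid > ? ORDER BY userid LIMIT ?`).

```ruby
# Illustrative sketch, not importer code: resume an import after the
# last imported id by keyset-paginating on the primary key.
def import_from(last_imported_id, fetch_chunk, chunk_size: 3)
  imported = []
  loop do
    ids = fetch_chunk.call(last_imported_id, chunk_size)
    break if ids.empty?
    imported.concat(ids)          # stand-in for the real row import
    last_imported_id = ids.last   # next chunk starts after this id
  end
  imported
end

# Fake source table: ids 1..10, of which 1..4 were already imported.
source_ids = (1..10).to_a
fetch = ->(after, limit) { source_ids.select { |id| id > after }.first(limit) }
import_from(4, fetch)  # => [5, 6, 7, 8, 9, 10]
```

Because each chunk starts strictly after the highest id already handled, a rerun never re-imports rows it has already seen.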

Hi guys, is there any need to map old forum categories onto new, configurable categories and tags? We could merge or split old categories into a new structure.

My idea is to have a .yml file that maps each new category to the old categories’ IDs, including tags somehow.
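As a sketch of what such a file could look like (the format, keys, and category names below are all hypothetical, not an existing importer feature):

```ruby
require "yaml"

# Hypothetical mapping file: each new category lists the old
# vBulletin forum ids it absorbs, plus optional tags.
mapping_yml = <<~YML
  categories:
    - name: "Hardware"
      old_ids: [3, 7, 12]
      tags: ["cpu", "gpu"]
    - name: "Software"
      old_ids: [4]
YML

mapping = YAML.safe_load(mapping_yml)

# Invert it into a lookup the importer could use:
# old category id => new category name.
old_to_new = {}
mapping["categories"].each do |cat|
  cat["old_ids"].each { |id| old_to_new[id] = cat["name"] }
end
old_to_new  # => {3=>"Hardware", 7=>"Hardware", 12=>"Hardware", 4=>"Software"}
```

The inverted lookup is the useful shape at import time: each old row carries its old category id, so mapping it to the new category is a single hash access.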

New issue:

discourse@ip-10-0-1-178-app:/var/www/discourse$ IMPORT=1 RAILS_ENV=production ruby script/bulk_import/vbulletin.rb
Loading application...
Starting...
Preloading I18n...
Fixing highest post numbers...
Loading imported group ids...
Loading imported user ids...
Loading imported category ids...
Loading imported topic ids...
Loading imported post ids...
Loading groups indexes...
Loading users indexes...
Loading categories indexes...
Loading topics indexes...
Loading posts indexes...
Importing groups...
Importing users...
1270000 -    119/sec
        /var/www/discourse/script/bulk_import/base.rb:521:in `blank?': invalid byte sequence in UTF-8 (ArgumentError)
        from /var/www/discourse/script/bulk_import/base.rb:521:in `fix_name'
        from /var/www/discourse/script/bulk_import/base.rb:234:in `process_user'
        from /var/www/discourse/script/bulk_import/base.rb:486:in `block (2 levels) in create_records'
        from /usr/local/lib/ruby/gems/2.4.0/gems/rack-mini-profiler-0.10.5/lib/patches/db/mysql2.rb:6:in `each'
        from /usr/local/lib/ruby/gems/2.4.0/gems/rack-mini-profiler-0.10.5/lib/patches/db/mysql2.rb:6:in `each'
        from /var/www/discourse/script/bulk_import/base.rb:483:in `block in create_records'
        from /usr/local/lib/ruby/gems/2.4.0/gems/pg-0.20.0/lib/pg/connection.rb:160:in `copy_data'
        from /var/www/discourse/script/bulk_import/base.rb:482:in `create_records'
        from /var/www/discourse/script/bulk_import/base.rb:191:in `create_users'
        from script/bulk_import/vbulletin.rb:131:in `import_users'
        from script/bulk_import/vbulletin.rb:81:in `execute'
        from /var/www/discourse/script/bulk_import/base.rb:33:in `run'
        from script/bulk_import/vbulletin.rb:494:in `<main>'

My forum encoding is “UTF8mb4”.

Can you convert it to UTF-8?

Why? Can’t we just support UTF8mb4?
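For context: MySQL’s utf8mb4 is just standard UTF-8 that allows 4-byte sequences (emoji and other astral-plane characters), which MySQL’s legacy 3-byte “utf8” charset cannot store. PostgreSQL’s UTF8 accepts 4-byte sequences fine, so no conversion is needed as long as strings arrive correctly tagged. A quick plain-Ruby check:

```ruby
# MySQL's utf8mb4 is ordinary UTF-8 including 4-byte sequences;
# the legacy "utf8" (utf8mb3) charset caps characters at 3 bytes.
emoji = "\u{1F600}"        # 😀 GRINNING FACE
emoji.bytesize             # => 4 -- a 4-byte UTF-8 sequence
emoji.valid_encoding?      # => true

# With the mysql2 client opened as utf8mb4, e.g.
#   Mysql2::Client.new(host: "...", encoding: "utf8mb4")
# strings come back as valid UTF-8 and can go into Postgres as-is.
```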

I submitted a new pull request that fixed an encoding issue.

https://github.com/discourse/discourse/pull/5003


Hi guys,

I found that the BBCode conversion doesn’t handle nesting correctly using RegExp. For example:
"[COLOR=red]foo [COLOR=blue]bar[/COLOR] baz[/COLOR]"
will become foo [COLOR=blue]bar baz[/COLOR]

Would it be worth writing a small AST parser?
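For simple paired tags, a full AST may be overkill: repeatedly rewriting the innermost pair handles nesting inside-out. A sketch (illustrative only; here the tag is simply stripped, since Discourse has no color markup, and neither the regex nor the helper comes from the importer):

```ruby
# Illustrative sketch: strip nested [COLOR] tags by repeatedly
# removing the innermost pair (one whose body contains no other
# COLOR tag) until nothing changes. Avoids a full AST parser.
INNERMOST_COLOR = %r{\[COLOR=[^\]]+\]((?:(?!\[COLOR=|\[/COLOR\]).)*)\[/COLOR\]}m

def strip_color(raw)
  loop do
    stripped = raw.gsub(INNERMOST_COLOR, '\1')
    break if stripped == raw
    raw = stripped
  end
  raw
end

strip_color("[COLOR=red]foo [COLOR=blue]bar[/COLOR] baz[/COLOR]")
# => "foo bar baz"
```

The negative lookahead keeps a match from spanning a nested open tag, so only innermost pairs match; each loop iteration then peels off one nesting level.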

New issue:

discourse@ip-10-0-1-178-app:/var/www/discourse$ IMPORT=1 RAILS_ENV=production ruby script/bulk_import/vbulletin.rb
Loading application...
Starting...
Preloading I18n...
Fixing highest post numbers...
Loading imported group ids...
Loading imported user ids...
Loading imported category ids...
Loading imported topic ids...
Loading imported post ids...
Loading groups indexes...
Loading users indexes...
Loading categories indexes...
Loading topics indexes...
Loading posts indexes...
Importing groups...
Importing users...
1417464 -    106/sec
/usr/local/lib/ruby/gems/2.4.0/gems/pg-0.20.0/lib/pg/connection.rb:168:in `get_last_result': ERROR:  invalid byte sequence for encoding "UTF8": 0xed 0xa0 0xbd (PG::CharacterNotInRepertoire)
CONTEXT:  COPY user_custom_fields, line 960626
        from /usr/local/lib/ruby/gems/2.4.0/gems/pg-0.20.0/lib/pg/connection.rb:168:in `copy_data'
        from /var/www/discourse/script/bulk_import/base.rb:511:in `create_custom_fields'
        from /var/www/discourse/script/bulk_import/base.rb:193:in `create_users'
        from script/bulk_import/vbulletin.rb:131:in `import_users'
        from script/bulk_import/vbulletin.rb:81:in `execute'
        from /var/www/discourse/script/bulk_import/base.rb:33:in `run'
        from script/bulk_import/vbulletin.rb:494:in `<main>'
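Those bytes (0xed 0xa0 0xbd) encode a UTF-16 surrogate half as UTF-8 — a CESU-8 artifact, typically from emoji mangled in a non-utf8mb4 column — which is invalid in real UTF-8, so Postgres rightly rejects the COPY. A sketch of scrubbing such values before they are written (`sanitize_utf8` is an illustrative helper, not importer code):

```ruby
# 0xED 0xA0 0xBD encodes U+D83D, a UTF-16 surrogate half -- invalid
# in real UTF-8, which is why Postgres rejects the COPY. String#scrub
# drops (or replaces) such sequences before the row is written.
def sanitize_utf8(str)
  str.dup.force_encoding(Encoding::UTF_8).scrub("")
end

broken = "emoji: \xED\xA0\xBD".dup.force_encoding(Encoding::UTF_8)
broken.valid_encoding?   # => false
sanitize_utf8(broken)    # => "emoji: "
```

Passing `""` to `scrub` silently drops the bad sequence; passing a marker like `"\uFFFD"` instead would keep the damage visible in the imported posts.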

I am tired of facing problems.

Importing users ran at >7000/sec in my case, and the rate increased gradually during the process.
I’m using a MacBook Pro; the SSD speed is a big advantage.

Your forum is big and may have run buggy MySQL versions at some point; it may also have crashed and been rebooted several times. In addition, this bulk importer is still quite beta and needs improvements. So I think you should try modifying some code and running it locally too.

P.S.: I had to write some more code (a custom BBCode parser, attachment and user-avatar importing, …) to make my import perfect.


I’d like to report that this can be used to correct problems where the post counts of users are off (e.g. there are negative post counts). It’s pretty fast, too! :smiley:

(I could imagine a Sidekiq task that runs this once a week being a good idea, actually…)


Looking through the code…

https://github.com/discourse/discourse/blob/4623b46b0ba7f18bf83fdc25c25cab26327444b6/lib/tasks/import.rake#L262

…it looks like this will nuke the read stats :frowning:

Am I overlooking something?

Waiting for XF1 support.
I’ve got a 9M-post forum.

Want it sooner rather than later?

You can sponsor the development of the XenForo 1 bulk importer by subscribing for 1-year of hosting (starting from the Business plan) :wink:
