Importers for large forums

Can you convert it to UTF-8?

Why? Can we just support utf8mb4?

I submitted a new pull request that fixed an encoding issue.

https://github.com/discourse/discourse/pull/5003

Hi guys,

I found that the RegExp-based BBCode conversion does not handle nested tags correctly. For example:
"[COLOR=red]foo [COLOR=blue]bar[/COLOR] baz[/COLOR]"
becomes "foo [COLOR=blue]bar baz[/COLOR]"

Is it worth writing a small AST parser?
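
To make the idea concrete, here is a minimal sketch of a stack-based parser that handles the nested COLOR example. It is not the importer's code: `Node`, `TOKEN`, `parse_bbcode` and `render` are hypothetical names, and the choice to drop COLOR tags and turn [B] into Markdown bold is only an assumption for illustration.

```ruby
# Illustrative sketch only; all names here are hypothetical.
Node = Struct.new(:tag, :arg, :children)

# Matches [TAG], [TAG=arg] and [/TAG].
TOKEN = /\[(\/)?([A-Za-z]+)(?:=([^\]]*))?\]/

def parse_bbcode(input)
  root = Node.new(:root, nil, [])
  stack = [root]
  pos = 0

  input.scan(TOKEN) do
    m = Regexp.last_match
    # Text between the previous tag and this one belongs to the innermost open node.
    text = input[pos...m.begin(0)]
    stack.last.children << text unless text.empty?
    pos = m.end(0)

    closing, name, arg = m[1], m[2].upcase, m[3]
    if closing
      if stack.size > 1 && stack.last.tag == name
        stack.pop                      # closes the innermost matching tag
      else
        stack.last.children << m[0]    # unmatched closing tag: keep it as literal text
      end
    else
      node = Node.new(name, arg, [])
      stack.last.children << node
      stack.push(node)
    end
  end

  tail = input[pos..-1]
  stack.last.children << tail unless tail.empty?
  root
end

def render(node)
  inner = node.children.map { |c| c.is_a?(Node) ? render(c) : c }.join
  case node.tag
  when "B" then "**#{inner}**"
  else inner # root, COLOR and any other tag: keep only the content (assumption)
  end
end

puts render(parse_bbcode("[COLOR=red]foo [COLOR=blue]bar[/COLOR] baz[/COLOR]"))
```

Because each closing tag pops only the innermost open node, the inner [/COLOR] can no longer be paired with the outer [COLOR=red], which is what goes wrong with a single non-greedy RegExp.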

New issue:

discourse@ip-10-0-1-178-app:/var/www/discourse$ IMPORT=1 RAILS_ENV=production ruby script/bulk_import/vbulletin.rb
Loading application...
Starting...
Preloading I18n...
Fixing highest post numbers...
Loading imported group ids...
Loading imported user ids...
Loading imported category ids...
Loading imported topic ids...
Loading imported post ids...
Loading groups indexes...
Loading users indexes...
Loading categories indexes...
Loading topics indexes...
Loading posts indexes...
Importing groups...
Importing users...
1417464 -    106/sec
/usr/local/lib/ruby/gems/2.4.0/gems/pg-0.20.0/lib/pg/connection.rb:168:in `get_last_result': ERROR:  invalid byte sequence for encoding "UTF8": 0xed 0xa0 0xbd (PG::CharacterNotInRepertoire)
CONTEXT:  COPY user_custom_fields, line 960626
        from /usr/local/lib/ruby/gems/2.4.0/gems/pg-0.20.0/lib/pg/connection.rb:168:in `copy_data'
        from /var/www/discourse/script/bulk_import/base.rb:511:in `create_custom_fields'
        from /var/www/discourse/script/bulk_import/base.rb:193:in `create_users'
        from script/bulk_import/vbulletin.rb:131:in `import_users'
        from script/bulk_import/vbulletin.rb:81:in `execute'
        from /var/www/discourse/script/bulk_import/base.rb:33:in `run'
        from script/bulk_import/vbulletin.rb:494:in `<main>'
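
For context on the error above: the bytes 0xed 0xa0 0xbd decode to U+D83D, a UTF-16 high surrogate, which valid UTF-8 never contains, so PostgreSQL rejects the row. One possible workaround, sketched below, is to strip invalid byte sequences from each value before it is copied. The helper name `clean_utf8` is hypothetical, and where it would be called in the import script is left open; dropping the bytes also loses whatever character they were meant to be.

```ruby
# Hypothetical helper, not part of the bulk importer.
# Returns a copy of the value that is valid UTF-8, with invalid bytes removed.
def clean_utf8(value)
  str = value.to_s.dup.force_encoding(Encoding::UTF_8)
  str.valid_encoding? ? str : str.scrub("")
end

# The same three bytes as in the error message, embedded in otherwise valid text.
raw = "caf\xC3\xA9 \xED\xA0\xBD!".b
clean = clean_utf8(raw)
puts clean.valid_encoding?
puts clean
```

`String#scrub` can also take a replacement string such as "?" if it is preferable to mark the spot rather than drop the bytes silently.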

I am tired of facing problems.

Importing users ran at more than 7000/sec in my case, and the rate increased gradually during the process.
I am using a MacBook Pro; the SSD speed is a big advantage.

Your forum is big and may have run on buggy MySQL versions at some point; it may also have crashed or been rebooted several times. In addition, this bulk importer is still very much beta and needs improvement. So I think you should try modifying some of the code and running it locally too.

P.S. I had to write some more code (a custom BBCode parser, attachment importing, user avatar importing…) to make my import perfect.

I’d like to report that this can be used to correct problems where the post counts of users are off (e.g. there are negative post counts). It’s pretty fast, too! :smiley:

(I could imagine a Sidekiq task that runs this once a week being a good idea, actually…)

Looking through the code…

https://github.com/discourse/discourse/blob/4623b46b0ba7f18bf83fdc25c25cab26327444b6/lib/tasks/import.rake#L262

…it looks like this will nuke the read stats :frowning:

Am I overlooking something?

Waiting for XF1 support.
I have a forum with 9M posts.

Want it sooner rather than later?

You can sponsor the development of the XenForo 1 bulk importer by subscribing to 1 year of hosting (starting from the Business plan) :wink:
