Importers for large forums


(Jay Pfaffman) #41

Can you convert it to UTF-8?


#42

Why? Can we just support utf8mb4 instead?


#43

I submitted a new pull request that fixed an encoding issue.


(Quang-Buu Le) #44

Hi guys,

I found that converting BBCode with regular expressions does not work correctly when tags are nested. For example:
"[COLOR=red]foo [COLOR=blue]bar[/COLOR] baz[/COLOR]"
becomes: foo [COLOR=blue]bar baz[/COLOR]

Is it worth writing a small AST parser?
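
To make the idea concrete, here is a rough sketch of what such a parser could look like. It is not code from the importer: `BBNode`, `parse_bbcode` and `render_plain` are hypothetical names, and it assumes every `[word]` or `[word=arg]` token is a BBCode tag. Open tags are pushed on a stack, so each closing tag pairs with the innermost open tag instead of whatever a single regex happens to match first.

```ruby
# Sketch only; all names here are hypothetical, not part of script/bulk_import.
BBNode = Struct.new(:tag, :arg, :children)

# Matches [tag], [tag=arg] and [/tag]. Assumes every bracketed word is a tag.
TOKEN = /\[(\/)?([a-z]+)(?:=([^\]]*))?\]/i

# Builds a tree of BBNode objects and plain strings from raw BBCode.
def parse_bbcode(raw)
  root = BBNode.new(nil, nil, [])
  stack = [root]
  pos = 0
  while (m = TOKEN.match(raw, pos))
    # Text between the previous token and this one belongs to the open node.
    stack.last.children << raw[pos...m.begin(0)] if m.begin(0) > pos
    tag = m[2].downcase
    if m[1].nil?
      # Opening tag: create a child node and descend into it.
      node = BBNode.new(tag, m[3], [])
      stack.last.children << node
      stack.push(node)
    elsif stack.size > 1 && stack.last.tag == tag
      # Closing tag that matches the innermost open tag: go back up.
      stack.pop
    else
      # Stray closing tag: keep it as literal text.
      stack.last.children << m[0]
    end
    pos = m.end(0)
  end
  stack.last.children << raw[pos..-1] if pos < raw.length
  root
end

# Renders the tree with all tags dropped; a real converter would map
# each tag to Markdown or HTML here instead.
def render_plain(node)
  node.children.map { |c| c.is_a?(String) ? c : render_plain(c) }.join
end

puts render_plain(parse_bbcode("[COLOR=red]foo [COLOR=blue]bar[/COLOR] baz[/COLOR]"))
```

Unclosed tags are simply dropped by this sketch, and bracketed text that is not BBCode would be swallowed, so a real version would need a whitelist of known tags.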


#46

New issue:

discourse@ip-10-0-1-178-app:/var/www/discourse$ IMPORT=1 RAILS_ENV=production ruby script/bulk_import/vbulletin.rb
Loading application...
Starting...
Preloading I18n...
Fixing highest post numbers...
Loading imported group ids...
Loading imported user ids...
Loading imported category ids...
Loading imported topic ids...
Loading imported post ids...
Loading groups indexes...
Loading users indexes...
Loading categories indexes...
Loading topics indexes...
Loading posts indexes...
Importing groups...
Importing users...
1417464 -    106/sec
/usr/local/lib/ruby/gems/2.4.0/gems/pg-0.20.0/lib/pg/connection.rb:168:in `get_last_result': ERROR:  invalid byte sequence for encoding "UTF8": 0xed 0xa0 0xbd (PG::CharacterNotInRepertoire)
CONTEXT:  COPY user_custom_fields, line 960626
        from /usr/local/lib/ruby/gems/2.4.0/gems/pg-0.20.0/lib/pg/connection.rb:168:in `copy_data'
        from /var/www/discourse/script/bulk_import/base.rb:511:in `create_custom_fields'
        from /var/www/discourse/script/bulk_import/base.rb:193:in `create_users'
        from script/bulk_import/vbulletin.rb:131:in `import_users'
        from script/bulk_import/vbulletin.rb:81:in `execute'
        from /var/www/discourse/script/bulk_import/base.rb:33:in `run'
        from script/bulk_import/vbulletin.rb:494:in `<main>'

I am tired of facing problems.
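
For what it's worth, the bytes 0xed 0xa0 0xbd in that error are the UTF-8-style encoding of a UTF-16 high surrogate (U+D83D), which strict UTF-8 forbids. It most likely comes from an emoji that was stored as a surrogate pair, but that is a guess. One possible workaround, sketched below, is to scrub values before they reach the COPY. The helper name `clean_utf8` is hypothetical and where to call it in the importer is an assumption; only `String#force_encoding`, `#valid_encoding?` and `#scrub` are standard Ruby.

```ruby
# Sketch only: clean_utf8 is a hypothetical helper, not part of base.rb.
# It drops byte sequences that are not valid UTF-8 (including encoded
# surrogates such as ED A0 BD), which PostgreSQL would otherwise reject.
def clean_utf8(value)
  # dup so that a frozen or shared source string is never modified.
  str = value.to_s.dup.force_encoding(Encoding::UTF_8)
  str.valid_encoding? ? str : str.scrub("")
end

broken = "abc\xED\xA0\xBDdef".b   # raw bytes, as they might come from MySQL
puts clean_utf8(broken)
```

Note that this throws the offending character away rather than recovering it; rebuilding the original emoji from the surrogate pair would need a proper CESU-8 decoding step.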


(Quang-Buu Le) #47

Importing users ran at more than 7,000/sec in my case, and the rate increased gradually during the process.
I am using a MacBook Pro; the SSD speed is a big advantage.

Your forum is big and may have run on buggy MySQL versions at some point; it may also have crashed or been rebooted several times. In addition, this bulk importer is still very much beta and needs improvement. So I think you should try modifying some of the code and running it locally too.

P.S.: I had to write some extra code (a custom BBCode parser, plus attachment and user avatar importing…) to make my import perfect.


(Felix Freiberger) #48

I’d like to report that this can be used to correct problems where the post counts of users are off (e.g. there are negative post counts). It’s pretty fast, too! :smiley:

(I could imagine a Sidekiq task that runs this once a week being a good idea, actually…)


Wrong post count in profile
(Felix Freiberger) #49

Looking through the code…

…it looks like this will nuke the read stats :frowning:

Am I overlooking something?