Importers for large forums

pfaffman · July 26, 2017, 2:19pm

Can you convert it to UTF-8?

mtawil · July 26, 2017, 3:35pm

Why? Can we just support utf8mb4?

mtawil · July 26, 2017, 11:14pm

I submitted a new pull request that fixed an encoding issue.

https://github.com/discourse/discourse/pull/5003

quangbuule · July 27, 2017, 3:42pm

Hi guys,

I found that the RegExp-based BBCode conversion does not handle nested tags correctly. For example:
"[COLOR=red]foo [COLOR=blue]bar[/COLOR] baz[/COLOR]"
will become foo [COLOR=blue]bar baz[/COLOR], because the outer [COLOR=red] is paired with the first [/COLOR] instead of its own.

Is it worth writing a small AST parser?
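The nesting problem above can be handled without a full AST by tracking open tags on a stack. Below is a minimal Ruby sketch, not the importer's actual converter: it only knows [B], [I] and [COLOR=...], and both the HTML it emits (<strong>, <em>, <span style="color:...">) and the helper name bbcode_to_html are assumptions chosen for illustration.

```ruby
# Sketch: stack-based BBCode converter that respects nesting.
# Handles only [B], [I] and [COLOR=...]; the emitted HTML is an assumption.
TAG = %r{\[(b|i)\]|\[(color)=(#?\w+)\]|\[/(b|i|color)\]}i
OPEN_HTML  = { "b" => "<strong>", "i" => "<em>" }.freeze
CLOSE_HTML = { "b" => "</strong>", "i" => "</em>", "color" => "</span>" }.freeze

def bbcode_to_html(text)
  out = +""
  stack = [] # names of currently open tags, innermost last
  pos = 0
  text.scan(TAG) do
    m = Regexp.last_match
    out << text[pos...m.begin(0)] # plain text before this tag
    pos = m.end(0)
    if m[1] # [B] or [I]
      name = m[1].downcase
      stack.push(name)
      out << OPEN_HTML[name]
    elsif m[2] # [COLOR=...]; \w+ keeps the value safe inside the attribute
      stack.push("color")
      out << %(<span style="color:#{m[3]}">)
    else # a closing tag
      name = m[4].downcase
      if stack.include?(name)
        # Pop (and close) inner tags until the matching opener is closed.
        loop do
          top = stack.pop
          out << CLOSE_HTML[top]
          break if top == name
        end
      else
        out << m[0] # stray closing tag: keep it as literal text
      end
    end
  end
  out << text[pos..]
  stack.reverse_each { |name| out << CLOSE_HTML[name] } # close unclosed tags
  out
end
```

With this, the example above becomes <span style="color:red">foo <span style="color:blue">bar</span> baz</span>. Mis-nested input such as [b]x[i]y[/b]z[/i] comes out as <strong>x<em>y</em></strong>z[/i]: the [/b] also closes the inner [I], and the stray [/i] stays as text. Real vBulletin posts use many more tags ([QUOTE], [URL], lists), so a proper parser may still be worth it; this only shows why a stack fixes the nesting case.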

mtawil · August 8, 2017, 11:45am

New issue:

discourse@ip-10-0-1-178-app:/var/www/discourse$ IMPORT=1 RAILS_ENV=production ruby script/bulk_import/vbulletin.rb
Loading application...
Starting...
Preloading I18n...
Fixing highest post numbers...
Loading imported group ids...
Loading imported user ids...
Loading imported category ids...
Loading imported topic ids...
Loading imported post ids...
Loading groups indexes...
Loading users indexes...
Loading categories indexes...
Loading topics indexes...
Loading posts indexes...
Importing groups...
Importing users...
1417464 -    106/sec
/usr/local/lib/ruby/gems/2.4.0/gems/pg-0.20.0/lib/pg/connection.rb:168:in `get_last_result': ERROR:  invalid byte sequence for encoding "UTF8": 0xed 0xa0 0xbd (PG::CharacterNotInRepertoire)
CONTEXT:  COPY user_custom_fields, line 960626
        from /usr/local/lib/ruby/gems/2.4.0/gems/pg-0.20.0/lib/pg/connection.rb:168:in `copy_data'
        from /var/www/discourse/script/bulk_import/base.rb:511:in `create_custom_fields'
        from /var/www/discourse/script/bulk_import/base.rb:193:in `create_users'
        from script/bulk_import/vbulletin.rb:131:in `import_users'
        from script/bulk_import/vbulletin.rb:81:in `execute'
        from /var/www/discourse/script/bulk_import/base.rb:33:in `run'
        from script/bulk_import/vbulletin.rb:494:in `<main>'

I am tired of facing problems.
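The bytes in the error, 0xed 0xa0 0xbd, are a 3-byte encoding of U+D83D, a UTF-16 high surrogate (the first half of many emoji). Surrogates are not valid in UTF-8, so PostgreSQL rejects them. A common cause, assumed here rather than confirmed for this forum, is source text that stored a supplementary character as two separately encoded surrogates (CESU-8 style). The sketch below (fix_utf8 is a hypothetical helper, not part of script/bulk_import) rejoins such pairs into proper 4-byte UTF-8 and scrubs whatever is still invalid, so it could be run over text values before they are handed to the COPY.

```ruby
# Sketch: repair CESU-8 style surrogate pairs (e.g. "\xED\xA0\xBD\xED\xB8\x80",
# i.e. U+D83D U+DE00) into real 4-byte UTF-8, then scrub anything still invalid.
HIGH = 0xA0..0xAF # second byte of an encoded high surrogate (U+D800..U+DBFF)
LOW  = 0xB0..0xBF # second byte of an encoded low surrogate (U+DC00..U+DFFF)
CONT = 0x80..0xBF # any UTF-8 continuation byte

# Decodes a 3-byte 0xED sequence starting at index i into its code point, or nil.
def surrogate_at(bytes, i, second_range)
  return nil if i + 2 >= bytes.length
  b0, b1, b2 = bytes[i], bytes[i + 1], bytes[i + 2]
  return nil unless b0 == 0xED && second_range.cover?(b1) && CONT.cover?(b2)
  0xD000 | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
end

def fix_utf8(str)
  bytes = str.b.bytes
  out = []
  i = 0
  while i < bytes.length
    hi = surrogate_at(bytes, i, HIGH)
    lo = hi && surrogate_at(bytes, i + 3, LOW)
    if lo
      # Combine the pair into one supplementary code point and re-encode it.
      cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
      out.concat([cp].pack("U").bytes)
      i += 6
    else
      out << bytes[i]
      i += 1
    end
  end
  # Lone surrogates and any other invalid bytes become U+FFFD.
  out.pack("C*").force_encoding(Encoding::UTF_8).scrub("\uFFFD")
end
```

For example, fix_utf8("a\xED\xA0\xBD\xED\xB8\x80b") returns "a😀b", while a lone "\xED\xA0\xBD" is replaced with U+FFFD instead of aborting the import. Scrubbing loses that character, so converting the source data properly (as suggested earlier in the thread) is still the cleaner option where possible.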

quangbuule · August 11, 2017, 4:52pm

Importing users ran at more than 7,000/sec in my case, and the rate increased gradually during the process.
I am using a MacBook Pro; the SSD speed is a big advantage.

Your forum is big and may have run on buggy MySQL versions at some point; it may also have crashed, been rebooted … several times. In addition, this bulk importer is still very much a beta and needs improvements. So I think you should try modifying some of the code and running it locally too.

P.S. I had to write some more code (a custom BBCode parser, importing attachments and user avatars…) to make my import perfect.

fefrei · October 12, 2017, 11:23am

I’d like to report that this can be used to correct problems where the post counts of users are off (e.g. there are negative post counts). It’s pretty fast, too!

(I could imagine a Sidekiq task that runs this once a week being a good idea, actually…)

fefrei · October 24, 2017, 1:34pm

Looking through the code…

https://github.com/discourse/discourse/blob/4623b46b0ba7f18bf83fdc25c25cab26327444b6/lib/tasks/import.rake#L262

               AND NOT COALESCE(p.hidden, 't')
               AND p.post_type = 1
               AND t.deleted_at IS NULL
               AND COALESCE(t.visible, 't')
               AND t.archetype <> 'private_message'
               AND p.user_id > 0
          GROUP BY p.user_id
          )
          UPDATE user_stats
             SET post_count = X.posts
               , posts_read_count = X.posts
               , time_read = X.posts * 5
               , topic_count = X.topics
               , topics_entered = X.topics
               , first_post_created_at = X.min_created_at
               , days_visited = X.days
               , topic_reply_count = X.topic_replies
            FROM X
           WHERE user_stats.user_id = X.user_id
             AND (post_count <> X.posts
               OR posts_read_count <> X.posts

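…it looks like this will nuke the read stats: posts_read_count, time_read and topics_entered are overwritten with values computed from the user's own posts and topics.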

Am I overlooking something?
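To make the concern concrete, here is a Ruby model of the SET clause quoted above applied to one user_stats row, next to a hypothetical variant that still recomputes the authored counts but never lowers the read stats (in SQL this would roughly mean wrapping those columns in GREATEST(...)). Field names come from the query; the methods rake_update and keep_read_stats are illustrative only, and since the CTE defining X is cut off in the quote, treating X.days as safe to compare against days_visited is an assumption.

```ruby
# Models the quoted SET clause for a single user_stats row.
# `x` stands for that user's row from the X CTE (posts, topics, days, ...).
def rake_update(stats, x)
  stats.merge(
    post_count: x[:posts],
    posts_read_count: x[:posts], # posts *written*, not posts read
    time_read: x[:posts] * 5,
    topic_count: x[:topics],
    topics_entered: x[:topics],
    first_post_created_at: x[:min_created_at],
    days_visited: x[:days],
    topic_reply_count: x[:topic_replies]
  )
end

# Hypothetical alternative: recompute authored counts, but keep read/visit
# stats whenever the existing value is already higher.
def keep_read_stats(stats, x)
  rake_update(stats, x).merge(
    posts_read_count: [stats[:posts_read_count], x[:posts]].max,
    time_read: [stats[:time_read], x[:posts] * 5].max,
    topics_entered: [stats[:topics_entered], x[:topics]].max,
    days_visited: [stats[:days_visited], x[:days]].max
  )
end
```

For a long-time reader with 10,000 posts read but only 50 written, rake_update drops posts_read_count to 50, while keep_read_stats leaves it at 10,000 and still corrects post_count.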

RoldanLT · May 5, 2018, 8:58am

Waiting for XF1 support.
I've got a forum with 9M posts.

zogstrip · May 5, 2018, 9:01am

Want it sooner rather than later?

You can sponsor the development of the XenForo 1 bulk importer by subscribing to one year of hosting (Business plan or above).

Related topics

Topic	Category (tags)	Replies	Views	Activity
Migrate a Vanilla forum to Discourse	Sysadmins (how-to)	44	15856	January 30, 2023
Migrate a phpBB3 forum to Discourse	Migrating to Discourse (how-to)	458	95706	March 13, 2025
Migrate a XenForo forum to Discourse	Sysadmins (how-to)	96	19828	February 25, 2025
Migrating vBulletin 5 database - Import script errors	Migration (vbulletin5)	46	2206	March 8, 2023
[Paid] Need a Vanilla 2 Import tool	Marketplace	67	10807	May 2, 2015