Vanilla to Discourse Large Data Import (decreasing speed)

We have a 26 GB data dump from Vanilla to import into Discourse:
1.3 million users
3 million topics
21 million posts

Our problem is that the import starts at around 500k records/min, but after a few minutes it drops to about 2k/min.

1 Like

You’ll need lots of RAM. You might look at the bulk importers, but I don’t believe that there is one for Vanilla.

2 Likes

Hi Jay. We are using a c5.4xlarge instance on AWS; at first the import runs at 500k/min, then it slows down after a few minutes.

The import script is restartable, but this is unfortunately just normal with the import scripts.
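
For context, the reason restarting works is that the import scripts remember the source id of every record they have already created and skip those on the next run. A minimal sketch of that pattern in Ruby, assuming the usual import_id custom-field bookkeeping; source_posts and create_post_from are hypothetical placeholders, not the actual ImportScripts API:

    require "set"

    # Source ids of posts that were created by a previous run.
    already_imported = PostCustomField
      .where(name: "import_id")
      .pluck(:value)
      .to_set

    source_posts.each do |row|
      # Skip anything a previous run already brought over.
      next if already_imported.include?(row[:source_id].to_s)

      post = create_post_from(row)                       # hypothetical helper
      post.custom_fields["import_id"] = row[:source_id]  # remember the mapping
      post.save_custom_fields
    end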

2 Likes

Yup, when I restart it, it just skips the data that was already imported, but the import speed still decreases over time :frowning:

1 Like

Thanks for confirming this. :frowning: A total of 31 million records will take a month or so if the rate keeps decreasing. Any suggestions on how to improve this, or is it just the way it is?
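
For a rough sense of scale, a quick back-of-envelope calculation in plain Ruby, using the 31 million figure from above:

    total_records = 31_000_000

    # Days needed at a few sustained import rates (records per minute).
    [500_000, 50_000, 2_000].each do |per_minute|
      days = total_records / per_minute.to_f / 60 / 24
      puts format("%8d records/min -> %5.1f days", per_minute, days)
    end

At a sustained 2k/min this comes out to roughly 11 days; if the rate keeps dropping below that, a month is plausible.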

You need a CPU with fast single-core speed, which is quite hard to find in the cloud.

Or give the bulk import script a try: Importers for large forums

There is one for Vanilla: https://github.com/discourse/discourse/blob/master/script/bulk_import/vanilla.rb

3 Likes

We use a c5.4xlarge from AWS (16 vCPU, 32 GiB memory).
Is this enough, or should we upgrade?

Sure, will try that bulk import script. Thanks!

You will need a CPU from the top of PassMark CPU Benchmarks - Single Thread Performance if you want to run the regular import script as fast as possible. I have no idea what you get on AWS or any other cloud provider with vCPUs. :man_shrugging:
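
If you want a very crude way to compare instances, a single-threaded Ruby loop timed with the standard Benchmark module gives a relative indicator (it is not comparable to PassMark scores, just useful for comparing two machines against each other):

    require "benchmark"

    # Time a tight single-threaded loop; smaller is better.
    elapsed = Benchmark.realtime do
      x = 0
      20_000_000.times { |i| x += i * i }
    end
    puts format("single-thread loop: %.2f s", elapsed)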

3 Likes

You want to use the bulk importer.

2 Likes

Whenever I try the bulk import, it stops there, since the traceback stops at category IDs.
I tried changing the -1 to 0:

@last_imported_category_id = imported_category_ids.max || -1
to
@last_imported_category_id = imported_category_ids.max || 0

I even tried deleting the category with id -1 and then tried again. No luck.
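
One way to see what the importer thinks is already imported: in the Discourse rails console, list the categories that carry an import_id custom field. This assumes the bulk importer tracks imported categories the same way the standard import scripts do (via category_custom_fields), which is worth verifying against vanilla.rb:

    # Run in `rails c` on the Discourse instance.
    imported = CategoryCustomField
      .where(name: "import_id")
      .pluck(:category_id, :value)

    puts "categories with an import_id: #{imported.size}"
    puts "highest imported source id:   #{imported.map { |_, v| v.to_i }.max.inspect}"

    # Seed categories created by Discourse itself have no import_id,
    # so they should not affect imported_category_ids.max.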

If you can hire extra help, contact @pfaffman at https://www.literatecomputing.com/.

3 Likes