phpBB 3 Importer (old)

Kinda lengthy, but here we go:

125836 / 185464 ( 67.8%)Exception while creating post 136113. Skipping.
Script Timed Out
/var/www/discourse/lib/pretty_text.rb:152:in `block in markdown'
/var/www/discourse/lib/pretty_text.rb:299:in `block in protect'
/var/www/discourse/lib/pretty_text.rb:297:in `synchronize'
/var/www/discourse/lib/pretty_text.rb:297:in `protect'
/var/www/discourse/lib/pretty_text.rb:133:in `markdown'
/var/www/discourse/lib/pretty_text.rb:172:in `cook'
/var/www/discourse/app/models/post_analyzer.rb:12:in `cook'
/var/www/discourse/app/models/post.rb:157:in `cook'
/var/www/discourse/lib/post_creator.rb:116:in `before_create_tasks'
/var/www/discourse/app/models/post.rb:369:in `block in <class:Post>'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:438:in `instance_exec'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:438:in `block in make_lambda'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:160:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:160:in `block in halting'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:86:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:86:in `run_callbacks'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/callbacks.rb:306:in `_create_record'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/timestamp.rb:57:in `_create_record'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/persistence.rb:482:in `create_or_update'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/callbacks.rb:302:in `block in create_or_update'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:113:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:113:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:166:in `block in halting'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:166:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:166:in `block in halting'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:166:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:166:in `block in halting'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:166:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:166:in `block in halting'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:166:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:166:in `block in halting'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:166:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:166:in `block in halting'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:215:in `block in halting_and_conditional'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:86:in `call'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activesupport-4.1.5/lib/active_support/callbacks.rb:86:in `run_callbacks'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/callbacks.rb:302:in `create_or_update'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/persistence.rb:103:in `save'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/validations.rb:51:in `save'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/attribute_methods/dirty.rb:21:in `save'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/transactions.rb:268:in `block (2 levels) in save'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/transactions.rb:329:in `block in with_transaction_returning_status'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:199:in `transaction'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/transactions.rb:208:in `transaction'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/transactions.rb:326:in `with_transaction_returning_status'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/transactions.rb:268:in `block in save'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/transactions.rb:283:in `rollback_active_record_state!'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/transactions.rb:267:in `save'
/var/www/discourse/lib/post_creator.rb:244:in `save_post'
/var/www/discourse/lib/post_creator.rb:74:in `block in create'
/var/www/discourse/lib/distributed_mutex.rb:21:in `synchronize'
/var/www/discourse/lib/distributed_mutex.rb:5:in `synchronize'
/var/www/discourse/lib/post_creator.rb:136:in `block in transaction'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `block in transaction'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:209:in `within_new_transaction'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `transaction'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.5/lib/active_record/transactions.rb:208:in `transaction'
/var/www/discourse/lib/post_creator.rb:130:in `transaction'
/var/www/discourse/lib/post_creator.rb:69:in `create'
/var/www/discourse/script/import_scripts/base.rb:436:in `create_post'
/var/www/discourse/script/import_scripts/base.rb:390:in `block in create_posts'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/rack-mini-profiler-0.9.1/lib/patches/sql_patches.rb:30:in `each'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/rack-mini-profiler-0.9.1/lib/patches/sql_patches.rb:30:in `each'
/var/www/discourse/script/import_scripts/base.rb:377:in `create_posts'
script/import_scripts/phpbb3.rb:107:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:556:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:555:in `loop'
/var/www/discourse/script/import_scripts/base.rb:555:in `batches'
script/import_scripts/phpbb3.rb:87:in `import_posts'
script/import_scripts/phpbb3.rb:27:in `execute'
/var/www/discourse/script/import_scripts/base.rb:71:in `perform'
script/import_scripts/phpbb3.rb:295:in `<main>'

Edit: Holy moly - this script is one memory hog - I wonder why the first test run went so fine. Here’s my top output:

top - 23:44:02 up 1 day,  1:08,  2 users,  load average: 1,36, 1,52, 1,71
Tasks: 163 total,   2 running, 161 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,3 us,  5,3 sy,  0,0 ni, 69,8 id, 24,1 wa,  0,0 hi,  0,6 si,  0,0 st
KiB Mem:   4054212 total,  3892700 used,   161512 free,      228 buffers
KiB Swap: 22014968 total,  5873392 used, 16141576 free,   212740 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
15546 joerg     20   0 6192m 2,7g 2228 R  12,3 69,3 436:42.32 ruby
40356 statd     20   0  390m 191m 185m S   0,0  4,8   0:03.71 postmaster
 9308 joerg     20   0 1579m 139m 2116 S   0,3  3,5  53:32.72 ruby
 9630 joerg     20   0  480m 109m 2120 S   1,7  2,8   5:58.89 ruby
35719 joerg     20   0  474m 107m 2268 S   0,0  2,7   0:53.42 ruby
 9380 joerg     20   0  517m  96m 2216 S   0,0  2,4   6:51.52 ruby
 9279 joerg     20   0  462m  55m 1284 S   0,0  1,4   4:06.53 ruby

We’re looking at a 4GB VM - with 20G of swap active - and kswapd is one of the regular visitors in the top output. CPU times: 437:15.71 ruby and 87:27.11 kswapd0. Is there any way we can reduce the memory impact here?

You may have found another bug (endless loop or infinite recursion) in the markdown parser. Can you pull the text of post 136133 from the phpBB database?

Sure - and sorry 'bout my excessive bug hunting :open_mouth: Here’s post 136113 (not 136133, but anyway). We had a nested quote here - and smilies inside the outer quote.

 [quote=&quot;inferno&quot;:a1prf4jp][quote=&quot;OFNeo&quot;:a1prf4jp]Ich bekam heute
- Qnap TS-201 (ein super NAS)
- 2 x 640 GB WD-Platten.

Alles lĂ€uft im Raid-1 fĂŒr meinen Mac <!-- s:) --><img src="{SMILIES_PATH}/smile.gif" alt=":)" title="Smile" /><!-- s:) -->[/quote:a1prf4jp]

könntest Du da mal irgendwann wenn Du zeit hast Deine Erfahrungen damit posten? Mich wĂŒrde die Performance und das Handling interessieren.  <!-- s:) --><img src="{SMILIES_PATH}/smile.gif" alt=":)" title="Smile" /><!-- s:) -->[/quote:a1prf4jp]

Erfahrung <!-- s:) --><img src="{SMILIES_PATH}/smile.gif" alt=":)" title="Smile" /><!-- s:) --> Das ist mein zweites TS-201. Das Erste lÀuft mit zwei 750er Platten von WD. Sehr zufrieden. Firmware wurde verÀndert, so das auch MySql lÀuft. Also spart man sich das TS-209. Handling ist selbsterklÀrend, alles wird per Webinterface geÀndert.
Performance, bin als normal-User damit zufrieden. Lediglich die Wiedergabe der bereitgestellten Lieder ĂŒber iTunes ist sehr langsam (bei sehr großer Mp3-Sammlung).

Edit: Here’s another post that just threw up. Err, an exception:

&gt; der aktuelle mini hat 2x sata (HD + optisches LW)

Cool <!-- s:-) --><img src="{SMILIES_PATH}/smile.gif" alt=":-)" title="Smile" /><!-- s:-) --> der Laufwerksanschluß ist aber bestimmt etwas kleiner wie sata Standard ?
Wie bei Notebook Laufwerken halt ĂŒblich...

Mit diesem Kabel kann ich schonmal vom internen Festplattenaschluß direkt an einen 
sata Wechselrahmen.

<!-- m --><a class="postlink" href="http://tbn2.google.com/images?q=tbn:X-XOzRI58RBzAM:http://lib.store.yahoo.net/lib/cooldrives/micro-sata-male-female-x1.jpg">http://tbn2.google.com/images?q=tbn:X-X ... ale-x1.jpg</a><!-- m -->

Das ganze könnte man auch als Mac mini Umbau zum Xserve fĂŒr arme umtaufen <!-- s:-) --><img src="{SMILIES_PATH}/smile.gif" alt=":-)" title="Smile" /><!-- s:-) -->

Munter bleiben, Rossi

So probably the nested quotes aren’t the culprit - besides the fact that both contain HTML entities, I don’t see anything really special there that they both share. :neutral_face:

Hmmm, I can’t reproduce the exception with either post. Both the phpBB-specific pre-parser and the Discourse markdown parser run through just fine.

Right now, I can only assume that due to the script’s excessive memory consumption, your server is so overloaded with swapping memory that the JavaScript portion of the markdown parser actually times out.

Just wondering: how many users, topics and posts are you importing?
I haven’t looked at the memory consumption yet, but this seems quite high.

I’m trying to import 7,100 users and 185,000 posts in 17,000 topics. I will restart the process later and see how it goes today, and whether any of the above exceptions happen again with the same id :wink:

Hm, on the road again. Now I’ve hit my first exception:

17711 / 185464 (  9.5%)Exception while creating post 21182. Skipping.
Cannot call method 'slice' of undefined
at convert_tree_to_html (/var/www/discourse/vendor/assets/javascripts/better_markdown.js:495:23)
at convert_tree_to_html (/var/www/discourse/vendor/assets/javascripts/better_markdown.js:621:21)
at convert_tree_to_html (/var/www/discourse/vendor/assets/javascripts/better_markdown.js:621:21)
at toHTMLTree (/var/www/discourse/vendor/assets/javascripts/better_markdown.js:419:16)
at Discourse.Dialect.cook (/var/www/discourse/app/assets/javascripts/discourse/dialects/dialect.js:194:23)
at makeHtml (/var/www/discourse/app/assets/javascripts/discourse/lib/markdown.js:230:34)
at <eval>:1:44
/var/www/discourse/lib/pretty_text.rb:152:in `block in markdown'
/var/www/discourse/lib/pretty_text.rb:299:in `block in protect'
/var/www/discourse/lib/pretty_text.rb:297:in `synchronize'
/var/www/discourse/lib/pretty_text.rb:297:in `protect'
/var/www/discourse/lib/pretty_text.rb:133:in `markdown'
/var/www/discourse/lib/pretty_text.rb:172:in `cook'
/var/www/discourse/app/models/post_analyzer.rb:12:in `cook'
/var/www/discourse/app/models/post.rb:157:in `cook'
/var/www/discourse/lib/post_creator.rb:116:in `before_create_tasks'
/var/www/discourse/app/models/post.rb:372:in `block in <class:Post>'


that post is basically:

Text
[url]hier isses[/url]
Text
<!-- s:oops: --><img src="{SMILIES_PATH}/blushing.gif" alt=":oops:" title="Oops" /><!-- s:oops: --> 
Text

Text
[img:af987]http&#58;//img19&#46;imageshack&#46;us/img19/3920/bild23ni&#46;png[/img:af987]

some more text


and
 <!-- s:!: --><img src="{SMILIES_PATH}/excemation2.gif" alt=":!:" title="Exclamation" /><!-- s:!: -->


The empty url BBCode is stupid, sure, but could that cause an exception? If so, one might be tempted to just ignore the error; the “Skipping” behavior is the only part that could be improved in that case. As a side note, the first exception happened at the moment the machine started to use swap memory. Not excessively, but anyway :frowning:

Yeah, there’s - unsurprisingly - yet another bug in the markdown parser.

I’m beginning to wonder if putting more energy into fixing this thing is even worth it. @sam, any idea when we’re getting that new CommonMark parser?

Anyway, I’ll have a look at this.

Thanks, Jens! Here’s an example that’s a bit more legit (but we should not rely on there being something sensible between the url tags):

[quote=&quot;gaba&quot;:2143e]
text
: [url]forum.example.com[/url][/quote:2143e]
more text... <!-- s;) --><img src="{SMILIES_PATH}/wink.gif" alt=";)" title="" /><!-- s;) -->

To my second problem, performance and memory consumption: the import script is eating memory like a black hole eats matter. The import is 50% done, and swap has already filled to 1,375 MB, from zero. The import speed has deteriorated badly, and I guess that I will see timeout exceptions about 10-20% further into the rest of my data. Currently, the only chance I see to get the whole import through is to stop halfway through the process, edit the import script and resume from a certain post id. Are there any other options that I’m missing? Any ideas where the memory leak could be: in phpbb3.rb or in base.rb? :no_mouth:

We’re seeing the same things in our importer (actually, that’s why we haven’t been able to release it yet). The memory leak seems to be both in the import script and in Sidekiq.

What we are doing to work around this is making it restartable, so it continues wherever it left off.

Whenever it runs out of memory we clear Redis and restart Sidekiq and then the import process. The disadvantage is that some things don’t get downloaded.

Last week we did a 400,000+ post import, and it ran out of memory 27 times before it finally finished.

All right - do you have any pointers on where to look here? I basically would’ve commented out the parts already done (like the user import) and modified the query that pulls the original contents to start at a given id. Do you have any ideas more sophisticated than this?

Holy cow! Looks like I’m onto something here :frowning: Thanks for the info! Right now, I’m starting one final attempt using a different Ruby version, hoping that the GC is improved - but I don’t have my hopes up very high.


What we do is store a discourse_id in the original database for every post, topic and user. We create a new column in all those tables (in the old forum database) and set it to zero. If importing a certain post, topic or user succeeds, we store the Discourse id (this is also handy for feeding into an SEO redirection script which we put in place of the old forum). If it fails, we store -1. All our queries have a filter WHERE discourse_id = 0.

This makes the script continue where it left off.
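
For anyone who wants to try the same bookkeeping, here is a minimal sketch of the idea in Ruby. It is not the code from either importer: the rows are a plain in-memory array standing in for the old forum's tables, and the names import_pending, PENDING and FAILED are made up for illustration. The block passed in stands for whatever actually creates the post and returns its new id.

```ruby
# Hypothetical stand-in for an old-forum table that has been given an extra
# discourse_id column: 0 = not yet tried, -1 = failed, anything else = imported.
PENDING = 0
FAILED  = -1

def import_pending(rows)
  # Same effect as filtering with "WHERE discourse_id = 0":
  # rows that were already imported or already failed are never retried.
  rows.select { |row| row[:discourse_id] == PENDING }.each do |row|
    begin
      # The block stands in for the real post creation and returns the new id.
      row[:discourse_id] = yield(row)
    rescue StandardError
      row[:discourse_id] = FAILED
    end
  end
end

rows = [
  { post_id: 1, text: "fine",      discourse_id: 0 },
  { post_id: 2, text: "boom",      discourse_id: 0 },
  { post_id: 3, text: "also fine", discourse_id: 0 },
]

next_id = 100
import_pending(rows) do |row|
  raise "cook failed" if row[:text] == "boom"
  next_id += 1
end

p rows.map { |row| row[:discourse_id] }
```

Running import_pending a second time on the same rows does nothing, which is what makes a restart after a crash or Ctrl+C harmless. With a real database the select would become the WHERE filter and the assignment an UPDATE on the old forum's table.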

The questions we should ask are:

  • What is sidekiq doing during the import?
  • Can we disable some (or even all) of those tasks or at least optimize the way they are executed?
  • Can we find and solve the memory leaks?
  • Where, if any, are the memory leaks in the import script?

The standard import scripts are already pretty restartable: if one comes across an issue, you can pretty much run it again and it will pick up where it left off.

My largest final import produced 424.7K posts across 39.6K topics and 10.2K users.

I created the DigitalOcean instance at 2GB with 2GB of swap and then bumped it to a 16GB instance for the import, which required only 3 restarts.
I monitored the progress (in a spreadsheet), and when the posts imported per second decreased and started to tail off, I just hit Ctrl+C on the import and started it again.
The import process slows down because of the memory hole.
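
The spreadsheet check can be roughly automated. The following is only a sketch under assumptions: ThroughputWatch is a made-up helper, and the 0.5 threshold (restart when the rate falls below half of the best rate seen so far) is an arbitrary starting point, not a number from this thread.

```ruby
# Hypothetical helper: remembers the best posts-per-second rate seen so far
# and reports when the current rate has dropped below a fraction of it.
class ThroughputWatch
  def initialize(threshold: 0.5)
    @threshold = threshold # assumed cut-off, tune by hand
    @best = 0.0
  end

  # posts: posts imported during the interval; seconds: length of the interval.
  # Returns true when stopping and restarting the import looks worthwhile.
  def restart_suggested?(posts, seconds)
    rate = posts.to_f / seconds
    @best = rate if rate > @best
    rate < @best * @threshold
  end
end

watch = ThroughputWatch.new(threshold: 0.5)
[[600, 60], [580, 60], [250, 60]].each do |posts, seconds|
  puts "restart suggested" if watch.restart_suggested?(posts, seconds)
end
```

With the sample numbers, only the third interval (250 posts in 60 seconds, against a best of 600) falls below the cut-off. The samples themselves would have to come from the importer's own progress output.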

All the while I had at least 25 sidekiqs running, basically enough to keep the CPU at ~80%, clearing the backlog of stuff at the same time as the import.

FYI: this was a phpBB 2 importer I wrote based on the phpBB 3 one by the awesome @neil.

Yeah the existing scripts are restartable. I would love to solve the memory growth issue, but don’t have time right now. :blush:

The phpbb3 script I made solved the problems I found in the site I was using, but I’m sure there are other phpbb things that I didn’t encounter.

Thanks for all the pointers - I can live with some restarts of course.

Do you have any idea yet where this problem comes from? Somehow, I tend to be suspicious of all the nice Ruby blocks, yields and implicits used, but unfortunately, I’m no Ruby expert.


The tags will look like this:

[attachment=0]IMG_2878.JPG[/attachment]
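
As a starting point, a tag of that shape can be picked out of the raw post text with a regular expression. This is only a sketch: find_attachment_tags and ATTACHMENT_TAG are made-up names, not part of the import scripts, and the optional ":uid" suffix is an assumption based on how the quote and img tags earlier in this thread look.

```ruby
# Hypothetical pattern for tags like [attachment=0]IMG_2878.JPG[/attachment].
# The optional ":uid" part is an assumption, modelled on tags such as
# [img:af987]...[/img:af987] seen in other posts.
ATTACHMENT_TAG = /\[attachment=(\d+)(?::\w+)?\](.*?)\[\/attachment(?::\w+)?\]/i

# Returns one hash per tag with the attachment index and the file name.
def find_attachment_tags(raw)
  raw.scan(ATTACHMENT_TAG).map do |index, filename|
    { index: index.to_i, filename: filename }
  end
end

p find_attachment_tags("Look: [attachment=0]IMG_2878.JPG[/attachment]")
```

Mapping the index and file name to the actual file on disk, and turning that into a Discourse upload, is the part that still has to be written.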

Anyone looking at the embedded uploads issue?

Sorry to be a pain. If there is any way I can help, let me know. I was daft enough to announce that I was: “cutting our site over to discourse and all will be wonderful!” (two months ago).

I’ve looked at that code and am unsure where to start. I’m having trouble even seeing where normal uploads are pulled across during the import. There’s a create_upload method but it does not seem to get called.

Well, you can’t find it, because it’s not (yet) implemented for phpBB3.