phpBB 3 Importer (old)

Could be something wrong with the import script, or the phpbb tables don’t have a post with id 503.

The forum was initially on a different forum software. So it may be the case that the phpBB has missing posts (somehow).

Post 503 does not show up in the phpBB forum.

This works:
http://www.pedaltrout.com/forum/viewtopic.php?p=67352

This does not:
http://www.pedaltrout.com/forum/viewtopic.php?p=503

@neil When using the bbcode-to-md gem, content wrapped in a link appears to be broken.

Moved to own topic: phpBB import bug bbcode-to-md gem - content wrapped link appears broken

Has somebody verified that this script should work in the current Docker version? I’m having great difficulties in getting this to work. I have now done 3 different clean installs with 3 different attack vectors, including all in this thread.

I have installed Discourse exactly as the Docker guide says. For starters the /var/www/discourse folder doesn’t exist. Is the first post of this thread an up-to-date guide? I have read through the whole thread and no reply indicates the problems I’m having.

At last (after fumbling around, installing applications all over the place) I was able to run the phpbb3.rb script. I also had to locate the /var/www/discourse folder inside the Docker folder (see below). Running the script just produced a huge stack trace:

Error connecting to Redis on localhost:6379 (ECONNREFUSED) subscribe failed, reconnecting in 1 second. Call stack [“/var/lib/docker/aufs/mnt/40ce60678aa0d88d128d0d8849a790a52a44236cdca114453ecdbd5fe5ab8e16/var/www/discourse/vendor/bundle/ruby/1.9.1/gems/redis-3.0.7/lib/redis/client.rb:290:in `rescue in establish_connection’”

I don’t know where Docker runs Redis, and I think I might have missed something about the discourse user, as there is no such user and the files are in a rather strange place (meaning they are not where the readmes say they should be).

You need to run this inside the Docker container. Run ./launcher ssh app to log into the container. Then you can run the script.

Thank you a lot, Gerhard! And of course everyone involved in making this script!

The script is currently creating the topics and posts and everything seems fine so far. It was just that I was new to Docker, so I wasn’t familiar with the idea of working inside a container.

My contribution is this:

  1. Be sure to run the script inside the container with cd /var/discourse/ && ./launcher ssh app
  2. Check your table prefix in the converter script: replace phpbb_ with your own phpBB3 table prefix. Be cautious about renaming, as there are things that don’t co-operate with this simple solution.
  3. If you want to connect to a remote MySQL server, ensure that it’s accessible externally (you can check with nmap <ip-address>; see the guide to opening it up)
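If nmap is not available inside the container, the reachability check from step 3 can be approximated in Ruby. This is only a sketch: the method name `mysql_port_open?` is made up for this example, and it assumes MySQL listens on its default port 3306.

```ruby
require 'socket'

# Hypothetical helper (not part of the import script): returns true if a TCP
# connection to host:port can be opened within the timeout, false otherwise.
# 3306 is MySQL's default port; adjust it if your server uses another one.
def mysql_port_open?(host, port = 3306, timeout: 3)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  # Connection refused, timed out, unreachable, or the host did not resolve.
  false
end

# Usage (replace the placeholder with your database server's address):
#   puts mysql_port_open?('<ip-address>')
```

An open port only shows that the server is reachable; the MySQL user still needs permission to log in from the container's address.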

And I also made the Docker-compatible script into a oneliner for an easy fire-and-forget:

apt-get update && apt-get install libmysqlclient-dev -y && git clone https://github.com/nlalonde/ruby-bbcode-to-md.git /tmp/ruby-bbcode-to-md && cd /tmp/ruby-bbcode-to-md && gem build ruby-bbcode-to-md.gemspec && gem install ruby-bbcode-to-md-0.0.13.gem && su - discourse -c "echo \"gem 'mysql2'\" >> /var/www/discourse/Gemfile && echo \"gem 'ruby-bbcode-to-md', path: '/tmp/ruby-bbcode-to-md'\" >> /var/www/discourse/Gemfile" && cd /var/www/discourse && bundle install --no-deployment --path vendor/bundle && su - discourse -c "cd /var/www/discourse && RAILS_ENV=production bundle exec ruby script/import_scripts/phpbb3.rb bbcode-to-md"

P.S. Perhaps the whole migration “framework” could use a Readme covering the basic flow as well as these specific migration scripts. Also the punbb3 script should have a lower batch size by default (10 or 100 I think).

Also, @sampsakuronen, make sure your phpBB posts do not use embedded uploads. Those are not supported yet.

You may have a number of posts with just an embed tag and no image.

@neil, any success with the phpbb uploads handling? If there is any way I can help, let me know.

Just started my first experiments with my ~200k-post phpBB3 installation. Before anyone else tries something as stupid as I did, here’s a piece of advice: if you try to convert something tiny (around 10k posts total), you’re fine. If you want to go for something bigger (like I did), you might want to install a MySQL server in your Docker container, transfer the database there, and work on it on localhost through a local socket file instead of connecting to port 3306 somewhere. The performance penalty of the remote connection is on the order of 100x (estimated, not measured).
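For illustration, this is roughly how the two connection styles differ when using the mysql2 gem. It is a sketch under assumptions: the helper name `mysql_options`, the socket path, and the credentials are placeholders, and the import script may build its connection differently.

```ruby
# Hypothetical helper: builds the option hash for Mysql2::Client.new.
# When :socket is given, the client talks to the server over a Unix domain
# socket; otherwise it connects over TCP to host:3306.
def mysql_options(database:, username:, password:, socket: nil, host: 'localhost')
  opts = { database: database, username: username, password: password }
  socket ? opts.merge(socket: socket) : opts.merge(host: host, port: 3306)
end

# Usage (requires the mysql2 gem; all values below are placeholders):
#   require 'mysql2'
#   client = Mysql2::Client.new(
#     mysql_options(database: 'phpbb', username: 'root', password: '',
#                   socket: '/var/run/mysqld/mysqld.sock')
#   )
```

The socket path depends on how MySQL was installed in the container, so check the `socket` setting in its configuration first.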

So while my first feeble attempts took a whole night, things look pretty slick now. I ran into one exception, however: one of my posts has a nested smiley in a quote - which threw an exception, and after that, import speed went to a crawling 10/sec. Here’s the approximate posting - I will try to dissect this further:

[size=85:291ba][quote=&quot;user&quot;:291ba]Title <!-- s:D --><img src="{SMILIES_PATH}/laugh.gif" alt=":D" title="Very Happy" /><!-- s:D -->[/quote:291ba][/size:291ba][size=85:291ba]Jaaaaaaaaaaaaaa...more text...[/size:291ba]

Even more text 

Here’s the exception:

Exception while creating post 75879. Skipping.
Object #<Object> has no method 'slice'
at convert_tree_to_html (/var/www/discourse/vendor/assets/javascripts/better_markdown.js:495:23)
at convert_tree_to_html (/var/www/discourse/vendor/assets/javascripts/better_markdown.js:621:21)
at convert_tree_to_html (/var/www/discourse/vendor/assets/javascripts/better_markdown.js:621:21)
at convert_tree_to_html (/var/www/discourse/vendor/assets/javascripts/better_markdown.js:621:21)
at toHTMLTree (/var/www/discourse/vendor/assets/javascripts/better_markdown.js:419:16)
at Discourse.Dialect.cook (/var/www/discourse/app/assets/javascripts/discourse/dialects/dialect.js:181:23)
at makeHtml (/var/www/discourse/app/assets/javascripts/discourse/lib/markdown.js:230:34)
at <eval>:1:44
/var/www/discourse/lib/pretty_text.rb:152:in `block in markdown'
/var/www/discourse/lib/pretty_text.rb:299:in `block in protect'
/var/www/discourse/lib/pretty_text.rb:297:in `synchronize'
/var/www/discourse/lib/pretty_text.rb:297:in `protect'
/var/www/discourse/lib/pretty_text.rb:133:in `markdown'
/var/www/discourse/lib/pretty_text.rb:172:in `cook'
/var/www/discourse/app/models/post_analyzer.rb:12:in `cook'
/var/www/discourse/app/models/post.rb:157:in `cook'
/var/www/discourse/lib/post_creator.rb:116:in `before_create_tasks'
/var/www/discourse/app/models/post.rb:369:in `block in <class:Post>'

And, as another side note: don’t go down the import path with less than 16GB swap if your forums are playing in the same league as mine. The importer script chews up memory like crazy - at least it did in my first runs through the socket connection to the host machine, ending each run after less than 80% with a single “Killed.” note.

I will also try to add something else: an avatar import would be nice, and should be relatively easy to implement.

Yeaaah, that was probably the kernel running out of memory and deciding that killing a user task is a better option than crashing the whole system. :sweat_smile:

Anyway, the converter should probably pre-process these posts and convert them back to mostly text. s/<!-- s(.+?) --><img src="(?>[^"]*)" alt="\1" title="(?>[^"]+)" \/><!-- s\1 -->/\1/g should replace the smiley HTML with the original ASCII smiley string.
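As a rough illustration, the same substitution can be written in Ruby, the language of the import script. The method name `restore_smilies` is made up for this sketch and is not part of phpbb3.rb; the sample input is shortened from the post quoted earlier.

```ruby
# Hypothetical helper (not part of phpbb3.rb): replaces phpBB's smiley markup
# with the plain-text smiley code stored in the surrounding HTML comments.
SMILEY_HTML = /<!-- s(.+?) --><img src="(?>[^"]*)" alt="\1" title="(?>[^"]+)" \/><!-- s\1 -->/

def restore_smilies(raw)
  # '\1' inserts the text captured from the opening "<!-- s... -->" comment.
  raw.gsub(SMILEY_HTML, '\1')
end

sample = 'Title <!-- s:D --><img src="{SMILIES_PATH}/laugh.gif" ' \
         'alt=":D" title="Very Happy" /><!-- s:D -->[/quote:291ba]'
puts restore_smilies(sample)
```

The pattern only matches when the `alt` text equals the code in the comments, so smilies whose codes are HTML-escaped in one place but not the other would pass through unchanged.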

True - just killing the ruby process seems clever from a system’s point of view… :wink:

The smiley preprocessing seems to work. The exception, however, looks like something really broke, because the convert_tree_to_html method tried to slice the input and ended up calling “slice” on something that is neither an array nor a string. If I don’t run into any other oddities, I might simply drop that posting, since it’s merely banter, but maybe someone can try to figure out what went wrong…

/Edit: I see my sidekiq queue is really full (>70k with 86% import done) - has anyone come up with a clever idea how to optimize this?

We need some way for the import to perform ProcessPost/etc jobs synchronously, without messing up stuff like the topic auto-close jobs (which actually need to not run during the import).

Yeah, the Markdown parser seems to be an ongoing source of work. I found a text snippet that reproduces the error… gonna have a look.

Keep in mind, on an unrelated note, the long term plan is to move to a CommonMark implementation, probably jgm’s one.

Thanks for the quick fix - I’m currently re-running the import. Today, the import speed is down to 10 posts/sec again, although I access the database locally and even tried to disable image crawling beforehand. :unamused:

That’s pretty fast in our experience :smile:

Yes, our Sidekiq queue gets really full during imports as well.

For any imports I have done I’ve always started additional sidekiq processes to clear the backlog during the import
https://meta.discourse.org/t/start-several-temporary-sidekiq-processes-to-clear-queue-backlog/16710/7?u=deanmarktaylor

If you do that, do keep a close eye on your server’s memory use. When I imported my old SMF2 forum, Sidekiq quickly spun up to consume around 1.8 GB of memory and wouldn’t free any of it until the system finally quieted down…

Is there any easy way to figure out what keeps the import script busy? My current run has been chewing for almost 24h on the same import that finished in one night on the last run?! The sidekiq queues are empty, and I’m seeing timeout exceptions on the console…

Timeout exceptions? That should certainly not be happening. Could you show us a stacktrace and the full exception message?