phpBB 3 Importer goes slow

I have a question about the phpBB 3 importer. It works fine, but my forum has a lot of posts.

After about 60% it becomes very slow, and it keeps getting slower. I tried the importer on a test server earlier: it ran for 4 days and reached 74.8%. Last night it made only 0.3% progress, so I aborted it.

My questions:

  1. Will it start over and duplicate already converted posts if I abort it now and run it again?
  2. What’s the best way to split this operation in half? I can split my phpBB database in half, but will the importer work in two runs like this without messing up the first imported part when I start importing the second one?

I think these questions are addressed to @neil. Thanks.

The importer will skip posts that it already imported, so feel free to cancel it and restart it. I don’t know what causes it to gradually slow to a crawl. I added as much batch processing as I could find, but maybe I missed something…
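To illustrate why a restart does not create duplicates, here is a minimal Ruby sketch of the skip-on-restart idea. It is not the actual importer code: the `ResumableImporter` class, the in-memory `Set` of imported IDs, and the `batch_size` parameter are all assumptions made for illustration. The only thing it takes from the reply above is that a post whose source ID was already imported is skipped.

```ruby
require "set"

# Hypothetical sketch, NOT the real Discourse importer.
# It remembers which source post IDs were already imported and skips them.
class ResumableImporter
  attr_reader :imported_ids, :created

  # already_imported: source IDs recorded by a previous (possibly aborted) run
  def initialize(already_imported = [])
    @imported_ids = Set.new(already_imported)
    @created = []
  end

  # Processes posts in batches and returns how many posts this run created.
  def import(posts, batch_size: 1000)
    posts.each_slice(batch_size) do |batch|
      batch.each do |post|
        next if @imported_ids.include?(post[:id]) # already imported, skip

        @created << post
        @imported_ids << post[:id]
      end
    end
    @created.size
  end
end

source_posts = (1..10).map { |i| { id: i, raw: "post #{i}" } }

# First run is "aborted" after 6 posts.
first_run = ResumableImporter.new
first_run.import(source_posts.first(6))

# Second run sees the whole source again but only creates the missing posts.
second_run = ResumableImporter.new(first_run.imported_ids)
second_run.import(source_posts)
p second_run.created.map { |post| post[:id] } # prints [7, 8, 9, 10]
```

In this sketch the record of imported IDs has to survive between runs; here it is simply passed from one object to the next.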

I think it hits a memory limit or something; regular expressions are slow. After I stopped the importer by pressing Ctrl+C twice and tried to run it again, it got stuck right at the start. So I rebooted the server and re-ran it, and it worked great. Thanks for the awesome converter, Neil.

I want to share an experience. I ran it with a Discourse multisite config, and it imported into the first DB only. My concern was that it would import into both DBs, but that didn’t happen.

I’ve noticed one tiny downside so far. I have a topic for YouTube videos about cats in hats, and it has 1000+ posts. After 700 posts or so, videos are no longer shown as a player box; they show up as plain links. A link becomes a player if you edit and save the message. It’s not a big deal though; what’s important is that the links are there.

Maybe if you ran a rebake?

One idea: If it’s inserting into a table that is indexed, it probably has to hold the index in memory while inserting. Once the index is larger than available RAM, it starts paging and gets super slow. If that’s the case, try creating the table first and then indexing after the records are inserted.

I didn’t know I could do that. I’ll try, thanks.

Lol. rake posts:rebake acts the same way: it starts to slow down after about 50%. So I dropped this idea.

I want to report an issue. I can’t believe I didn’t see it right at the beginning… :laughing: Before importing my DB into Discourse, I restructured phpBB in Discourse style: one level of sub-categories max. But some sub-categories had the same name, such as:

World of Warcraft > Classes
Tera > Classes

Those sub-categories got completely mixed up. I now have topics from Tera > Classes in the WoW > Classes category, topics from WoW > Classes in Tera > Classes, etc. So, users beware! The names of ALL your phpBB forums have to be unique before importing.

Hmm @neil, we should fix this for future phpBB imports. Can we embed the ID in the categories so they don’t get inappropriately mapped to duplicate names?

I’m pretty sure we do have phpBB’s ID for the category, so it shouldn’t be doing that. I’ll have a look.

Yes, the importer has the ID from phpBB and it also finds the correct category, but it is currently using the category name:
https://github.com/discourse/discourse/blob/519c875d87a83d4fc9332e1a56991399a557d8b5/script/import_scripts/phpbb3.rb#L159

This problem seems to exist for all importers since they all use the category name instead of its ID.
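The mix-up can be reproduced with a few lines of plain Ruby. This is only a sketch under assumptions: the forum list, the `category_by_name` and `category_by_id` helpers, and the hash keys are made up for illustration and are not taken from the importer or from Discourse. It shows why a lookup by name cannot tell two sub-categories called "Classes" apart, while a lookup by ID can.

```ruby
# Hypothetical data mirroring the example from this thread.
forums = [
  { id: 1, name: "World of Warcraft", parent_id: nil },
  { id: 2, name: "Tera",              parent_id: nil },
  { id: 3, name: "Classes",           parent_id: 1 },
  { id: 4, name: "Classes",           parent_id: 2 }
]

# Lookup by name: the first forum with a matching name wins,
# so duplicate names are ambiguous.
def category_by_name(forums, name)
  forums.find { |forum| forum[:name] == name }
end

# Lookup by ID: IDs are unique, so the result is unambiguous.
def category_by_id(forums, id)
  forums.find { |forum| forum[:id] == id }
end

# A topic that belongs to Tera > Classes (forum 4).
topic = { title: "Best Tera class?", forum_id: 4, forum_name: "Classes" }

by_name = category_by_name(forums, topic[:forum_name])
by_id   = category_by_id(forums, topic[:forum_id])

puts by_name[:parent_id] # prints 1 (ends up under World of Warcraft)
puts by_id[:parent_id]   # prints 2 (correctly under Tera)
```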

There’s some kind of workaround in TopicCreator which uses either the ID or the name to find the category…
https://github.com/discourse/discourse/blob/17d07a8b9a3db9d963f9d7d5bd73504984ef3188/lib/topic_creator.rb#L94-L106

Anyway, I submitted a PR which should fix this.

https://github.com/discourse/discourse/pull/3270

Thanks @gerhard, I’ll have a look! EDIT: merged
