Hello folks, I’m trying to understand why the MyBB import script fails to import many posts. I’m importing from a forum that has around 190,000 posts. The script imports some of them, but many seem to be missing. The output of RAILS_ENV=production ruby mybb.rb contains lines like the ones below:
166417 / 170395 ( 97.7%) [74673 items/min] Parent post 1044397448 doesn't exist. Skipping 186530: PHP: HTTP_HOST vs. SERVER_NAME
or
Parent post 1251298548 doesn't exist. Skipping 188213: $_POST empty
and when I count how many of these posts are skipped, I get a considerable number, around 120,000:
$ grep Skipping import.log | wc -l
124115
I can’t figure out why these posts are skipped. What does it mean that a parent post doesn’t exist? Any suggestion on where to look next?
Oops, I noticed that comment as soon as I hit Post on my message here. I guess I skimmed past it because I didn’t think the old MyBB forums had been imported from phpBB, but maybe they were (it’s an 8-year-old site). I’m running the import with that query now, and it looks promising so far. I’ll report back once it’s done.
BTW, I believe there is a typo in the query: line 117 should not end with a comma.
Find a few of those skipped posts and look at them in the MyBB database. What do they have in common? Why does the importer’s query not find the first post in the topic? That’s how I would try to debug and fix this problem.
The importer seems to throw an exception and skip some of the original articles. This is what I see in the importer’s log:
119718 / 170395 ( 70.3%) [1507 items/min] Exception while creating post 123257. Skipping.
119719 / 170395 ( 70.3%) [1507 items/min] Parent post 123257 doesn't exist. Skipping 123258: My website just went completely non-respo
119737 / 170395 ( 70.3%) [1507 items/min] Parent post 123257 doesn't exist. Skipping 123276: My website just went completely non-respo
Looks like the importer barfs for some reason at the beginning of importing a thread.
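In case it helps anyone debugging the same thing, here is a rough Ruby sketch that groups the "Skipping" lines by the parent post they mention and separates parents that failed with an exception from parents that have no exception line. The regular expressions are my assumption, based only on the log lines quoted in this thread, and summarize_skips is a made-up helper name, not part of the importer.

```ruby
# Groups the importer's "Skipping" log lines by the parent post they mention,
# and records which posts failed with an exception.
# The patterns are assumptions derived from the log lines quoted in this thread.
SKIP_RE = /Parent post (\d+) doesn't exist\. Skipping (\d+):/
EXCEPTION_RE = /Exception while creating post (\d+)\. Skipping\./

# Hypothetical helper, not part of mybb.rb.
def summarize_skips(lines)
  skipped_by_parent = Hash.new { |hash, key| hash[key] = [] }
  failed_posts = []

  lines.each do |line|
    if (match = SKIP_RE.match(line))
      skipped_by_parent[match[1].to_i] << match[2].to_i
    elsif (match = EXCEPTION_RE.match(line))
      failed_posts << match[1].to_i
    end
  end

  # Parents that raised an exception explain why their replies were skipped;
  # the remaining parents have no exception line in this log.
  cascaded, unexplained =
    skipped_by_parent.keys.partition { |id| failed_posts.include?(id) }

  { skipped_by_parent: skipped_by_parent, failed_posts: failed_posts,
    cascaded: cascaded, unexplained: unexplained }
end

# Sample input: lines copied from the log excerpts in this thread.
SAMPLE_LOG = <<~'LOG'
  119718 / 170395 ( 70.3%) [1507 items/min] Exception while creating post 123257. Skipping.
  119719 / 170395 ( 70.3%) [1507 items/min] Parent post 123257 doesn't exist. Skipping 123258: My website just went completely non-respo
  119737 / 170395 ( 70.3%) [1507 items/min] Parent post 123257 doesn't exist. Skipping 123276: My website just went completely non-respo
  Parent post 1251298548 doesn't exist. Skipping 188213: $_POST empty
LOG

summary = summarize_skips(SAMPLE_LOG.lines)
puts "Parents that failed with an exception: #{summary[:cascaded].inspect}"
puts "Parents with no exception in the log:  #{summary[:unexplained].inspect}"
```

To run it against the real log, pass File.foreach("import.log") instead of SAMPLE_LOG.lines. The parents in the second list are the ones worth looking up in the MyBB database.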
Looking for something in common, I noticed that at least one of the posts that fail to import shares the tid (thread ID) … not sure why (an excerpt of the query below).
I may have spotted a pattern now: no post older than Oct 31 2016 gets imported. I can’t see what is different about the newer posts compared with those from before Oct 31.
Once I spotted this issue, I ran the importer again, this time with a smaller batch size and with the query limited to posts with a datetime after Oct 31 2016. This completed the import.
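For anyone wanting to try the same date restriction, a minimal sketch of how the cutoff could be computed in Ruby. It assumes the post date is stored as a Unix timestamp in a column called dateline; check the column name and type in your own database before using it. posts_after_clause is a hypothetical helper, not something in mybb.rb.

```ruby
# Hypothetical helper: builds a date condition for the importer's post query.
# Assumes the date column (named "dateline" here) holds a Unix timestamp.
def posts_after_clause(year, month, day, column: "dateline")
  cutoff = Time.utc(year, month, day).to_i # midnight UTC on the given day
  "#{column} > #{cutoff}"
end

puts posts_after_clause(2016, 10, 31)
# prints "dateline > 1477872000"
```

The resulting condition would go into the WHERE part of the importer's post query. The cutoff is midnight UTC, so adjust it if the forum stored times in another zone.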