MyBB import skips many posts

Hello folks, I’m trying to understand why the MyBB import script is failing on many posts. I’m importing from a forum that has around 190,000 posts. The script imports some of them, but many seem to be missing. The output of RAILS_ENV=production ruby mybb.rb contains lines like these:

166417 / 170395 ( 97.7%) [74673 items/min] Parent post 1044397448 doesn't exist. Skipping 186530: PHP: HTTP_HOST vs. SERVER_NAME

or

Parent post 1251298548 doesn't exist. Skipping 188213: $_POST empty

When I count how many posts are skipped, I get a considerable number, around 120,000:

$ grep Skipping import.log | wc -l
124115

I can’t figure out why these posts are skipped. What does it mean that a parent post doesn’t exist? Any suggestion on where to look next?

Try uncommenting this in the import script…

https://github.com/discourse/discourse/blob/master/script/import_scripts/mybb.rb#L111-L123

Oops, I noticed that comment as soon as I hit Post on my message here. I guess I skimmed past it because I didn’t think the old MyBB forums had been imported from phpBB, but maybe they were (it’s an 8-year-old site). I’m running the import with that query now, and it looks promising so far. I’ll report back once it’s done.

BTW, I believe there is a typo in the query: line 117 should not end with a comma.

The script finished running… now I have a different set of skipped posts: far fewer than before, but some are still missing.

$ grep Skipping import-no-comment.log | wc -l
78422

Any suggestion on what to do next?

Find a few of those skipped posts and look at them in the MyBB database. What do they have in common? Why does the importer’s query not find the first post in the topic? That’s how I would try to debug and fix this problem.
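
To pick candidates to look up, it may help to group the skipped posts by the parent they were waiting for. Here is a rough Ruby sketch that does this from the log text. The names SKIP_LINE and skipped_by_parent are made up for illustration, the regex is inferred only from the log lines quoted in this thread, and the sample lines are copied from the first post:

```ruby
# Pattern inferred from the log excerpts in this thread (an assumption,
# not taken from the importer's source).
SKIP_LINE = /Parent post (\d+) doesn't exist\. Skipping (\d+)/

# Returns a hash of missing parent id => ids of the posts skipped because of it.
def skipped_by_parent(lines)
  groups = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    # scrub replaces invalid bytes so odd characters in titles don't raise
    line.scrub.scan(SKIP_LINE) { |parent, pid| groups[parent.to_i] << pid.to_i }
  end
  groups
end

# Sample lines copied from the log output quoted earlier in this topic.
sample = [
  "166417 / 170395 ( 97.7%) [74673 items/min] Parent post 1044397448 doesn't exist. Skipping 186530: PHP: HTTP_HOST vs. SERVER_NAME",
  "Parent post 1251298548 doesn't exist. Skipping 188213: $_POST empty",
]

skipped_by_parent(sample).each do |parent, pids|
  puts "#{parent}: #{pids.size} skipped (e.g. #{pids.first})"
end
```

For a real run you could pass File.foreach("import.log") instead of the sample array, then look up the most frequent parent ids in the MyBB posts table.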

The importer seems to throw an exception on some of the original posts and skip them. This is what I see in the importer’s log:

   119718 / 170395 ( 70.3%)  [1507 items/min]  Exception while creating post 123257. Skipping.
   119719 / 170395 ( 70.3%)  [1507 items/min]  Parent post 123257 doesn't exist. Skipping 123258: My website just went completely non-respo
   119737 / 170395 ( 70.3%)  [1507 items/min]  Parent post 123257 doesn't exist. Skipping 123276: My website just went completely non-respo

Looks like the importer barfs for some reason at the beginning of importing a thread.
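
If that reading is right, every "Parent post N doesn't exist" line should trace back to an earlier "Exception while creating post N" line. A small Ruby sketch to check this against the log follows. The function name classify_missing_parents and both regexes are my own, inferred from the log lines quoted in this thread, and the sample mixes lines from the two runs shown above:

```ruby
require "set"

# Patterns inferred from the quoted log output (assumptions).
EXCEPTION_LINE = /Exception while creating post (\d+)/
PARENT_LINE    = /Parent post (\d+) doesn't exist/

# Splits missing parents into those that raised an exception during import
# and those the importer never reported creating at all.
def classify_missing_parents(lines)
  failed  = Set.new
  missing = Set.new
  lines.each do |line|
    text = line.scrub
    text.scan(EXCEPTION_LINE).flatten.each { |pid| failed << pid.to_i }
    text.scan(PARENT_LINE).flatten.each { |pid| missing << pid.to_i }
  end
  {
    raised:     (missing & failed).to_a.sort,
    never_seen: (missing - failed).to_a.sort,
  }
end

# Sample lines copied from the logs quoted in this topic.
sample = [
  "119718 / 170395 ( 70.3%)  [1507 items/min]  Exception while creating post 123257. Skipping.",
  "119719 / 170395 ( 70.3%)  [1507 items/min]  Parent post 123257 doesn't exist. Skipping 123258: My website just went completely non-respo",
  "Parent post 1251298548 doesn't exist. Skipping 188213: $_POST empty",
]

p classify_missing_parents(sample)
```

Parents that land in never_seen would point at a different cause than the exception, such as the first-post lookup discussed earlier.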

Looking for something these have in common, I noticed that at least one of the posts failing to import shares its tid (thread ID) with others … not sure why (an excerpt of the query output is below).

+--------+--------+-----+--------------------------------------------+
| pid    | tid    | fid | subject                                    |
+--------+--------+-----+--------------------------------------------+
| 154935 | 126704 |  15 | How to configure CNAMe                     |
| 129604 | 126704 |  15 | PM Spam                                    |
| 126711 | 126704 |  15 | MyBB!                                      |
+--------+--------+-----+--------------------------------------------+

Also, many of the threads that failed to import cleanly have zero replies. Could that have anything to do with the exceptions?

I may have spotted a pattern now: no post older than Oct 31 2016 gets imported. I can’t see how the newer posts differ from those before Oct 31 :(

Once I spotted this issue, I ran the importer again, this time with a smaller batch size and with the query limited to posts dated after Oct 31 2016. That completed the import.
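
For anyone hitting the same thing, here is roughly what such a restriction could look like. This is only a sketch and not the importer's actual query: posts_batch_sql is a hypothetical helper, the batch size of 500 is an arbitrary example, and it assumes the default mybb_ table prefix and that the posts table stores the post time as a Unix timestamp in dateline. The exact cutoff boundary and time zone are assumptions too:

```ruby
# Hypothetical helper: builds the SQL for one batch of a date-limited posts query.
def posts_batch_sql(table_prefix:, cutoff:, batch_size:, offset:)
  <<~SQL
    SELECT p.pid, p.tid, p.dateline
    FROM #{table_prefix}posts p
    WHERE p.dateline > #{cutoff.to_i}
    ORDER BY p.pid
    LIMIT #{Integer(batch_size)}
    OFFSET #{Integer(offset)}
  SQL
end

# Cutoff taken as midnight UTC on Oct 31 2016 (assumed boundary).
cutoff = Time.utc(2016, 10, 31)

# Print the first batch; later batches only change the offset.
puts posts_batch_sql(table_prefix: "mybb_", cutoff: cutoff, batch_size: 500, offset: 0)
```

The stable ORDER BY matters here: paging with LIMIT and OFFSET is only repeatable when every batch sees the rows in the same order.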
