Merging two Discourse forums

I’m trying to use this and I am not having much success.

Discourse A has ~100k posts and Discourse B ~30k posts. I’ve moved all categories under one main category and then exported it from Discourse B. When I import into Discourse A only 5 posts are added and two are attributed to an incorrect user.

For the incorrectly attributed example post I’ve noticed that in Discourse B it was made by a user with a user id 176, and in Discourse A it is attributed to a user with the same user id. But those are two distinct users.

No users in A and B have the same email address, but ~100 have the same user name.

Oddly, when I first tried a test to see if this could work, I ran the export-import and it seemed that most of the 30k posts were transferred, but post ownership was completely jumbled.

Perhaps a fix to the post ownership issue could be to change user ids in Discourse B before export so that they don’t have a duplicate in Discourse A. Can this be done in bulk? Do I need to change usernames in B, and can it be done in bulk by adding “_2” to each username?

As for the 29,995 posts that didn’t make it over - I’m stumped. I see those topics scroll by during import, but not in Discourse A after import.

Hmm. I think the solution might be to write an importer for Discourse.

Could be an interesting project.

If you have a budget and would like me to write an importer, let me know or post in marketplace.

Users, posts, and categories should be easy enough, but you’d also want to replace @mentions and post/topic links within discourse, which would add a bit of complexity. Maintaining user and post stats could also be done with more time or money.

I think I see why that’s happening. The topic importer only looks at email address, but it uses a common importer method to create a new user when it doesn’t find an existing user with the email address. That method will look for a matching username, which is not what I want… This is a bug so I’ll have a look.

Scratch that. It does work by looking only at email addresses. @omarfilip Was your Discourse A site a migration from other forum software?

Yes - “A” is from phpBB using @gerhard’s awesome importer, and “B” came from an ancient forum that @pfaffman successfully converted from a sql dump.

I see. So user id 123 from phpBB “A” is not the same user as user id 123 from ancient forum “B”, but the importers are going to see them as the same and not create what they take to be duplicate users. That’s normally the behaviour we want, but not in this case. We need a way to force the importer to treat all incoming users as new.

OH!

In that case you (or I) should run that script on top of your new data rather than try to merge the two Discourses. I didn’t know (or didn’t fully appreciate) that was what you were trying to do.

I think it might “just work” to run it on top of your forum, but it might need some tweaks.

Send me an email with the details and we’ll figure out how to proceed.

Correct.

However, username johndoe from “A” is the same person with username johndoe in “B.” In my case it would not matter much if during the conversion (or in a prep step before) johndoe from B became johndoe_2.

That should happen automatically if the import script runs on top of your existing Discourse database.

Would an incremental import from phpBB work after that script was run? (To catch up with posts that were made after I made the initial import from phpBB.)

I’m pretty sure that this will work:

  • freeze the live (formerly phpBB) Discourse
  • back it up
  • restore that backup to a development machine
  • import the ancient data on the dev machine
  • back up again
  • restore to production
  • profit

The only question is whether there are any clashes between import ID on the phpBB data and the ancient data. This should be pretty easy to fix in the importer (e.g., add “X” before all of the import IDs).
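
To make the clash concrete, here is a rough sketch in plain Ruby. It is not the importer's code: `lookup`, `remember`, and `namespaced_id` are hypothetical names, and the hash only stands in for whatever an importer uses to map import IDs to Discourse records.

```ruby
# Hypothetical stand-in for an importer's "import ID -> Discourse user ID" map.
lookup = {}

# Record a mapping only if the import ID has not been seen before,
# mimicking an importer that skips records it believes already exist.
def remember(lookup, import_id, discourse_id)
  lookup[import_id] ||= discourse_id
end

# Without a namespace, user 176 from the second forum is mistaken
# for user 176 from the first one.
remember(lookup, "176", 1001) # from forum A
remember(lookup, "176", 2002) # from forum B, silently ignored
collided = lookup["176"]      # still 1001, so B's posts go to A's user

# Hypothetical fix: prefix each import ID with a per-source string.
def namespaced_id(source, id)
  "#{source}#{id}"
end

remember(lookup, namespaced_id("A", 176), 1001)
remember(lookup, namespaced_id("B", 176), 2002)
```

With the prefix, `lookup["A176"]` and `lookup["B176"]` point at different users, which is the behaviour wanted when the two sets of IDs come from unrelated forums.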

I fixed a bug where this wouldn’t work because the import id was being converted to an integer: "x123".to_i == 0. I also added a way to treat the output files from the topic exporter as coming from different places, which essentially does the same trick of appending an arbitrary string to the id.

IMPORT_SOURCE=A bundle exec ruby script/discourse import_topics filename-1.json
IMPORT_SOURCE=B bundle exec ruby script/discourse import_topics filename-2.json
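
The integer-conversion problem mentioned above is easy to reproduce in plain Ruby. This is only an illustration with made-up ids, not code from the importer:

```ruby
# Import ids as they might look once a source prefix has been added.
prefixed_ids = ["x123", "x456", "x789"]

# String#to_i returns 0 when the string does not start with a number,
# so every prefixed id collapses to the same integer.
as_integers = prefixed_ids.map(&:to_i)

# Keeping the ids as strings preserves their uniqueness.
as_strings = prefixed_ids.map(&:to_s)

puts as_integers.uniq.length # prints 1
puts as_strings.uniq.length  # prints 3
```

Once every id maps to 0, an importer that skips "already imported" records would treat all but the first as duplicates.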

OH! You’re back on the import_topics solution. I was talking about re-running the second importer. Several importers count on import_id being a string.

It’s not immediately clear which solution is easier/preferable. I suppose if the import_topics solution works, that would be great!

Does this mean if I re-run Topic and Category Export/Import it should now work?

Every case is a special unique snowflake. :snowflake: You worked on the migration for @omarfilip , so you would know better than me.

The group and user id’s were being converted to integers, so that could have been causing problems. Everything in the LookupContainer uses strings as it should though.

Only if you’re running today’s code, so wait until the tests-passed branch has it.

I re-ran the export/import with Discourse (1612818).

The only apparent difference was that the importer reported which emails it was invalidating during the import. The result of the process was still the same - only 5 posts got imported, and post ownership is still mismatched.

Checking the status of the code - does this mean not everything made it into tests-passed?

It’s in tests-passed. The Travis build was cancelled for some reason.

If IMPORT_SOURCE still only created 5 posts, then I don’t know what’s happening. At this point someone needs to compare the data (post id’s, user emails, etc.) in the exported data with what’s in the target db and figure out why there’s a collision (user_custom_fields, post_custom_fields).

Does the category importer care if a name has a duplicate, but the username and email are unique?

In the forum that’s being exported/imported there are several that are like that.

No, it never looks at name. If you’re using the new IMPORT_SOURCE support, then email address is the only thing that it looks at. If there’s a duplicate username but different email address, then a number will be put on the end, like “omar1”.
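
As a rough illustration of that numbering, here is a minimal Ruby sketch. The helper `available_username` is hypothetical, and Discourse's real username logic may differ (for example in case handling or length limits):

```ruby
require "set"

# Hypothetical helper: return `wanted` if it is free, otherwise the first
# "wanted1", "wanted2", ... that is not already taken.
def available_username(wanted, taken)
  return wanted unless taken.include?(wanted)

  suffix = 1
  suffix += 1 while taken.include?("#{wanted}#{suffix}")
  "#{wanted}#{suffix}"
end

# Usernames assumed to exist already on the target site.
taken = Set.new(["omar", "johndoe", "johndoe1"])

first  = available_username("omar", taken)    # "omar1"
second = available_username("johndoe", taken) # "johndoe2"
third  = available_username("newuser", taken) # "newuser"
```

The email address still decides whether an incoming user is new; the suffix only matters once the importer has decided to create a user whose preferred username is taken.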

Success! :tada:

With @pfaffman’s guidance, deleting import_id from both Discourses before export/import worked:

rails c
# In the Rails console, delete the import_id custom fields left by earlier imports:
UserCustomField.where(name: 'import_id').delete_all
GroupCustomField.where(name: 'import_id').delete_all
CategoryCustomField.where(name: 'import_id').delete_all
PostCustomField.where(name: 'import_id').delete_all

This did not work:
discourse import_category /category-export-2017-10-25-225306.json

but this did:
bundle exec discourse import_category /category-export-2017-10-25-225306.json

Having to delete import_id from the Discourse that was converted with the phpBB importer prevents me from doing a leisurely initial import first and an incremental import later, but I can live with that.

Many thanks to @neil and @pfaffman!
