Migrating from Jive Clearspace to Discourse

(Mikhail Vink) #1

We recently migrated the Kotlin community forums by JetBrains from the Jive Clearspace platform to Discourse, so I’d like to share our experience with the migration, which some of you might find useful in the future.

First of all, I’d like to give a round of applause to @techAPJ, the author of the original migration script, which helped us a lot. We modified it for our needs and are now glad to contribute back some of the changes related to images and attachments. Also, this thread will be a good start if you’re just beginning to learn about migrations to Discourse. And we are very thankful to our other friends at Discourse who helped us with OAuth2 (which now works with JetBrains Hub as an SSO provider, thanks for the fixes @eviltrout) and with other issues and setup.

We started the migration by trying to use the Discourse API, but soon switched to the migration script, mostly because of a limitation that didn’t allow us to set the creation date/time for posts and comments.

Import scripts in Discourse take some time to understand: for those that don’t have documentation, you have to work out the input format and resolve the issues yourself. So let me shed some light on what’s going on in the Jive migration script, so that you at least know how to approach it.

The first thing to understand is the format of the input data the script expects. It’s simple CSV, but you need to provide all the data the script requires. We’re not going to talk in detail about the data export from Jive itself, as it has quite a comprehensive REST API (we used version 2.5, and here is the doc on the REST API); the Discourse import script mostly uses the same naming as the Jive API.

The migration script contains the following parts:

Users: (input file: user.csv, function in the script: import_users)

  • userid
  • email
  • firstname
  • lastname
  • username
  • creationdate
  • lastloggedin
  • userenabled

Groups: (input file: group.csv, function in the script: import_groups)

  • groupid
  • name

Group members: (input file: group_members.csv, function in the script: import_group_members)

  • userid
  • groupid

Categories: (input file: community.csv, function in the script: import_categories)

  • communityid
  • name

Posts/Messages: (input file: message.csv, function in the script: import_posts)

  • parentmessageid
  • messageid
  • containerid
  • userid
  • subject
  • body
  • creationdate
  • threadid
  • attachmentcount (added by us for attachments upload)
  • imagecount (added by us for images upload)
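Before running the import, it can save time to check that every CSV file carries the columns listed above. The following is only a minimal sketch, assuming each file starts with a comma-separated header row that uses exactly these column names; EXPECTED_COLUMNS and missing_columns are hypothetical names and are not part of the import script.

```ruby
require "csv"

# Columns each input file is expected to provide, copied from the lists above.
# attachmentcount and imagecount exist only in the modified script.
# Assumption: every file begins with a header row using these names.
EXPECTED_COLUMNS = {
  "user.csv" => %w[userid email firstname lastname username
                   creationdate lastloggedin userenabled],
  "group.csv" => %w[groupid name],
  "group_members.csv" => %w[userid groupid],
  "community.csv" => %w[communityid name],
  "message.csv" => %w[parentmessageid messageid containerid userid subject
                      body creationdate threadid attachmentcount imagecount]
}.freeze

# Returns the expected columns that are absent from the header row of csv_text.
def missing_columns(file_name, csv_text)
  header = CSV.parse_line(csv_text) || []
  present = header.map { |name| name.to_s.strip.downcase }
  EXPECTED_COLUMNS.fetch(file_name) - present
end
```

For example, missing_columns("group.csv", File.read("jive/group.csv")) returns an empty array when both groupid and name are present in the header.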

We needed both images and attachments migrated from Jive to Discourse, so we had to add this functionality to the import of both posts and messages (see the pull request). Including images and attachments complicates the directory structure a bit, and we assume here that you use the standard directories in your Discourse container.

The easiest way is to add a jive directory to /var/www/discourse/script/import_scripts/, put all the CSV input files there, and add img and attach folders that hold the images and attachments in subfolders named after the ID of the message, e.g. /var/www/discourse/script/import_scripts/attach/556123212/attachment_to_upload.zip.
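To illustrate the folder convention, here is a rough sketch of how the files for one message could be looked up and compared with the counts from message.csv. It assumes that attachmentcount and imagecount hold the number of files in the matching folder. The helpers files_for_message and count_matches? are hypothetical names for this example, and base_dir is whichever directory contains your attach and img folders.

```ruby
# Lists the files stored for one message under base_dir/kind/message_id/.
# kind is "attach" or "img", matching the folder names described above.
def files_for_message(base_dir, kind, message_id)
  dir = File.join(base_dir, kind, message_id.to_s)
  return [] unless Dir.exist?(dir)

  Dir.children(dir).sort
     .map { |name| File.join(dir, name) }
     .select { |path| File.file?(path) }
end

# True when the number of files on disk equals the count from message.csv
# (assumed to be the attachmentcount or imagecount value, read as a string).
def count_matches?(base_dir, kind, message_id, expected_count)
  files_for_message(base_dir, kind, message_id).size == expected_count.to_i
end
```

Running count_matches? over every row of message.csv before the import is one way to spot messages whose files were not exported.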

Some of the other issues and modifications:

  • The original script defines the categories to skip at the beginning of the script (CATEGORY_IDS). We decided to do that filtering at the export-from-Jive step, so we removed this array and its uses (commenting out those lines with # doesn’t seem to break anything).
  • The original script normalises the message body with the nokogiri Ruby gem. That didn’t work well for us, so we did a more controlled normalisation at the export-from-Jive step and removed the normalisation from the Discourse import script.
  • We encountered some problems with the formatting of code chunks, which we had to resolve manually (about 0.2% of all messages, due to the large amount of code inside).
  • We encountered some problems when a reply in Jive Clearspace was created by replying to the news list from an email address that is not registered on the forum. The API then returns the post under the ANONYMOUS user; we had to map this user to the Discourse system user and handle it carefully in the user.csv file so that it doesn’t trigger errors.
  • URLs with the class _jive_internal have to be changed manually (these are partial internal links to other posts).
  • The original script appends -x to the end of each email address during the import (most probably for testing purposes; you might consider removing it for the real migration): email = "#{row.email}-x" (discourse/jive.rb at master · discourse/discourse · GitHub).
  • The script can be launched with the command bundle exec ruby script/import_scripts/jive.rb DIRNAME, where DIRNAME is the directory where your CSVs etc. are located (jive in our examples).
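The _jive_internal links mentioned above still need manual fixing, but listing them first makes the work easier to plan. Below is a rough sketch, assuming the links appear in the exported body as HTML anchor tags with a quoted class attribute. The regular expression and the jive_internal_hrefs helper are illustrative only and not part of the import script.

```ruby
# Matches <a ...> tags whose class attribute contains _jive_internal.
# Assumption: attributes are quoted and the tag is a plain HTML anchor.
JIVE_INTERNAL_LINK =
  /<a\b[^>]*\bclass\s*=\s*["'][^"']*\b_jive_internal\b[^"']*["'][^>]*>/i

# Returns the href values of all _jive_internal links found in a message body.
def jive_internal_hrefs(body)
  body.scan(JIVE_INTERNAL_LINK)
      .map { |tag| tag[/\bhref\s*=\s*["']([^"']*)["']/i, 1] }
      .compact
end
```

Applied to the body column of each row in message.csv, this gives a list of message IDs and link targets to go through by hand.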

That’s it! I hope this information will help those who migrate from Jive or other forum platforms to Discourse.


Data Migration from old Jive Forum (Version 5.0.4)