Porting from Askbot to Discourse: my experience and code


(Michael Grant) #1

I recently moved CVX Forum, a support forum for users of my convex optimization software, from Askbot to Discourse. You can read about my motivations for making the change here. We can have some of that discussion here if you want, but in this thread I wanted to offer up my experience and code.

The code I used can be found in this GitHub repo. There are only two files there: base.rb and askbot.rb. base.rb is actually a very slightly modified version of script/import_scripts/base.rb in the Discourse tree. Indeed there is exactly one change: in line 302, I changed

if bio_raw.present? || website.present?

to

if bio_raw.present? || website.present? || location.present?

so that I could port user-supplied location information over from Askbot profiles.

Let’s talk about the meat of the code in askbot.rb. Some general notes:

  • Askbot uses Markdown, making porting of the raw text easy.
  • Askbot uses PostgreSQL for its database. Alas, it seems none of the other import scripts rely on PostgreSQL, so I had to figure out how to use the PG Ruby interface all by my lonesome. (Poor poor pitiful me.) The toughest part was getting the timestamps right; I might have still gotten the time zone issues wrong, but I am fine with the result now.
  • I decided to thread comments and answers together into a single linear time stream. In my forum, the distinction between the two types of responses was unnecessary and forced, so I’m frankly happy not to have that distinction now.
  • I did not bother to link comments to their matching questions or answers; our threads were not that long. In theory, you could do so.
  • I chose to select a subset of Askbot tags to convert to categories. Any post that had one of those selected tags was moved into the corresponding category; all others were left uncategorized. I considered creating a default category to catch all of those other posts, but ultimately it seems like there is no real disadvantage to having uncategorized posts in my case.
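For readers adapting the approach, here is a minimal Ruby sketch of two of the ideas above: merging answers and comments into one chronological stream, and choosing a category from a post's tags. It is not taken from askbot.rb. The category names, the `added_at` key, and the assumption that timestamps arrive as offset-free strings meant as UTC are all hypothetical, so check them against your own database.

```ruby
require "time"

# Hypothetical subset of Askbot tags promoted to Discourse categories.
CATEGORIES = ["Nonconvex", "Solvers"].freeze

# Return the first selected category matching one of the post's tags,
# ignoring case. nil means the post stays uncategorized.
def category_for(tags)
  lowered = tags.map(&:downcase)
  CATEGORIES.find { |name| lowered.include?(name.downcase) }
end

# Parse a timestamp string from the database.
# Assumption: the string carries no offset and is meant as UTC.
def parse_timestamp(text)
  Time.parse("#{text} UTC").utc
end

# Merge answers and comments into one list ordered by creation time.
# Each reply is a hash with a hypothetical :added_at string field.
def linearize(answers, comments)
  (answers + comments).sort_by { |reply| parse_timestamp(reply[:added_at]) }
end
```

Calling `linearize(answers, comments)` returns the replies in posting order regardless of type, which is the order they would then be created in as Discourse posts.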

The execute procedure proceeds as follows:

  1. create_cats: create Discourse categories for the list given in the CATEGORIES global. These category names are assumed to match the Askbot tag name in a case-insensitive comparison.
  2. import_users: suck in the Askbot users. I mapped the username, email, is_staff, date_joined, last_seen, real_name, website, and location fields quite directly to corresponding Discourse fields. I have no idea if password hashes can map over; I didn’t bother to try.
  3. read_tags: for simplicity, I read in the entire tag database from Askbot. I decided to attach the full tag list for each post as a custom_field entry, even as I sifted through them to make category determinations.
  4. import_posts: read in the questions, determine their categories, and store their thread_id values for later matching with comments and answers.
  5. import_replies: read in comments and answers. Questions, answers, and comments are actually stored in the same Askbot table, but I decided to make two passes since comments and answers were to be processed differently.
  6. post_process_posts: try to convert Askbot internal links to Discourse internal links. Thankfully, I didn’t have too many of these, but I did need to distinguish between A HREF-style HTML links, Markdown-style []() links, and bare text links.
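The link conversion in step 6 can be sketched roughly as follows. This is an illustration, not the code in askbot.rb: instead of handling the three link styles separately, it matches only the URL itself, so the same substitution applies inside an A HREF attribute, inside Markdown parentheses, or in bare text. The host name, the `/question/<id>/<slug>/` URL shape, and the `topic_urls` lookup table are assumptions made for the example.

```ruby
# Hypothetical old forum host and question URL shape; adjust to the real site.
# The optional tail matches only slug-like characters, so a closing quote,
# a parenthesis, or trailing punctuation is left in place.
OLD_QUESTION_URL = %r{https?://ask\.example\.com/question/(\d+)(?:/[\w\-/]*)?}

# Replace each old question URL with its new topic URL.
# topic_urls maps an old question id (Integer) to a new URL (String).
# Links whose id is not in the map are left unchanged.
def rewrite_links(raw, topic_urls)
  raw.gsub(OLD_QUESTION_URL) do |old_url|
    topic_urls.fetch(Regexp.last_match(1).to_i, old_url)
  end
end
```

For example, with `topic_urls = { 12 => "/t/how-to-install/34" }`, a Markdown link written as `[x](http://ask.example.com/question/12/how-to-install/)` becomes `[x](/t/how-to-install/34)`. Query strings and fragments after the slug are not handled here and would need extra care.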

Needless to say, this is a hack. In theory, I could wipe my database clean and run this code, and be done. In practice, I ran it several times, commenting out some of the steps so I could verify the intermediate results before proceeding. This is not even close to being ready to dump into import_scripts, and I don’t intend to make it so :slight_smile: Besides, this is the longest piece of Ruby code I’ve written (the record before that was a Homebrew formula).

I’m quite pleased with how things turned out, and of course that’s in no small part due to the quality of the Discourse code, in particular the script/import_scripts directory. The base.rb code is essential, of course, but the other templates there were extremely helpful. Thank you to all contributors.

If you have questions or comments, by all means, let me know and I will do my best to respond.

(Michael Grant) #2

Ha ha! I just noticed that this forum has optional tags in addition to categories. In theory I can add a postprocessing step that looks at those custom fields and restores the additional tags. Oh well, we honestly didn’t make good use of them on the old forum.

(Régis Hanol) #3

This is awesome. :+1:

Would you consider making a pull request so that your importer lives in the official repository?

(Michael Grant) #4

Thank you! I’m certainly more than happy to offer it to anyone who wants it. But a PR? It seems a bit untested :slight_smile: I mean, obviously I used it successfully, but as I said I had the luxury of iterating. Maybe the other templates in the import_scripts directory are that rough.

(Kane York) #5

Yeah, they pretty much all require at least some manual editing to get going. It’s just better to have it all in one place.

(Jens Maier) #6

That’s fine. When I submitted the SMF importer, it had exactly one successful test run (my own old SMF forum), but despite several bugs it’s apparently been useful to at least three other people (who, incidentally, found bugs, reported them, and got them fixed…) :smile:

(Michael Grant) #7

Okay then! I will do it. Stay tuned.

(Michael Grant) #8