[Complete] [Paid] Migration script from NodeBB (redis) to Discourse


(Geran Smith) #1

What would you like done?
I’m looking to migrate from my NodeBB instance, which uses redis as the database, over to Discourse. My goal is to have categories, topics, posts, and users migrated over. MVP for the users would be username and email, so people can reset their passwords when they try to log in to the new site.

I’d also like the migration script to be open sourced and potentially checked into the Discourse repo: discourse/script/import_scripts at master · discourse/discourse · GitHub

When do you need it done?
No particular hurry, NodeBB is working well so far. I just like Discourse more :slight_smile:

What is your budget, in $ USD that you can offer for this task?
Since all I need is the script, and not someone to run the script for me, I was hoping for $200-$300 at the most. I’d obviously love it to be less if possible. I run the forum as a hobby and my wife will be mad if I spend a bunch on something.

Additional Info
NodeBB data structure: Database Structure · NodeBB/NodeBB Wiki · GitHub
Discourse to NodeBB import (in js): GitHub - BenLubar/nodebb-plugin-import-discourse


(Jeff Atwood) #2

We at Discourse will be willing to split this with you, that is, we can pay half the cost.


(Geran Smith) #3

That is awesome! Hopefully that sweetens the pot for someone :smiley:


I poked around a bit at this to see what all I can figure out. I definitely don’t know enough about Ruby or the Discourse database structure to intelligently put this together.

I entertained the idea of doing a json export and was able to find a handy way to export posts, topics, and categories from some NodeBB functions.

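var fs = require('fs');
var nconf = require('nconf');

// Load NodeBB's settings first; ./src/database reads them through nconf.
nconf.argv().env().file({
  file: './config.json'
});

var db = require('./src/database');

// Dump every object referenced by a sorted set (e.g. 'posts:pid') to a JSON file.
function do_export(set, prefix) {
  db.getSortedSetRange(set, 0, -1, function(err, ids) {
    if (err) { return console.error('Failed to list ' + set, err); }
    var keys = ids.map(function(id) { return prefix + id; });
    db.getObjects(keys, function(err, data) {
      if (err) { return console.error('Failed to read ' + set, err); }
      fs.writeFile('db.export.' + set + '.json', JSON.stringify(data), function(err) {
        if (err) { return console.error('Failed to write ' + set, err); }
        console.log('Wrote ' + set);
      });
    });
  });
}

db.init(function(err) {
  if (err) { return console.error('Could not connect to the database', err); }
  do_export('posts:pid', 'post:');
  do_export('topics:tid', 'topic:');
  do_export('categories:cid', 'category:');
});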

I noticed there was a json importer in discourse/script/import_scripts at master · discourse/discourse · GitHub, but I wasn’t sure how to map the JSON fields from NodeBB to the format that import tool expects. The other missing piece was properly exporting and creating users in Discourse.
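
For the missing user export, the same sorted-set approach could probably be reused. The sketch below is only an illustration: `exportUsers` and `SECRET_FIELDS` are made-up names, and it assumes that this NodeBB version keeps a `users:joindate` sorted set whose members are uids, with each user stored under `user:<uid>`. The `db` argument is expected to expose the same two callback-style functions the script above uses.

```javascript
// Sketch only: exportUsers and SECRET_FIELDS are hypothetical names.
// ASSUMPTION: 'users:joindate' is a sorted set of uids in this NodeBB version.
const SECRET_FIELDS = ['password', 'rss_token'];

function exportUsers(db, callback) {
  db.getSortedSetRange('users:joindate', 0, -1, function (err, uids) {
    if (err) { return callback(err); }
    const keys = uids.map(function (uid) { return 'user:' + uid; });
    db.getObjects(keys, function (err, users) {
      if (err) { return callback(err); }
      // Drop password hashes and tokens so they never reach the export file.
      const cleaned = users.filter(Boolean).map(function (user) {
        const copy = Object.assign({}, user);
        SECRET_FIELDS.forEach(function (field) { delete copy[field]; });
        return copy;
      });
      callback(null, cleaned);
    });
  });
}
```

Inside the `db.init` callback this could be called as `exportUsers(db, function (err, users) { ... })` and the result written out with `fs.writeFile`, the same way the other sets are.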

Here is some of the information I’ve been able to gather.

Example JSON output for categories

	{
		"cid": "1",
		"name": "Announcements",
		"description": "Announcements regarding our community",
		"icon": "fa-bullhorn",
		"bgColor": "#fda34b",
		"color": "#fff",
		"slug": "1/announcements",
		"parentCid": "0",
		"topic_count": "37",
		"post_count": "479",
		"disabled": "0",
		"order": "1",
		"link": "",
		"numRecentReplies": "1",
		"class": "col-md-3 col-xs-6",
		"imageClass": "cover",
		"descriptionParsed": "Announcements regarding our community",
		"undefined": "on"
	}

Example JSON output for topics

	{
		"tid": "94",
		"uid": "1",
		"cid": "1",
		"mainPid": "1741",
		"title": "DNS Issues",
		"slug": "94/dns-issues",
		"timestamp": "1466967610616",
		"lastposttime": "1466967681716",
		"postcount": "2",
		"viewcount": "363",
		"locked": "0",
		"deleted": "0",
		"pinned": "0",
		"teaserPid": "1742",
		"upvotes": "0",
		"downvotes": "0"
	}

Example JSON output for posts

	{
		"timestamp": "1466967610620",
		"tid": "94",
		"deleted": "0",
		"edited": "0",
		"content": "In my attempts to resolve some issues with email activations, I made some changes to DNS. Unfortunately, the switch did cause a refresh of DNS and is making the site unresolvable by some. Please keep trying. A reboot of your router, modem and PC may fix it as well to help out! Sorry again for breaking things. Good old growing pains :D",
		"replies": "1",
		"editor": "",
		"uid": "1",
		"pid": "1741"
	}
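
Every value in these exports is a string, including IDs, flags, and millisecond timestamps, so an importer would need to convert them before use. A minimal sketch of that conversion for a topic record follows. `normalizeTopic` and its output field names are hypothetical and not taken from any Discourse import script.

```javascript
// Hypothetical helper: converts the string-only fields of an exported
// NodeBB topic into numbers, booleans, and ISO 8601 dates.
function normalizeTopic(raw) {
  return {
    id: Number(raw.tid),
    userId: Number(raw.uid),
    categoryId: Number(raw.cid),
    firstPostId: Number(raw.mainPid),
    title: raw.title,
    // NodeBB timestamps are milliseconds since the Unix epoch.
    createdAt: new Date(Number(raw.timestamp)).toISOString(),
    views: Number(raw.viewcount),
    closed: raw.locked === '1',
    deleted: raw.deleted === '1',
    pinned: raw.pinned === '1'
  };
}

const sampleTopic = normalizeTopic({
  tid: '94', uid: '1', cid: '1', mainPid: '1741',
  title: 'DNS Issues', timestamp: '1466967610616',
  viewcount: '363', locked: '0', deleted: '0', pinned: '0'
});
console.log(sampleTopic.createdAt); // 2016-06-26T19:00:10.616Z
```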

Example of the user data in redis (NodeBB)
NOTE: I can’t find an easy way to export data out from redis into JSON (I’m ignorant of redis as well), so I manually formatted this in JSON

	{
		"password": "salted_hash",
		"birthday": "mm/dd/yyyy",
		"reputation": "2623",
		"joindate": "1466804324458",
		"fullname": "Geran Smith",
		"signature": "(Shh, I am Geran)\nSteam Group -- http://steamcommunity.com/groups/gaming_exodus\nMy gaming profiles -- https://gamingexodus.com/post/17901",
		"banned": "0",
		"picture": "https://www.gravatar.com/avatar/9e2032064fd490a386fffd19b98feace?size=192",
		"uid": "1",
		"lastposttime": "1524180627000",
		"cover:position": "50.0307% 54.4151%",
		"followingCount": "3",
		"website": "https://gamingexodus.com",
		"passwordExpiry": "0",
		"postcount": "1966",
		"uploadedpicture": "",
		"userslug": "teh-g",
		"email:confirmed": "1",
		"lastonline": "1524190227557",
		"email": "user@domain.xyz",
		"username": "teh g",
		"flags": "0",
		"aboutme": "",
		"cover:url": "https://i.imgur.com/XTMkINp.jpg",
		"profileviews": "1712",
		"rss_token": "token_string",
		"groupTitle": "administrators",
		"followerCount": "25",
		"topiccount": "166",
		"status": "online",
		"location": "USA"
	}
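
For the MVP described at the top (username and email, enough for a password reset), only a handful of these fields matter. The sketch below shows one way to reduce a user hash to that minimum. `toImportUser` and its output field names are hypothetical. Preferring `userslug` over `username` is an assumption, made because NodeBB usernames like "teh g" contain spaces, which Discourse usernames do not allow.

```javascript
// Hypothetical mapping from a NodeBB user hash to the minimum needed
// for a password reset on the new site. Output names are illustrative.
function toImportUser(raw) {
  return {
    id: Number(raw.uid),
    email: raw.email.trim().toLowerCase(),
    // ASSUMPTION: the slug is a safer username candidate than the display
    // username; fall back to replacing whitespace runs with dashes.
    username: raw.userslug || raw.username.replace(/\s+/g, '-'),
    name: raw.fullname || raw.username,
    createdAt: new Date(Number(raw.joindate)).toISOString(),
    banned: raw.banned === '1'
  };
}

const sampleUser = toImportUser({
  uid: '1', email: 'user@domain.xyz', username: 'teh g', userslug: 'teh-g',
  fullname: 'Geran Smith', joindate: '1466804324458', banned: '0'
});
console.log(sampleUser.username); // teh-g
```

The password hash is deliberately left out, since the plan in this thread is for people to reset their passwords after the move.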

Hopefully my weirdly obsessive research helps someone to take the bounty!


(Jay Pfaffman) #4

Like Discourse Migration – Literate Computing says, “I typically charge $1000-$2500 to write a new importer.” You’re still pretty far away from that, even doubling your budget.

That json_generic.rb is missing a lot. It doesn’t import categories, for example; users are inferred from posts.

I won’t promise that I’ll do the job for $600, but if you’ll send me the backup files, I’ll have a look and put together an estimate that comes close to that (e.g., not doing 301s or avatars could save some time).


(Geran Smith) #5

I’d definitely love to work with you. You’ve clearly done some great work on the project!

I personally don’t need categories. I’d come to terms with it being somewhat easier to manually make categories and just replace the category ID in my data with the new categories. That would definitely break the spirit of the full import though.

I definitely don’t need anything for doing 301s or avatars. Many users on my board use Gravatar or can just reupload. I don’t think redirects will add enough value to be worth spending time on. I’d just let people know to expect some broken links where posts link to other pages on the forum.

Do you want the Redis rdb file, or did I turn this into a refresh of the json importer? I think to keep with the Discourse folks’ plan of covering half the cost, we’d have to do the rdb file.


(Jay Pfaffman) #6

Email me at support@literatecomputing.com.

I’m not sure. I think what I want is JSON files, since you started talking about JSON files, but we can work that out via email.


(Orlando Del Aguila) #7

I think a better approach would be to connect to the NodeBB database and get the data directly, as other importers do. I wrote an importer for a custom forum, so I don’t think it will be that complicated.

Which data store are you using for the NodeBB forum? MongoDB or Redis?


(Jay Pfaffman) #8

I agree in principle, at least. It appears to be in MongoDB. I’ve not used MongoDB, so it’ll take me a couple hours to figure out how to install it, get data into it, and so on before I pass Go.

The yahoogroup importer uses MongoDB, so there’s a starting place.

If you’re interested, please contact @tehspaceg, and, especially if you can hit the $600 budget, please let me know so that I don’t spend the time coming up with an estimate.


(Geran Smith) #9

I definitely agree that the better route would be straight from the source DB into Discourse. It isn’t too complicated to extract the data via scripts into json format. The most painful one was the users, but that was because I didn’t want to write a Python script to iterate through the API, so I just did it manually.

Unfortunately I’m using Redis. Until about a year ago, the recommendation was to use Redis as the database. It looks like NodeBB is transitioning to MongoDB as the recommended database. They still officially support both, and they have not published their Redis -> MongoDB converter yet (they hold it behind a paywall).

My data is in Redis. I just exported it out into json to make life “easier”.


(Orlando Del Aguila) #10

Cool, way off my normal rate, but since it’s going to be open sourced, I could probably do it.

@tehspaceg send me a PM or an email at orlando@hashlabs.com

@codinghorror do you guys want to chip in for a NodeBB Redis-only importer implementation?


(Jeff Atwood) #11

I think we can contribute ~$1000 to this effort if @erlend_sh is OK with it?


(Orlando Del Aguila) #12

Cool, if that’s the case I’ll do it. Payment after delivery since it’s my first gig on the #marketplace


(Geran Smith) #13

Email sent.

Thanks for pitching in @codinghorror


(Geran Smith) #14

Quick update: this is being worked on by @eatcodetravel, pending some confirmation of @codinghorror/Discourse’s generous offer.


(Erlend Sogge Heggen) #15

Consider it confirmed :+1:


(Orlando Del Aguila) #16

Update

I’m working on this on the weekends at the moment, but I’ll also put in some time early next week. I expect to have it ready by the middle of next week.

So far, we have migrated:

  • Groups
  • Categories
    • For categories, since Discourse doesn’t support more than one level of nesting, I’m putting all nested categories directly under their top-level parent. So a chain like this:
      Category -> Sub Category -> Sub Sub Category
      becomes:
      Category -> Sub Category, Sub Sub Category
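
The flattening described above can be sketched roughly as follows. This is not the importer's actual code: `flattenCategories` is a hypothetical helper that works on the exported JSON shape shown earlier in the thread (`cid`, `parentCid`, with `"0"` meaning no parent).

```javascript
// Hypothetical helper: reparents every nested category onto its top-level
// ancestor, so no category ends up more than one level deep.
function flattenCategories(categories) {
  const byId = new Map(categories.map(function (c) { return [String(c.cid), c]; }));

  function topLevelAncestor(category) {
    let current = category;
    const seen = new Set(); // guards against accidental parent cycles
    while (String(current.parentCid) !== '0' &&
           byId.has(String(current.parentCid)) &&
           !seen.has(current.cid)) {
      seen.add(current.cid);
      current = byId.get(String(current.parentCid));
    }
    return current;
  }

  return categories.map(function (category) {
    const root = topLevelAncestor(category);
    const parentCid = root.cid === category.cid ? '0' : root.cid;
    return Object.assign({}, category, { parentCid: parentCid });
  });
}

const flattened = flattenCategories([
  { cid: '1', name: 'Category', parentCid: '0' },
  { cid: '2', name: 'Sub Category', parentCid: '1' },
  { cid: '3', name: 'Sub Sub Category', parentCid: '2' }
]);
console.log(flattened.map(function (c) { return c.cid + ' -> ' + c.parentCid; }));
// [ '1 -> 0', '2 -> 1', '3 -> 1' ]
```

A category whose parent is missing from the export is treated as top-level here, which is a design choice of this sketch rather than anything stated in the thread.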

Cheers


(Geran Smith) #17

Thanks for the update. I think that category mapping makes sense. Discourse handles categories better than NodeBB (IMO), so I was planning on shifting things around anyway.


(Jay Pfaffman) #18

One thing that I’ve done for some imports is map sub-sub categories to Discourse tags. It’s always on a per-job basis, though, as there’s no automated way to guess what the right thing is. Mapping sub-subs up a level is a good solution. I think many importers just bump them to the top level.
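
As one possible shape for the tag approach, the sketch below keeps the first two category levels and turns anything deeper into a tag on the topic. Everything in it is hypothetical: `categoryAndTagsForTopic` is a made-up helper, the tag name is simply taken from the NodeBB slug (the part after the `cid/` prefix), and whether this mapping is the right one would still have to be decided per job.

```javascript
// Hypothetical helper: returns the category a topic should land in plus
// any tags derived from category levels deeper than two.
function categoryAndTagsForTopic(topic, categoriesById) {
  // Build the chain from the top-level ancestor down to the topic's category.
  const chain = [];
  let current = categoriesById[topic.cid];
  while (current && chain.length < 10) { // depth cap also stops cycles
    chain.unshift(current);
    current = categoriesById[current.parentCid];
  }
  if (chain.length === 0) {
    return { cid: topic.cid, tags: [] }; // unknown category: leave as-is
  }
  const kept = chain.slice(0, 2);
  const tags = chain.slice(2).map(function (c) { return c.slug.split('/').pop(); });
  return { cid: kept[kept.length - 1].cid, tags: tags };
}

// Made-up category tree for illustration.
const exampleCategories = {
  '1': { cid: '1', parentCid: '0', slug: '1/games' },
  '2': { cid: '2', parentCid: '1', slug: '2/shooters' },
  '3': { cid: '3', parentCid: '2', slug: '3/team-fortress' }
};
console.log(categoryAndTagsForTopic({ tid: '94', cid: '3' }, exampleCategories));
// { cid: '2', tags: [ 'team-fortress' ] }
```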


(Orlando Del Aguila) #19

Quick update, this is how it looks now

We have migrated

  • Categories
  • Groups
  • Group membership
  • Users
  • Topics and Posts

There are still a few things I need to resolve, like supporting local attachments. This NodeBB dataset uses a plugin to upload all attachments to imgur, but I can test local attachments with my own installation.

I won’t be able to work on this until the weekend, but I should be able to wrap this up by next Monday.

Thanks!


(Jay Pfaffman) #20

I think that there are a couple of scripts that download images, or you could just leave the links to imgur and optionally let Discourse pull them to local.
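
Either way, it may help to know which remote images the posts actually reference. A minimal sketch, assuming post content uses standard Markdown image syntax; `extractImageUrls` is a hypothetical helper and not part of any import script:

```javascript
// Hypothetical helper: lists the URLs of Markdown images, i.e. ![alt](url)
// or ![alt](url "title"), found in a post's raw content.
function extractImageUrls(content) {
  const pattern = /!\[[^\]]*\]\((\S+?)(?:\s+"[^"]*")?\)/g;
  return Array.from(content.matchAll(pattern), function (match) {
    return match[1];
  });
}

const imageUrls = extractImageUrls(
  'Cover: ![cover](https://i.imgur.com/XTMkINp.jpg) and ' +
  '![](https://example.com/a.png "a title"), plus a [plain link](https://example.com).'
);
console.log(imageUrls);
// [ 'https://i.imgur.com/XTMkINp.jpg', 'https://example.com/a.png' ]
```

Running this over the exported posts would show how many images live on imgur versus elsewhere before deciding whether to download them.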

