Migrating a 200,000 member forum (LDU to Discourse)


(Thomas Wilson) #1

Hello folks,

This is my first topic - very excited about what Discourse has to offer, and just had a few questions I was hoping you could shed some light on. I’ve looked around the support forums but couldn’t find everything I was looking for.

I run a forum for the Muse fan community which is built upon a legacy LAMP setup (using LDU as its CMS). Discourse looks great - I have a test site up and running and it’s working well (digitalocean // mandrill // S3 // ghost + discourse setup).

We’d like to migrate our news over to ghost and our forums over to Discourse. Our current news commenting system hooks into our forum, so I first of all had a quick question regarding that.

  • If I migrate our existing forums over to Discourse (including the ‘News’ forum) and set up an RSS feed of new blog entries from Ghost (which will contain our migrated news), how can I ensure we don’t get any duplicates? Does Discourse have any clever algorithm to detect or prevent duplicate entries being created?

My second question revolves around how users are handled. Our existing site has close to 200,000 of them – not all of whom are active (in fact, only a small minority are). Not all of them have posted in our forums and some of them are banned.

  • What actually happens when I import users (e.g. using a modified version of the phpBB3 script)? How are they notified that they have new accounts – and can these notifications be delayed/disabled whilst I’m still testing (I don’t want to spam them all!)? Could I maybe invite them in batches?
  • Does Discourse support/handle (or have plans to) bounce notifications from 3rd party e-mail solutions (e.g. Mandrill has a bounce hook)?
  • Levels/badges etc - If I import a user who has, say, 500 posts to their name, are badges/trust levels automagically applied, or are these only calculated after everything has been imported?

And finally, performance. The current forum has very low activity at the moment, for a variety of reasons (next to no notifications, twitter/facebook emigration etc.). We currently get around 50 new posts a day (it used to be +1000). There are ~2.4m posts and ~50k topics.

  • Will the sheer size of the database, regardless of # queries, cause performance issues?
  • I’m expecting activity to increase once we go live (because Discourse is freakin’ awesome). I’m testing using a 1GB digitalocean droplet which I can scale up/down accordingly, but is there anything else I should be aware of before opening the floodgates?

Thank you all in advance for any tips/suggestions you might have.


(Jens Maier) #2

The import script will typically create a user account in Discourse that resembles the account on your old forum as closely as possible. How this works depends on the script. Because passwords are almost always stored as hashes, they cannot be converted. The converted accounts will have an undefined password, and your users will not be able to log in until they reset their passwords via Discourse’s password recovery function. This means that all users need to have a valid email address set before the conversion, or they will lose their accounts.
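
Since accounts without a usable email address are effectively lost, it may be worth auditing the old user table before running the import. The following is only a rough sketch in Python, not part of any Discourse import script: the record layout (`id`, `name`, `email`), the function names, and the sample data are all hypothetical, and the regular expression is a deliberately loose sanity check rather than real address validation. It also shows one way to cut the importable users into batches, in case you want to contact them in stages.

```python
import re
from itertools import islice

# Deliberately loose check (assumption): something@something.tld, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def split_importable(users):
    """Partition legacy user records into (importable, needs_attention).

    A record is importable when its email passes the loose check above;
    everything else should be fixed or written off before converting.
    """
    importable, needs_attention = [], []
    for user in users:
        email = (user.get("email") or "").strip()
        if EMAIL_RE.match(email):
            importable.append(user)
        else:
            needs_attention.append(user)
    return importable, needs_attention


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical rows, standing in for an export of the old forum's user table.
legacy_users = [
    {"id": 1, "name": "alice", "email": "alice@example.com"},
    {"id": 2, "name": "bob", "email": ""},
    {"id": 3, "name": "carol", "email": "carol@example"},
    {"id": 4, "name": "dave", "email": "dave@example.org"},
    {"id": 5, "name": "erin", "email": None},
]

ok, bad = split_importable(legacy_users)
for group in batches(ok, 1):
    print([u["name"] for u in group])
```

With 200,000 accounts, the size of the `needs_attention` list should tell you fairly quickly how many users would be unable to recover their accounts after the conversion.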

Trust levels are updated by a background job once you start Discourse after importing the old forum. It may take up to a day until the job is started, but you can manually trigger these jobs via the /sidekiq dashboard.

PostgreSQL can handle a lot of abuse if you give it enough hardware resources. However, since a huge database means huge indices (which usually means that the database consumes more RAM while idle), I would start out with at least the 2GB DO droplet.


(Thomas Wilson) #3

This is great - thank you.

Just to clarify, the simple act of importing users won’t schedule emails to be sent to them then (e.g. “Welcome to…”)? I’d like to handle Welcome e-mails manually if possible. If someone were to reply to a thread, would (unverified / unconfirmed) users still receive notifications? A user-level ‘disable notifications’ flag, or similar, would be great.

I’ll look at upgrading the droplet to 2GB - for the time being though, 1GB is fine for my testing.

Didn’t know about the sidekiq panel – something else to play with!

Thanks again


(Jens Maier) #4

I’m not certain. I’ve done a lot of testing with import scripts, but haven’t yet gotten around to doing it in a production environment. The mailcatcher for development mode remained empty, however.


(Kane York) #5

Note that disk space may also be a concern - so you may as well just create a new 2GB droplet for the extra disk space (resizing does not upgrade the disk).