Discourse vs. mailing lists (Majordomo or otherwise)


(Gunnar Helliesen) #1

The Jag-lovers Forums can be found here:

Background

In effect, we migrated to Discourse from a Majordomo mailing list archive with around 20 mailing lists and 1.8 million messages, going all the way back to 1993.

As the users saw it, we did have a “web forum”. It was a home-brew solution, written for the LAMP stack by one of our admins, Tony Bailey, back in the late 90s. People were asking for ways to join the discussions via the web, so Tony wrote what was in effect a web frontend to our mailing list messages. Users could browse, search, and participate via the web.

The code for doing this was complex and contained many moving parts. There were indices in MySQL for searching, HTML copies of the individual messages for browsing, and scripts for interfacing with Majordomo for list management.

Some users, who had been with us for decades, loved the old interface and were very resistant to change. Others came to us with fresh eyes, used to modern UX design, and couldn’t believe how backwards our site was. They soon got frustrated with our unique interface and with the delay between posting a message and seeing it on the forums (every post had to pass through a maze of code and be delivered via SMTP), and left us for other fora.

The pros and cons of Discourse vs mailing lists

Mailing lists
Pros: Very light resource use if you don’t start creating a web frontend for them. Anyone can participate from almost any platform, over any connection, even RFC 1149. Email (or more precisely, SMTP) has existed since 1982 and won’t go away anytime soon.
Cons: Searching is a manual process of downloading the archives and searching through them locally. No web interface, unless you create one. Most users these days expect a web frontend, and instant gratification. Their attitude seems to be that email lists are for spammers. You’ll lose many potential users if all you have is a mailing list.
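To illustrate what “searching through them locally” involves, here is a minimal Python sketch using only the standard library’s `mailbox` and `re` modules. It assumes the downloaded archives are plain MBOX files and only looks at `text/plain` parts; `plain_text`, `search_messages` and `search_mbox` are hypothetical helper names, not part of any list software.

```python
import mailbox
import re


def plain_text(msg):
    """Return the decoded text/plain parts of an email message, joined by newlines."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if not payload:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset name in an old message: fall back to Latin-1.
            chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)


def search_messages(messages, pattern):
    """Return (Date, From, Subject) for each message whose subject or body matches pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for msg in messages:
        subject = str(msg.get("Subject", ""))
        if regex.search(subject) or regex.search(plain_text(msg)):
            hits.append((msg.get("Date", ""), msg.get("From", ""), subject))
    return hits


def search_mbox(path, pattern):
    """Search one MBOX archive file; create=False avoids creating a missing file."""
    return search_messages(mailbox.mbox(path, create=False), pattern)
```

For example, `search_mbox("xj-list.1996-01", r"carburett?or")` would list every message in that (made-up) monthly archive mentioning a carburettor or carburetor. It works, but it is exactly the kind of chore most newcomers won’t put up with.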

Discourse
Pros: Very modern, yet lightweight design. Based on modern technology. Easy to use, easy to search, fast. Fits in nicely with WordPress, Slack, and other modern tools. New users will take to it like fish to water. Distribution is slick, via Docker, making for an easy install.
Cons: Hard to hack, unless you’re intimately familiar with RoR, JSON, CSS, Postgres, nginx, and Docker. Besides, the tools that are all the rage today could go out of fashion tomorrow. If your list/forum is very long lived, you may want to start planning your next iteration now, as you’re probably going to need it some day. If you have a bunch of longtime users, they may grumble at all the modern whizz-bangs. Distribution and installation are almost too slick, via Docker, creating a bit of a “black box” feeling.

Importing our lists

We faced a daunting task in moving all of our home-brew code to a different platform, and were very happy when we found Discourse. Not only does Discourse have an MBOX import tool, but it also allows users to subscribe to topics via email, and even participate in discussions via email. Not many of our users still participate by email, but those who do are adamant that they want to continue doing so.

What made life much easier for us was that we had saved every single mailing list message since the beginning in MBOX format. We used the Majordomo archive2.pl tool for creating monthly MBOX archives for each list. We started doing this a long time ago, “just in case”, even though it wasted a lot of space since the messages were also stored individually in HTML format for browsing from the web.

We decided not to import our user database, since we were going to force all users to change their passwords on the new platform anyway. Forcing a password change increased security, since Discourse can enforce password complexity and our old web frontend was fairly rudimentary in comparison. It also gave us a metric of how many users would follow us over to Discourse. Finally, skipping the user database made the move a lot easier and a whole lot faster.

The Discourse MBOX import tool creates new users automatically, based on the email addresses it finds as senders of the messages. This took care of the vast majority of our users. Some had of course used several different email addresses over the years, which meant that they’d have several accounts created on Discourse, with messages attributed to each account. The only way to fix this is to manually merge the accounts after the import.
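Since the importer creates accounts from sender addresses, a scan of the archives before importing can give a rough estimate of how many accounts will be created, and a first list of people likely to end up with duplicates. The Python sketch below is illustrative only: grouping addresses by display name is a hypothetical heuristic (common names collide, and people change display names), so its output is a list of candidates for a human to check before merging, not a merge plan. It also assumes addresses that differ only in case count as one account, which may not match the importer’s behaviour.

```python
from collections import defaultdict
from email.utils import parseaddr


def sender_index(messages):
    """Collect sender addresses from messages (anything with a .get(header) method).

    Returns (addresses, by_name): the set of distinct lower-cased From
    addresses, and a dict mapping each normalised display name to the
    addresses seen with it.
    """
    addresses = set()
    by_name = defaultdict(set)
    for msg in messages:
        name, addr = parseaddr(str(msg.get("From", "")))
        # Assumption: addresses differing only in case belong to one account.
        addr = addr.strip().lower()
        if not addr:
            continue  # skip messages with no parseable sender address
        addresses.add(addr)
        # Hypothetical heuristic: ignore case and runs of whitespace in the name.
        key = " ".join(name.lower().split())
        if key:
            by_name[key].add(addr)
    return addresses, by_name


def merge_candidates(by_name):
    """Display names seen with more than one address: candidates for a manual merge."""
    return {name: sorted(addrs) for name, addrs in by_name.items() if len(addrs) > 1}
```

Feeding it `mailbox.mbox(path)` for each archive file, `len(addresses)` approximates the number of accounts the import will create, and `merge_candidates(by_name)` gives a starting point for the manual merging afterwards.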

We followed the instructions on “HOWTO: Import MBOX (mailing list) files” found here:

If you scroll down and read the discussion, you’ll find some details of the various problems we encountered.

We fired up a massive EC2 instance on AWS, installed the Discourse dev environment, and set to work running the MBOX import tool. There were some gotchas, such as older messages having dates with two-digit years, or otherwise non-standard Date headers. Since our messages go all the way back to 1993, this was to be expected. Some messages ended up with dates in the future (“93” becoming 2093), and some had headers so badly non-standard (“bang” addresses and the like) that they errored out and were simply dropped.
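Date problems like these can be spotted before a multi-week import run rather than after it. Below is a minimal Python sketch of such a pre-flight check; it is not part of the Discourse importer, and `parse_archive_date` is a hypothetical name. Python’s `email.utils.parsedate_to_datetime` already maps two-digit years 69 to 99 into the 1900s; the sketch additionally pulls any date that lands in the future back by one century, and returns `None` for anything it still can’t make sense of, so those messages can be inspected by hand.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_archive_date(raw, now=None):
    """Parse a Date header from an old archive.

    Returns a timezone-aware datetime, or None when the header is missing,
    unparseable, or still in the future after a one-century correction.
    """
    now = now or datetime.now(timezone.utc)
    try:
        dt = parsedate_to_datetime(raw)
    except (TypeError, ValueError, IndexError):
        # Older Pythons raise TypeError, newer ones ValueError, on garbage input.
        return None
    if dt.tzinfo is None:
        # A "-0000" zone means "no zone information"; assume UTC for comparison.
        dt = dt.replace(tzinfo=timezone.utc)
    if dt > now:
        # e.g. a "93" that some other parser turned into 2093
        try:
            dt = dt.replace(year=dt.year - 100)
        except ValueError:
            return None  # e.g. 29 February landing in a non-leap year
    return dt if dt <= now else None
```

Running this over every message’s `Date` header and counting the `None` results would give an idea of how many messages are at risk of being misdated or dropped.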

The MBOX import on the dev node was slow and took the better part of three weeks, even with massive amounts of CPU, RAM, and the database and work file systems mounted on memory file systems. We tried speeding things up by running multiple imports in parallel, but this created other problems since the processes were all trying to talk to the same database instance. There were a lot of crashes and a lot of hand-holding of the whole import process. This may not be the case for you if your import is smaller.

We started by importing everything from the beginning of time until the last day of the previous month. In our case that meant everything from March 1993 until the end of October 2016. This is the part that took three weeks.
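With monthly archive files, this split is just a matter of picking which files go into which run. For archives that are not already split by month, the same “everything before a cutoff now, the rest later” approach can be done by date. A minimal sketch, assuming Python 3 and a cutoff expressed in UTC; `message_date` and `split_at_cutoff` are hypothetical helpers, and messages whose Date header can’t be parsed are set aside for review rather than guessed into either batch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def message_date(msg):
    """Parse a message's Date header into an aware datetime, or None if unparseable."""
    try:
        dt = parsedate_to_datetime(msg.get("Date"))
    except (TypeError, ValueError, IndexError):
        return None
    # Treat zone-less dates as UTC so they compare with the cutoff.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def split_at_cutoff(messages, cutoff):
    """Partition messages into (initial, delta, undated) around an aware cutoff datetime."""
    initial, delta, undated = [], [], []
    for msg in messages:
        dt = message_date(msg)
        if dt is None:
            undated.append(msg)
        elif dt < cutoff:
            initial.append(msg)   # goes into the long first import
        else:
            delta.append(msg)     # imported later, after the cutover
    return initial, delta, undated
```

For example, `split_at_cutoff(mailbox.mbox(path), datetime(2016, 11, 1, tzinfo=timezone.utc))` returns the three batches, and each can be written to its own file with `mailbox.mbox(out_path).add(msg)`.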

The whole time we had a page up explaining what was happening, and what was about to happen in the coming days and weeks.

Once done with the initial import, we notified our users, set our old forums to read-only, and added an auto-bouncer to the mailing lists with a text explaining the move. We then started importing all new messages posted from November 1st, 2016 until the present day, which was November 22, 2016. This import took about 20 hours. Once that was done, we’d successfully imported the vast majority of our 1.8 million messages, and had around 38,000 users created for us in the Discourse user database. Our old site had well over 100,000 users defined, but over 60,000 of them had been lurking and had never posted anything. Because of that, the MBOX import tool never created accounts for them on the Discourse side. We still have the old user database, so if we want to, we could one day do something about this.
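Identifying those lurkers amounts to a set difference between the old user table and the addresses that appear as senders in the archives. A minimal Python sketch, assuming the old user database can be exported as CSV with an `email` column; the column name and both function names are hypothetical.

```python
import csv


def old_user_emails(csv_lines, column="email"):
    """Lower-cased addresses from a CSV export of the old user table.

    Assumes (hypothetically) a header row with an 'email' column.
    """
    return {
        row[column].strip().lower()
        for row in csv.DictReader(csv_lines)
        if row.get(column)
    }


def never_posted(old_emails, sender_addresses):
    """Old-site addresses that never appear as a sender in the archives, sorted."""
    posted = {addr.strip().lower() for addr in sender_addresses}
    return sorted(old_emails - posted)
```

With an open export file `f`, `never_posted(old_user_emails(f), senders)` (where `senders` is the set of From addresses collected from the archives) would list the accounts the import never created.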

Finally, we copied the Discourse instance over to the production site per the “Move your Discourse Instance to a Different Server” HOWTO:

We made sure the site looked OK, and that we had links up with instructions on how people could reset their passwords so they could log in. Once we were happy that everything was ready, we opened for business on our shiny new Discourse site.

That’s when we encountered the biggest gotcha of the entire process. Discourse decided that all 1.8 million messages were new to all 38,000 users, so it started sending out huge numbers of digest emails to everyone. This very quickly got us flagged as spammers, and we ended up disabling digest emails for all users as a result.

Final notes

Other than that, the move went surprisingly well. Our site is still quite busy, although some people are still grumbling about the change, 8 months later. The die-hard email users are still able to participate, which came as a bit of a pleasant shock to them. I think they were fully expecting that once we moved to a modern platform, they’d be forced to use a web interface.

Best of all, we got a lot of help from the Discourse community and the Discourse devs. They’re still an invaluable resource when we experience problems or have feature requests.


(Andrew Waugh) #2

Just to add to @Gunnar’s review: I had been a member of Jag-lovers for a long time, but not a moderator. When we went live with Discourse I asked if I could help somehow, and ended up being made a moderator about 15 minutes later… Discourse seems pretty intuitive to me, so much so that within about 30 minutes I was actively helping users get to grips with the new platform. The speed at which my noob questions were answered on meta made things quite easy to understand.