This Howto is out-of-date.
Please check out Migrate a mailing list to Discourse (mbox, Listserv, Google Groups, etc) for updated instructions.
Want to move your mailing list archive to Discourse?
Let’s do it!
Prerequisites
- Not everything that writes MBOX files plays by the same rules. You will likely need to do some programming to get your import to work just right. This script should import most of your data, but unless you are very lucky, getting all of it will take some code tweaks.
- If you are using a hosting service, make sure you know what version of Discourse they are using. If the Discourse version you use for the import is newer than your target platform, you will not be able to restore the database you create in this guide.
- Set up Discourse development environment on OS X, Ubuntu or Windows.
- Clear existing data from your local Discourse instance:
cd ~/discourse
rake db:drop db:create db:migrate
- Optional: add an admin account:
RAILS_ENV=development bundle exec rake admin:create
Preparation of your files
Are your mailbox files perfect, with no broken email addresses, no text you want to remove, and you want all those messages in the same category? Skip to the next section.
Note: Files are imported in alpha order, so if you want your topic IDs to match the date order, name your files accordingly. If you don’t care about that, neither does Discourse.
It is easiest to leave lots of messages in each file (typically one file per month), but you may choose to use `formail` to split those files into one message per file so that you could do clever things with `grep` and `find` to move messages into different folders to categorize them.
You can split your `mbox` file into one-message-per-file like this:
export FILENO=0000
mkdir split
formail -ds sh -c 'cat > split/msg.$FILENO' < mbox
If you wanted to move all of the messages with “job” in the subject into a separate directory, you might do this:
mkdir jobs
find split -name "msg.*" -exec grep -q "^Subject:.*job" {} \; -exec mv {} jobs \;
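If you would rather do this sorting in Ruby, the sketch below shows one way. It is not part of the import script: `move_by_subject` is a hypothetical helper, and it assumes one message per file named `msg.*`, as produced by the `formail` step. Unlike a plain grep over the whole file, it looks only at the header block, so a line starting with "Subject:" inside a message body is ignored. Folded (multi-line) Subject headers are not handled.

```ruby
require 'fileutils'

# Hypothetical helper (not part of mbox.rb): move every one-message file in
# src_dir whose Subject header matches `pattern` into dest_dir.
# Returns the base names of the files that were moved.
def move_by_subject(src_dir, dest_dir, pattern)
  FileUtils.mkdir_p(dest_dir)
  moved = []
  Dir.glob(File.join(src_dir, 'msg.*')).sort.each do |path|
    # The header block is everything before the first blank line.
    headers = File.binread(path).split(/\r?\n\r?\n/, 2).first.to_s
    subject = headers[/^Subject:.*$/i]
    next unless subject && subject.match?(pattern)

    FileUtils.mv(path, dest_dir)
    moved << File.basename(path)
  end
  moved
end
```

For example, `move_by_subject('split', 'jobs', /job/i)` would move the matching files from `split` into `jobs`.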
Are there annoying footers or advertisements in your email messages? If you want to remove them en masse, you should do so now with `perl`, `ruby`, `ex`, `awk`, or `sed`. Text can also be removed (or modified) by the import script itself: if you know Ruby, look at the `gsub` calls in `def clean_raw` in the script for an example that you can tweak.
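As a rough illustration of the kind of `gsub`-based cleanup described above, here is a minimal sketch. It is not the script's actual `clean_raw`: the method name `clean_message` and both patterns (`AD_LINE`, `LIST_FOOTER`) are made up for this example, and you would need to replace them with whatever your list really appends.

```ruby
# Hypothetical patterns, for illustration only. Adjust to your own list.
# A one-line advertisement, removed wherever it occurs:
AD_LINE = /^Sent from my .*$\n?/
# A footer that starts with a line of 20+ underscores and runs to the end:
LIST_FOOTER = /^_{20,}\s*\n.*\z/m

# Hypothetical helper: return the message text without ads and list footer.
def clean_message(raw)
  cleaned = raw.gsub(AD_LINE, '')       # drop every ad line
  cleaned = cleaned.sub(LIST_FOOTER, '') # drop the trailing footer block
  cleaned.rstrip + "\n"
end
```

Running a handful of real messages through a helper like this before the full import makes it easier to spot patterns that are too greedy.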
Discourse does not try to remove email signatures. If you’re foolhardy, determined, or a perfectionist, you might split them all into single files with `formail` and see how things go if you remove everything after the first `/^-- $/` in a message. This is left as an exercise for the reader.
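For readers who do want to try it, a minimal sketch of the signature cut might look like the following. `strip_signature` is a hypothetical helper, not something the import script provides. It assumes signatures follow the conventional delimiter line (two dashes and a space), and it will also cut legitimate text if a message happens to contain such a line for another reason.

```ruby
# Conventional signature delimiter: a line that is exactly "-- ".
# The optional \r tolerates CRLF line endings.
SIGNATURE_DELIMITER = /^-- \r?$/

# Hypothetical helper: return the body up to (not including) the first
# signature delimiter. Bodies without a delimiter come back unchanged.
def strip_signature(body)
  head, _delimiter, _signature = body.partition(SIGNATURE_DELIMITER)
  head
end
```

Note that a line with only two dashes and no trailing space is deliberately left alone, as is a quoted delimiter such as `> -- ` in a reply.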
Configuring the import script
- Paste the following into your shell/terminal:
export MBOX_SUBDIR="messages" # subdirectory with mbox files
export LIST_NAME=LIST_NAME
export DEFAULT_TRUST_LEVEL=1
export DATA_DIR=~/data/import
export SPLIT_AT="^From " # or "^From (.*)"
You can then use the up-arrow to return to those lines and use the arrow keys to edit them as described below.
- Replace the `DATA_DIR` value with the location of your mbox directories. An SQLite database file will be created in this location.
- Replace `MBOX_SUBDIR` if it is not called “messages”. Gzipped files are OK. If you create subdirectories below `messages`, they can be imported into separate categories (see below).
- If you have Subjects like `Subject: [List Name] blah`, and you would like `[List Name]` removed, set `LIST_NAME` accordingly.
- Users created by the script will be created with `DEFAULT_TRUST_LEVEL`. Set this value to whatever level you deem appropriate.
- Check the format of your MBOX files to see what they look like. The MBOX “standard” (inasmuch as there is one) is that every line that starts with "From " (that’s “From” followed by a space) marks the start of a new message; lines in the message body (rather than the header) that begin with “From ” are supposed to have a “>” inserted before them. Your mileage may vary. Set `SPLIT_AT` accordingly.
- If you have organized your messages into folders/directories by categories, you will need to edit `mbox.rb` to map those directory names to category names in a section like this:
CATEGORY_MAPPINGS = {
"default" => "uncategorized",
# ex: "jobs-folder" => "jobs"
}
- If the email address in the `From:` line also appears in the body of a message, it will be replaced with the @username. If you would like to replace every user’s email address in every message, you can uncomment `# replace_email_addresses` in the `execute` function. It can take a long time, though; for every user, it does a database query for their email address and then does a replace against all of those hits.
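To check your `SPLIT_AT` and `LIST_NAME` choices before running a long import, you can try them on a small sample. The sketch below only illustrates the idea and is not how `mbox.rb` itself is implemented: `split_mbox` and `clean_subject` are hypothetical helpers, and the constant values are examples.

```ruby
# Example values; use the ones you exported for the import.
SPLIT_AT  = /^From /
LIST_NAME = 'List Name'

# Hypothetical helper: split mbox text into messages, starting a new
# message at every line that matches split_at. Any text before the first
# matching line ends up as its own leading chunk.
def split_mbox(text, split_at = SPLIT_AT)
  text.each_line.slice_before { |line| line.match?(split_at) }.map(&:join)
end

# Hypothetical helper: remove the "[List Name]" tag from a subject value.
def clean_subject(subject, list_name = LIST_NAME)
  subject.gsub(/\[#{Regexp.escape(list_name)}\]\s*/i, '').strip
end
```

If `split_mbox(File.read('sample.mbox')).size` does not match the number of messages you expect in a (hypothetical) `sample.mbox`, body lines beginning with "From " are probably not escaped in your files, and a stricter pattern such as the `"^From (.*)"` variant may be worth adapting.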
Perform Import
- Start import process:
cd ~/discourse
bundle exec ruby script/import_scripts/mbox.rb
Or, if you would like error and warning messages to be saved for future consumption:
cd ~/discourse
bundle exec ruby script/import_scripts/mbox.rb 2>> logfile
Note that if you redirect errors to the logfile, you will not see them in your terminal.
- Wait until the import is done. You can restart the process if it slows down to a crawl. Before data is added to Discourse, it is first read from the MBOX files and stored in an SQLite database. When you have trouble with the import, you can look there for clues. If a message cannot be imported (e.g., because of an invalid email address), you will get a notification.
- You can speed up restarts of the script by deleting (or moving) files that have already been processed.
- Start your Discourse instance:
bundle exec rails server
- Start Sidekiq and let it do its work:
bundle exec sidekiq -q critical,4 -q default,2 -q low
Depending on your forum size this can take a long time. You can monitor the progress at http://localhost:3000/sidekiq
- Back up the data on your development machine and upload it to your production site by following this howto.
Congratulations! You have successfully migrated your mailing list to Discourse!