This guide is for you if you want to migrate a mailing list to Discourse.
It also contains instructions for importing messages from Google Groups.
This is the recommended way to import content from your mailing lists into Discourse.
The import script will most likely not work on systems with less than 4 GB of RAM; 8 GB or more is recommended. You can scale the RAM back down after the import if you like.
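Before you start, you can check how much RAM the server has. The following is a minimal sketch, not part of Discourse. It assumes a Linux host with /proc/meminfo. The function name total_ram_gb is hypothetical. MemTotal is slightly lower than the installed RAM, so the value is rounded to whole GiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints total RAM in GiB, rounded, read from /proc/meminfo (Linux only).
total_ram_gb() {
  # MemTotal is reported in kB; convert to GiB and round to an integer.
  awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1024 / 1024 }' /proc/meminfo
}

ram=$(total_ram_gb)
# Compare against the 4 GB minimum and 8 GB recommendation from this guide.
if [ "$ram" -lt 4 ]; then
  echo "About ${ram} GB RAM: the import will most likely fail."
elif [ "$ram" -lt 8 ]; then
  echo "About ${ram} GB RAM: meets the minimum, 8 GB or more is recommended."
else
  echo "About ${ram} GB RAM: OK."
fi
```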
Install Discourse by following the official installation guide. Afterwards it’s a good idea to go to the Admin section and configure a few settings:
login_required: enable it if imported topics shouldn't be visible to the public.
hide_user_profiles_from_public: enable it if user profiles shouldn't be visible to the public.
download_remote_images_to_local: disable it if you don't want Discourse to download images embedded in posts.
disable_edit_notifications: enable it if you enabled download_remote_images_to_local and don't want your users to get lots of notifications about posts edited by the system user.
slug_generation_method: change its value if most of the topic titles use characters which shouldn't be mapped to ASCII (e.g. Arabic). See this post for more information.
The following steps assume that you installed Discourse on Ubuntu and that you are connected to the machine via SSH or have direct access to the machine’s terminal.
Copy the container configuration file app.yml to import.yml and edit it with your favorite editor.
    cd /var/discourse
    cp containers/app.yml containers/import.yml
    nano containers/import.yml
Add - "templates/import/mbox.template.yml" to the list of templates. Afterwards it should look something like this:
    templates:
      - "templates/postgres.template.yml"
      - "templates/redis.template.yml"
      - "templates/web.template.yml"
      - "templates/web.ratelimited.template.yml"
    ## Uncomment these two lines if you wish to add Lets Encrypt (https)
      #- "templates/web.ssl.template.yml"
      #- "templates/web.letsencrypt.ssl.template.yml"
      - "templates/import/mbox.template.yml"
That’s it. You can save the file, close the editor and build the container.
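If you prefer not to edit import.yml by hand, here is a minimal sketch of a hypothetical helper, add_template, which is not part of Discourse. It assumes GNU sed (as on Ubuntu). It also assumes the file still contains the stock line - "templates/web.ratelimited.template.yml" and inserts the new entry right after it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add_template FILE TEMPLATE
# Appends TEMPLATE to the templates list after the web.ratelimited entry.
add_template() {
  local file="$1" template="$2"
  # Nothing to do if the template is already listed (and not commented out).
  if grep -qE "^[[:space:]]*- \"$template\"" "$file"; then
    return 0
  fi
  # GNU sed: append the new entry after the web.ratelimited line,
  # reusing that line's indentation (captured as \1).
  sed -i "s|^\( *\)- \"templates/web.ratelimited.template.yml\"|&\n\1- \"$template\"|" "$file"
  # Exit status reports whether the entry is now present (fails if the anchor line was missing).
  grep -qE "^[[:space:]]*- \"$template\"" "$file"
}

# Usage on the server:
# add_template /var/discourse/containers/import.yml templates/import/mbox.template.yml
```

Rerunning it is harmless because an already-listed template is skipped.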
Google Groups import
You need to add two entries to the list of templates:
    - "templates/import/chrome-dep.template.yml"
    - "templates/import/mbox.template.yml"
Afterwards it should look something like this:
    templates:
      - "templates/postgres.template.yml"
      - "templates/redis.template.yml"
      - "templates/web.template.yml"
      - "templates/web.ratelimited.template.yml"
    ## Uncomment these two lines if you wish to add Lets Encrypt (https)
      #- "templates/web.ssl.template.yml"
      #- "templates/web.letsencrypt.ssl.template.yml"
      - "templates/import/chrome-dep.template.yml"
      - "templates/import/mbox.template.yml"
That’s it. You can save the file, close the editor and build the container.
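Before rebuilding, you can confirm that both Google Groups templates are really listed and not left commented out. This is a minimal sketch with a hypothetical function name, check_templates. It only checks that each entry is present. It does not check YAML validity or order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_templates FILE TEMPLATE...
# Prints every template that is missing or commented out; returns 1 if any are.
check_templates() {
  local file="$1" t missing=0
  shift
  for t in "$@"; do
    # Match only active list entries such as:  - "templates/..."
    if ! grep -qE "^[[:space:]]*- \"$t\"" "$file"; then
      echo "missing: $t"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the server:
# check_templates /var/discourse/containers/import.yml \
#   templates/import/chrome-dep.template.yml templates/import/mbox.template.yml
```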
    /var/discourse/launcher stop app
    /var/discourse/launcher rebuild import
Building the container creates an import directory within the container's shared directory. It looks like this:
    /var/discourse/shared/standalone/import
    ├── data
    └── settings.yml
You can skip this step unless you want to migrate from Google Groups.
Instructions for Google Groups
Make sure you don’t have any pinned posts in your group, otherwise the crawler might fail to download some or all messages.
Make sure the group settings allow posting, otherwise you might see "Failed to scrape message" error messages. If you changed those settings recently, it might take a couple of minutes before scraping works.
Google account: You need a Google account that has the Manager or Owner role for your Google Group, otherwise the downloaded messages will contain censored email addresses.
Group name: You can find the group name by visiting your Google Group and looking at the browser’s address bar.
Domain name: The URL might look a little different if you are a G Suite customer. You need to know the domain name if the URL contains something like
In order to download messages, the crawler needs access to a Google account that has the owner role for your group. Visit https://myaccount.google.com/ in your browser and sign in if you aren't already logged in. Then use a browser extension of your choice to export your cookies for google.com to a file named cookies.txt.
The recommended browser extension is Export Cookies for Mozilla Firefox.
Upload the cookies.txt file to your server and save it within the
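A quick sanity check of the exported file can save a failed crawl. The sketch below assumes the extension writes the common Netscape cookies.txt format, with seven tab-separated fields per cookie line and the domain in the first field. The function name check_cookies is hypothetical, and the script does not verify that the cookies are still valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_cookies FILE
# Verifies that FILE looks like a Netscape cookies.txt with google.com cookies.
check_cookies() {
  local file="$1"
  [ -f "$file" ] || { echo "no such file: $file"; return 1; }
  awk -F'\t' '
    # Some exporters mark HttpOnly cookies with this prefix; strip it (re-splits fields).
    { sub(/^#HttpOnly_/, "") }
    # Skip comments and blank lines.
    /^#/ || NF == 0 { next }
    # Every cookie line should have exactly seven tab-separated fields.
    NF != 7 { bad++; next }
    $1 ~ /(^|\.)google\.com$/ { google++ }
    END {
      if (bad) { print bad " malformed line(s)"; exit 1 }
      if (!google) { print "no google.com cookies found"; exit 1 }
      print google " google.com cookie(s) found"
    }' "$file"
}
```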
Let’s start by entering the Docker container.
/var/discourse/launcher enter import
Replace the <group_name> (and, if applicable, the <domain_name>) placeholders in the following command with the group name and domain name from step 1.3.1. Then execute it inside the Docker container to start downloading messages.
If you didn’t find a domain name in step 1.3.1, this is the command for you:
script/import_scripts/google_groups.rb -g <group_name>
Or, if you found a domain name in step 1.3.1, use this command instead:
script/import_scripts/google_groups.rb -g <group_name> -d <domain_name>
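The only difference between the two commands is the -d option. A small wrapper can make that choice for you. This is a hypothetical sketch: run_gg_import and the DRY_RUN variable are illustrations, not part of Discourse, and it uses only the -g and -d options shown above. Run it inside the container from the directory where script/import_scripts exists.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run_gg_import GROUP [DOMAIN]
# Adds -d only when a G Suite domain is given; DRY_RUN=1 prints instead of running.
run_gg_import() {
  local group="$1" domain="${2:-}"
  local cmd=(script/import_scripts/google_groups.rb -g "$group")
  if [ -n "$domain" ]; then
    cmd+=(-d "$domain")
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```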
Downloading all messages can take a long time. It mostly depends on the number of topics in your Google Group. The script will show you a message like this when it’s finished: Done (00h 26min 52sec)
Tip: You can abort the download at any time by pressing Ctrl+C. When you restart the download, it will continue where it left off.
You can configure the importer by editing the example settings.yml file that has been copied into the
The settings file comes with sensible defaults, but here are a few tips anyway:
The settings file contains multiple examples of how to split data files:
mbox files are usually separated by a "From " line at the start of each message. Choose a regular expression that works for your files.
If each of your files contains only one message, set split_regex to an empty string. This also applies to imports from Google Groups.
There’s also an example for files from the popular Listserv mailing list software.
prefer_html lets you configure whether the import should use the HTML part of emails when it exists. Choose what suits you best; it heavily depends on the emails sent to your mailing list.
By default, each user imported from the mailing list is created as a staged user. You can disable that behaviour by setting
If your emails do not contain a Message-ID header (like messages stored by Listserv), you should enable the
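To pick a split_regex and to find out whether your messages lack Message-ID headers, it can help to inspect your files first. This is a minimal sketch with hypothetical function names. The default pattern ^From  is an assumption, the usual mbox separator, and not necessarily the exact split_regex in the bundled settings.yml. The Message-ID check only reads the header block, up to the first empty line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_messages FILE [REGEX]
# Counts lines matching REGEX (default: mbox "From " separator lines).
count_messages() {
  local file="$1" regex="${2:-^From }"
  grep -cE -- "$regex" "$file"
}

# Hypothetical helper: missing_message_id FILE...
# Prints each single-message file whose headers contain no Message-ID.
missing_message_id() {
  local f
  for f in "$@"; do
    # Header names are case-insensitive; stop at the first empty line (end of headers).
    if ! awk 'NF == 0 { exit }
              tolower($0) ~ /^message-id:/ { found = 1; exit }
              END { exit !found }' "$f"; then
      echo "$f"
    fi
  done
}
```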
Each subdirectory of /var/discourse/shared/standalone/import/data is imported as its own category, and each one should contain the data files you want to import. The names of those files do not matter.
The import directory should look like this if you want to import two mailing lists with multiple mbox files:
    /var/discourse/shared/standalone/import
    ├── data
    │   ├── list 1
    │   │   ├── foo
    │   │   └── bar
    │   └── list 2
    │       ├── 2017-12.mbox
    │       └── 2018-01.mbox
    └── settings.yml
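The layout shown above can be created with a couple of commands. This is a minimal sketch. The helper add_list and the source paths under ~/archives are hypothetical. IMPORT_DIR defaults to the import directory used in this guide.

```shell
#!/usr/bin/env bash
# Root of the import directory; override IMPORT_DIR to try this elsewhere.
IMPORT_DIR=${IMPORT_DIR:-/var/discourse/shared/standalone/import}

# Hypothetical helper: add_list CATEGORY FILE...
# Creates data/CATEGORY (one category per subdirectory) and copies the files into it.
add_list() {
  local category="$1"; shift
  mkdir -p "$IMPORT_DIR/data/$category"
  cp -- "$@" "$IMPORT_DIR/data/$category/"
}

# Example (hypothetical source paths):
# add_list "list 1" ~/archives/list1/foo ~/archives/list1/bar
# add_list "list 2" ~/archives/list2/*.mbox
```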
Let’s start the import by entering the Docker container and launching the import script inside the Docker container.
    /var/discourse/launcher enter import
    import_mbox.sh # inside the Docker container
Depending on the size of your mailing lists, this can take a while.
The import script will show you a message like this when it’s finished: Done (00h 26min 52sec)
Tip: You can abort the import at any time by pressing Ctrl+C. When you restart the import, it will continue where it left off.
You can exit and stop the Docker container after the import has finished.
    exit # inside the Docker container
    /var/discourse/launcher stop import
Let’s start the app container and take a look at the imported data.
/var/discourse/launcher start app
Discourse will start and Sidekiq will begin post-processing all the imported posts. This can take a considerable amount of time. You can watch the progress by logging in as admin and visiting
So, you are satisfied with the result of the import and want to free some disk space? The following commands will delete the Docker container used for importing as well as all the files used during the import.
    /var/discourse/launcher destroy import
    rm /var/discourse/containers/import.yml
    rm -R /var/discourse/shared/standalone/import
Now it’s time to celebrate and enjoy your new Discourse instance!
You can use an empty tag to remove one or more prefixes from topic titles. The settings file contains an example.
The following steps will reset your Discourse forum to the initial state! You will need to start from scratch.
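The following commands will stop both containers and delete everything in the shared directory except the import directory, which holds the mbox files and the importer configuration. They then delete the index database, rebuild the import container and start the import again.
    cd /var/discourse
    ./launcher stop app
    ./launcher stop import
    shopt -s extglob # bash needs this for the !(import) pattern below
    rm -r ./shared/standalone/!(import)
    rm ./shared/standalone/import/data/index.db
    ./launcher rebuild import
    ./launcher enter import
    import_mbox.sh # inside the Docker container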
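The !(import) pattern in the reset commands is a bash extended glob. It only works when extglob is enabled, and it must be enabled before the line using it is parsed. The sketch below lets you see what such a pattern matches in a scratch directory before running rm on the real one. The function name list_except_import is hypothetical.

```shell
#!/usr/bin/env bash
# Enable extended globbing before any line using !(...) is parsed.
shopt -s extglob

# Hypothetical helper: prints every entry of directory $1 except one named "import".
# Note: without nullglob, an unmatched pattern is printed literally.
list_except_import() {
  ( cd "$1" && printf '%s\n' !(import) )
}

# Example: list_except_import /var/discourse/shared/standalone
```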
Enable the index_only option in settings.yml and take a look at index.db (a SQLite database) before you run the actual import.
You can use SQL to update missing values in the database if you want. That way you don't need to reindex any messages. The script only uses data from index.db during the import phase. Simply disable the index_only option when you are done and rerun the importer. It will skip the indexing if none of the mbox files were changed, recalculate the content of the email_order tables and start the actual import process.
You can split mbox files into individual files to make it easier to find offending emails.
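    apt install procmail
    # The output directory must exist before formail writes to it.
    mkdir -p split
    # formail sets FILENO to the current message number; presetting it also sets the zero-padded width.
    export FILENO=0000
    formail -ds sh -c 'cat > split/msg.$FILENO' < mbox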
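If you can't install procmail, a rough awk alternative can do the splitting. This is a minimal sketch and not equivalent to formail. It only starts a new file at each line beginning with "From " and does no further mbox parsing. The function name split_mbox and its numbering, which starts at msg.0001, are assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_mbox MBOX [OUTDIR]
# Writes each message to OUTDIR/msg.0001, msg.0002, ... (OUTDIR defaults to "split").
split_mbox() {
  local mbox="$1" outdir="${2:-split}"
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    /^From / {
      if (out != "") close(out)   # close the previous file to avoid running out of descriptors
      n++
      out = sprintf("%s/msg.%04d", dir, n)
    }
    out != "" { print > out }     # lines before the first "From " line are dropped
  ' "$mbox"
}
```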
Create a new directory in the import/data directory and restart the import script.
You could give this script a try.