Migrate a mailing list to Discourse (mbox, Listserv, Google Groups, etc)

saulshanabrook · December 29, 2022, 3:03am

Thank you for providing this guide and import script! I have used it successfully with a Google Group, using Google Takeout. I just put the .mbox file in the right directory and ran the script.

I did have a question about importing emails which have parents which are not in the .mbox. For example, there are many threads in our group which are started from a FWD of an email that wasn’t sent to the group, or by adding the group to the reply list in the middle of a conversation to loop them in.

Currently, after importing, these previous emails appear to be missing from the post. You can still find them if you click the email icon and view the HTML. I was curious whether anyone else has run into this and found a solution. I could imagine either including the previous email chain in the post, or trying to parse it, extract the individual messages, and add all of those.

pfaffman · December 29, 2022, 11:19am

You would need to find a way to generate those messages from the quoted text and add them to the mbox file (probably with Message-ID headers) before running the import script.
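A rough sketch of the "add them to the mbox file" half of that suggestion. It assumes you have already extracted the sender, date, subject and body from the quoted text by some other means (that parsing is the hard part and is not shown). The helper name `mbox_entry`, the example addresses and the synthetic Message-ID are all made up for illustration, and whether the importer threads on `In-Reply-To` the way you expect is something to verify on a test import.

```ruby
require "time"

# Build one mbox entry (separator line, headers, blank line, body).
# Hypothetical helper, not part of the Discourse import script.
def mbox_entry(from:, date:, subject:, message_id:, body:, in_reply_to: nil)
  headers = [
    "From: #{from}",
    "Date: #{date.rfc2822}",
    "Subject: #{subject}",
    "Message-ID: <#{message_id}>",
  ]
  headers << "In-Reply-To: <#{in_reply_to}>" if in_reply_to

  # Escape body lines that would otherwise look like a new message separator.
  escaped = body.gsub(/^(>*From )/, '>\1')

  # Use the bare address in the separator line, falling back to MAILER-DAEMON.
  address = from[/[^<\s]+@[^>\s]+/] || "MAILER-DAEMON"
  stamp = date.getutc.strftime("%a %b %d %H:%M:%S %Y")

  "From #{address} #{stamp}\n#{headers.join("\n")}\n\n#{escaped.chomp}\n\n"
end

entry = mbox_entry(
  from: "Alice <alice@example.com>",
  date: Time.utc(2022, 11, 1, 5, 57, 9),
  subject: "Original question",
  message_id: "synth-1@example.com",
  body: "From my side this looks fine.\n"
)
puts entry
```

The reply that actually reached the list would then need an `In-Reply-To` header pointing at the synthetic ID, and the generated entry would be appended to the .mbox file (for example with `File.write(path, entry, mode: "a")`) before the import is run.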

Andro · March 1, 2023, 5:27am

This is really excellent. But I have some issues with some emails coming into Discourse with an initial email and then the mbox format replies in the same post, not formatted. I’m not sure what is causing this.

The question is, how can I delete all the imported mails (20 years worth) without deleting and recreating the target discourse instance?

Andro · March 1, 2023, 8:27am

I’m aware the recommended RAM requirement is 8GB. I did try importing 20 years of posts on a 2GB virtual machine and it ran for a while and crashed with the message ‘killed’. 8GB machines on hosting providers such as DigitalOcean are expensive (for me). Is there any way to do this with less memory? Import in smaller batches perhaps?

pfaffman · March 2, 2023, 1:38am

Maybe delete those categories and then delete the associated topic custom fields.

No, I don’t think you can do much of an import on a small machine. You could try on a desktop but then you’ve got bandwidth issues to get the database back to the internet.

Andro · March 10, 2023, 12:22pm

I know there is not much activity on this thread, but I can’t succeed in getting it working properly. Many of the mbox format emails I import are not split properly. The From lines look like this:

From MAILER-DAEMON Tue Nov 01 05:57:09 2022

But some messages import correctly and then, in the same body, contain raw mbox-format messages starting with the typical From line. In other words, they are not being split. I don’t see why I would need to modify the regex that does the splitting, and I don’t know Ruby, so I can’t debug the import script.

I don’t know where to go from here. There’s 20 years of messages to import, so I can’t go through the imported messages by hand to fix them up. In short this script is not working for me. Why would I be the only one this happens to?

pfaffman · March 10, 2023, 12:44pm

You’re not. My first paid Discourse job was months of cleaning up old mbox files that had been hand-edited for some reason that I can’t recall.

It sounds like you do need to muck with the regex or find some other way to fix the errant messages. One way is to use some other tool to split the messages into one per file.
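As a starting point for the "split the messages into one per file" route, here is a minimal sketch in Ruby. It assumes separator lines look like the `From MAILER-DAEMON Tue Nov 01 05:57:09 2022` example quoted earlier in the thread; the regex, the helper names and the `.eml` file naming are my own choices, not anything taken from the import script, so adjust them to what your archive actually contains.

```ruby
require "fileutils"
require "tmpdir"

# Assumed separator shape: "From <sender> <weekday> <month> <day> <time> <year>".
SEPARATOR = /\AFrom \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}\z/

# Split raw mbox text into an array of message strings.
def split_mbox(text)
  messages = []
  current = nil
  text.each_line do |line|
    if line.chomp.match?(SEPARATOR)
      messages << current if current
      current = +""
    end
    current << line if current # ignore anything before the first separator
  end
  messages << current if current
  messages
end

# Write each message to its own numbered file and return the paths.
def write_messages(messages, dir)
  FileUtils.mkdir_p(dir)
  messages.each_with_index.map do |message, index|
    path = File.join(dir, format("%06d.eml", index + 1))
    File.write(path, message)
    path
  end
end

sample = <<~MBOX
  From MAILER-DAEMON Tue Nov 01 05:57:09 2022
  Subject: first

  Hello
  From MAILER-DAEMON Wed Nov 02 06:00:00 2022
  Subject: second

  From here on, this line is body text, not a separator.
MBOX

messages = split_mbox(sample)
puts messages.length # => 2
paths = write_messages(messages, Dir.mktmpdir("mbox-split"))
puts paths.length # => 2
```

Running something like this over the real archive and counting the results is also a quick way to find out whether the separators in the file match what you think they look like. Whether the importer will read the resulting files as-is depends on your import settings, so check that on a small sample first.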

FWIW, I wrote several import scripts before I knew Ruby.

Every import is unique. With 20 years of data, it’s a good bet that you’ll have several different issues as things changed in the various systems that were used.

Andro · March 11, 2023, 3:54am

You bet. That’s for sure.

MikeNolan · March 15, 2023, 1:11am

I want to import 20 years of messages from my mailman2 system into an archive directory, but I don’t want to create user IDs (not even staged ones) for them, as many of our subscribers have moved on or passed on and it would create many accounts that will just take up space.

Can I import them all under the same user ID (perhaps ‘archive’)?

And this may be a dumb question, but since the app is turned off during the import process, does that mean users who have signed up for emails about new posts won’t get flooded with emails about all the archives that were just loaded?

pfaffman · March 15, 2023, 12:16pm

You can comment out the import_users function and all messages will be owned by system.

You’re not going to save much space.

No users will receive email until they have used the forgot password process to log in to their account. If you’re importing these data into an existing community then I believe that users will get notifications about the new messages that are created by the import script.

MikeNolan · March 15, 2023, 5:11pm

Thanks, I was looking through the import script and figured that I might be able to just disable the new user section. Testing that is on my list.

It isn’t file space I’m thinking about, it’s having possibly hundreds of staged user accounts that will never be used, so it’s more like head space or a very long user list.

pfaffman · March 15, 2023, 6:52pm

You know your users but having accounts that no one will use seems much better than not knowing who posted 20 years worth of messages.

MikeNolan · March 18, 2023, 7:00pm

That’s a valid point, Jay.

I’m not finding the import_mbox.sh file and when I try executing the mbox.rb script directly, I get a bunch of Ruby errors:

root@lists-import:/var/www/discourse/script/import_scripts# ruby mbox.rb mbox
fatal: detected dubious ownership in repository at '/var/www/discourse'
To add an exception for this directory, call:

    git config --global --add safe.directory /var/www/discourse

/var/www/discourse/vendor/bundle/ruby/3.2.0/gems/zeitwerk-2.6.7/lib/zeitwerk/loader/callbacks.rb:25:in `on_file_autoloaded': expected file /var/www/discourse/lib/freedom_patches/pluck_first.rb to define constant FreedomPatches::PluckFirst, but didn't (Zeitwerk::NameError)

  raise Zeitwerk::NameError.new(msg, cref.last)

Adam_Monago · March 22, 2023, 3:30pm

Greetings folks. What a great guide. Thank you to Gerhard and others for contributing.

Has anyone here adapted this for Lyris? I’m interested in migrating a historic install and would like to understand if there were any special concerns they hit in a similar project.

tpokorra · August 14, 2023, 4:53pm

I needed to import posts from a mailing list to Discourse, and ran into two problems.

1. sqlite3 was not found.
2. I could not find import_mbox.sh.

Here are my solutions:

install sqlite3

I added to Gemfile:

 gem "sqlite3", "~> 1.3", ">= 1.3.13"

then run:

cd discourse
bundle config set frozen false
bundler install

run the import

cd discourse
RAILS_ENV=production bundle exec rails runner script/import_scripts/mbox.rb script/import_scripts/mbox/settings.yml

gerhard · August 14, 2023, 6:27pm

You probably missed the following step which is hidden behind “Regular import” in 1.2. Preparing the Docker container.

Michael_Sandler · July 1, 2024, 7:52pm

I’m getting this can't modify frozen String error. Can anyone suggest a fix or work out what I’m doing wrong?

root@sajcf:~# /var/discourse/launcher stop app
x86_64 arch detected.
+ /usr/bin/docker stop -t 600 app
app
root@sajcf:~# /var/discourse/launcher enter import
x86_64 arch detected.
root@sajcf-import:/var/www/discourse# import_mbox.sh
The mbox import is starting...

Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...

creating index
indexing files in /shared/import/data/jjcf
indexing /shared/import/data/jjcf/SAJCF.mbox

indexing replies and users

creating categories
/var/www/discourse/script/import_scripts/base.rb:447:in `strip!': can't modify frozen String: "jjcf" (FrozenError)
        from /var/www/discourse/script/import_scripts/base.rb:447:in `block in create_categories'
        from /var/www/discourse/script/import_scripts/base.rb:438:in `each'
        from /var/www/discourse/script/import_scripts/base.rb:438:in `create_categories'
        from /var/www/discourse/script/import_scripts/mbox/importer.rb:50:in `import_categories'
        from /var/www/discourse/script/import_scripts/mbox/importer.rb:34:in `execute'
        from /var/www/discourse/script/import_scripts/base.rb:47:in `perform'
        from script/import_scripts/mbox.rb:13:in `<module:Mbox>'
        from script/import_scripts/mbox.rb:11:in `<module:ImportScripts>'
        from script/import_scripts/mbox.rb:10:in `<main>'

pfaffman · July 2, 2024, 1:08pm

You can Google how to solve that. I think a .dup might be an easy way.

User154574 · August 12, 2024, 1:57pm

To be more specific: I got the import working by adding .dup on line 447 of /var/www/discourse/script/import_scripts/base.rb:

params[:name].dup.strip!

One thing is not clear: how can I import into one of the sites in a multisite setup?
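A note on the `.dup` change above, as a standalone sketch rather than a patch to base.rb: `dup.strip!` does make the FrozenError go away, but it strips a throwaway copy, so the original value keeps any surrounding whitespace. The variable names here are illustrative only, and I have not checked what the code around line 447 does with the value afterwards.

```ruby
name = "  jjcf  ".freeze

# strip! mutates the receiver, so it raises on a frozen string.
begin
  name.strip!
rescue FrozenError => e
  puts "raised: #{e.class}" # prints "raised: FrozenError"
end

# dup.strip! no longer raises, but the stripped copy is discarded.
name.dup.strip!
puts name.inspect # prints "  jjcf  " with the spaces still there

# The non-destructive strip returns a new, stripped string.
clean = name.strip
puts clean.inspect # prints "jjcf"
```

If the stripped value matters for the category name, assigning the result back (something along the lines of `params[:name] = params[:name].strip`) would probably be closer to the original intent, assuming `params` itself is not frozen.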

Adam_Skalicky · December 17, 2024, 6:04am

Has anyone gotten a “can’t modify frozen String” error? My index.db is created fine but it fails on creating categories.

root@xxxxxxxxxx:/var/www/discourse# import_mbox.sh
The mbox import is starting...
Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...
creating index
indexing files in /shared/import/data/xxxxx-xxxxxxx@xxxxxxx.com
indexing /shared/import/data/xxxxx-xxxxxxx@xxxxxxx.com/export.mbox
indexing replies and users
creating categories
/var/www/discourse/script/import_scripts/base.rb:447:in `strip!': can't modify frozen String: "xxxxx-xxxxxxx@xxxxxxx.com" (FrozenError)
        from /var/www/discourse/script/import_scripts/base.rb:447:in `block in create_categories'
        from /var/www/discourse/script/import_scripts/base.rb:438:in `each'
        from /var/www/discourse/script/import_scripts/base.rb:438:in `create_categories'
        from /var/www/discourse/script/import_scripts/mbox/importer.rb:50:in `import_categories'
        from /var/www/discourse/script/import_scripts/mbox/importer.rb:34:in `execute'
        from /var/www/discourse/script/import_scripts/base.rb:47:in `perform'
        from script/import_scripts/mbox.rb:13:in `<module:Mbox>'
        from script/import_scripts/mbox.rb:11:in `<module:ImportScripts>'
        from script/import_scripts/mbox.rb:10:in `<main>'
