Migrate a mailing list to Discourse (mbox, Listserv, Google Groups, etc)

The solution is literally in the post above yours. :wink:

We should fix the script. Maybe you can create a PR that resolves the issue for everyone?

Thanks, that fixed it, sorry about that. In penance for my inability to read, I opened a pull request, “Added duplication to name to prevent modification of frozen string exception” (Pull Request #30325 in discourse/discourse on GitHub), to save anyone else the shame of asking a dumb question.

Does this import script make Discourse replicate Mailman 2 email threading in any way (e.g. using the little Discourse arrow to signify “In-Reply-To”), or is it purely chronological (for each thread based on Message-ID, In-Reply-To and References)?

Yes, it does

Cool. Quite a few of my mailing list emails haven’t got the In-Reply-To and References headers that they should have, so might be imported as new topics rather than just replies. From memory, the script uses those headers or subject headers (not both).

I think I might have asked this in the distant past, but are there any non-manual ways of adding these headers to the MBOX file and/or otherwise rearranging the emails before or after importing to Discourse?

It’s possible now to merge topics and keep chronological order so maybe that’s the answer. They’d just be missing the little Discourse arrow to signify who the message was in reply to.

The mbox import script has two phases. The first one is indexing, which outputs a SQLite database. You could either modify the data in the SQLite database before the import, or modify the Ruby script.
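
Before changing anything in that database, it helps to see what the indexing phase actually wrote. This is only a minimal sketch in Python (not part of the import script): it lists every table and its columns without assuming any table names. The function name `describe_database` is made up for illustration, and the location of the database file depends on your import settings.

```python
import sqlite3


def describe_database(path):
    """Return {table_name: [column_name, ...]} for every table in a SQLite file."""
    schema = {}
    conn = sqlite3.connect(path)
    try:
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (table,) in tables:
            # PRAGMA table_info returns one row per column; index 1 is the column name.
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [column[1] for column in columns]
    finally:
        conn.close()
    return schema
```

Call it with the path to the database produced by the indexing phase and print the result. Work on a copy of the file so the original index stays untouched.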

All the magic of sorting/grouping by subject or headers happens here:

You could add your own logic of grouping if you know how you want to group emails.

It’ll be a while before I even consider something so complex!

At https://bazaar.launchpad.net/~mailman-coders/mailman/2.1/view/head:/Mailman/Archiver/pipermail.py#L669, Mailman 2’s Pipermail seems to look for the following, in order of preference:

  1. In-Reply-To.
  2. References.
  3. Oldest email with matching subject.

That combination of approaches seems ideal. In the third case, it might make sense for Discourse not to use the “in reply to” arrow.
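
To make that order of preference concrete, here is a rough sketch in Python. It is neither Pipermail’s code nor the Discourse importer’s logic, just an illustration of the three-step fallback. The message dicts, the function names and the “Re:”/“Fwd:” stripping rule are all assumptions made for the example. It also records how each parent was found, so a subject-only match could skip the “in reply to” arrow.

```python
import re

# Assumed normalisation: drop leading "Re:" / "Fw:" / "Fwd:" prefixes, ignore case.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    return _PREFIX.sub("", subject or "").strip().lower()


def find_parents(messages):
    """Map each message id to (parent_id, how).

    `how` is 'in-reply-to', 'references', 'subject', or None for a thread starter.
    `messages` must be sorted oldest first; each is a dict with the keys
    'id', 'subject', 'in_reply_to' (str or None) and 'references' (list of str).
    Only messages seen earlier in the list can become parents.
    """
    known = set()
    oldest_by_subject = {}
    parents = {}
    for msg in messages:
        parent, how = None, None
        if msg.get("in_reply_to") in known:
            # 1. In-Reply-To points at a message we already have.
            parent, how = msg["in_reply_to"], "in-reply-to"
        else:
            # 2. References, checked from the most recent ancestor backwards.
            for ref in reversed(msg.get("references") or []):
                if ref in known:
                    parent, how = ref, "references"
                    break
        subject = normalize_subject(msg.get("subject"))
        if parent is None and subject and subject in oldest_by_subject:
            # 3. Fall back to the oldest message with the same subject.
            parent, how = oldest_by_subject[subject], "subject"
        if subject:
            oldest_by_subject.setdefault(subject, msg["id"])
        known.add(msg["id"])
        parents[msg["id"]] = (parent, how)
    return parents
```

Messages with an empty subject are never matched by subject, since otherwise every subject-less email would end up in one thread.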

From memory, Mailman 3’s Hyperkitty didn’t consider subject at all, which was not as good.

Pardon me chiming in with a possibly stupid question, but I could not find a clear answer here. I would like to know if the import process creates a new Discourse user for each email, with de-duplication of course, or if they all go in as one system user. I have a mailing list with 20 years of posts and it’s pretty big and hard to experiment with. And also, what about replies in the original list? Do they get threaded in?

Yes, the users get created, one per email address.
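
For a large archive, it can be useful to estimate beforehand how many users that will be. The sketch below is a standalone Python helper, not part of the import script. It collects the distinct sender addresses in an mbox file. Lower-casing the addresses is an assumption made for the estimate and may not match exactly how the importer de-duplicates; `unique_senders` is a made-up name.

```python
import mailbox
from email.utils import parseaddr


def unique_senders(mbox_path):
    """Return the set of lower-cased sender addresses found in an mbox file."""
    senders = set()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # parseaddr splits 'Jane Doe <jane@example.com>' into (name, address).
            _, address = parseaddr(str(message.get("From", "")))
            if address:
                senders.add(address.lower())
    finally:
        box.close()
    return senders
```

`len(unique_senders("list.mbox"))` then gives a rough upper bound on the number of accounts to expect, where `list.mbox` stands in for your own archive file.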

Hello folks,

I’m trying to move from Google Groups. When trying to download the messages using the command script/import_scripts/google_groups.rb -g <group_name> -d <domain_name>, I get a stack trace right away:

Fetching gem metadata from https://rubygems.org/.......
Resolving dependencies...
/usr/local/lib/ruby/gems/3.4.0/gems/childprocess-4.1.0/lib/childprocess.rb:6: warning: logger was loaded from the standard library, but will no longer be part of the default gems starting from Ruby 3.5.0.
You can add logger to your Gemfile or gemspec to silence this warning.
/usr/local/lib/ruby/gems/3.4.0/gems/selenium-webdriver-4.1.0/lib/selenium/webdriver/common/zipper.rb:23: warning: base64 was loaded from the standard library, but is not part of the default gems starting from Ruby 3.4.0.
You can add base64 to your Gemfile or gemspec to silence this warning.
/usr/local/lib/ruby/gems/3.4.0/gems/bundler-2.6.4/lib/bundler/runtime.rb:71:in 'block (2 levels) in Bundler::Runtime#require': There was an error while trying to load the gem 'webdrivers'. (Bundler::GemRequireError)
Gem Load Error is: cannot load such file -- base64

I can add gem "base64" to the google_groups.rb script to get a bit further, but then I see this:

Logging in...
/usr/local/lib/ruby/gems/3.4.0/gems/rubyzip-3.2.2/lib/zip/entry.rb:757:in 'File#initialize': No such file or directory @ rb_sysopen - /root/.webdrivers/root/.webdrivers/chromedriver (Errno::ENOENT)

Any suggestions on how I can move forward with downloading the messages?

Edit: Is there maybe a newer version of this script? Google Groups no longer uses /forum in its URLs; it now uses the /g/ notation, and there are other changes.