Migrate a mailing list to Discourse (mbox, Listserv, Google Groups, etc)

The solution is literally in the post above yours. :wink:

We should fix the script. Maybe you can create a PR that resolves the issue for everyone?

Thanks, that fixed it, sorry about that. In penance for my inability to read, I opened pull request #30325 on discourse/discourse ("Added duplication to name to prevent modification of frozen string exception") to save anyone else the shame of asking a dumb question.


Does this import script make Discourse replicate Mailman 2 email threading in any way (e.g. using the little Discourse arrow to signify "In-Reply-To"), or is it purely chronological (for each thread based on Message-ID, In-Reply-To and References)?


Yes, it does


Cool. Quite a few of my mailing list emails haven’t got the In-Reply-To and References headers that they should have, so they might be imported as new topics rather than as replies. From memory, the script uses those headers or subject headers (not both).

I think I might have asked this in the distant past, but are there any non-manual ways of adding these headers to the MBOX file and/or otherwise rearranging the emails before or after importing to Discourse?

It’s possible now to merge topics and keep chronological order so maybe that’s the answer. They’d just be missing the little Discourse arrow to signify who the message was in reply to.


The mbox import script has two phases. The first one is indexing, which outputs a SQLite database. You could either modify the data in the SQLite database before the import, or modify the Ruby script.

All the magic of sorting/grouping by subject or headers happens here:

You could add your own logic of grouping if you know how you want to group emails.
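As a rough idea of what custom grouping logic could look like, here is a minimal Ruby sketch that groups messages by a normalized subject. It is not the import script's actual code: `normalize_subject`, `group_by_subject`, and the `:subject` / `:date` hash keys are hypothetical names chosen for illustration, and the prefix pattern is an assumption about what your list's subjects look like.

```ruby
# Hypothetical helpers, not part of the Discourse import script.
# Strips reply/forward prefixes and leading [list] tags so that
# "Re: [mylist] Hello" and "hello" end up in the same group.
SUBJECT_PREFIX = /\A(?:(?:re|fwd?)\s*:\s*|\[[^\]]*\]\s*)+/i

def normalize_subject(subject)
  subject.to_s.strip.sub(SUBJECT_PREFIX, "").gsub(/\s+/, " ").strip.downcase
end

# Groups message hashes by normalized subject and sorts each
# group so the oldest message comes first.
def group_by_subject(messages)
  messages
    .group_by { |m| normalize_subject(m[:subject]) }
    .transform_values { |group| group.sort_by { |m| m[:date] } }
end
```

Note that this pattern also strips any leading bracketed tag such as "[ANN]", which may or may not be what you want for your archive.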


It’ll be a while before I even consider something so complex!

At https://bazaar.launchpad.net/~mailman-coders/mailman/2.1/view/head:/Mailman/Archiver/pipermail.py#L669 Mailman 2’s Pipermail seems to look for the following in order of preference:

  1. In-Reply-To.
  2. References.
  3. Oldest email with matching subject.

That combination of approaches seems ideal. In the third case, it might make sense for Discourse not to use the “in reply to” arrow.
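To make that order of preference concrete, here is a small Ruby sketch. It is only an illustration under my own assumptions, not Pipermail's or the Discourse importer's code: `find_parent`, `base_subject`, and the hash keys are made-up names, and picking the last known ID from References is my choice, based on the convention that the header lists ancestors oldest first. The second return value says how the parent was found, so a subject-only match could skip the "in reply to" arrow.

```ruby
# Illustrative sketch only; names and hash keys are assumptions.

def base_subject(subject)
  subject.to_s.sub(/\A(?:\s*re\s*:\s*)+/i, "").strip.downcase
end

# Returns [parent_msg_id, how], where how is :in_reply_to, :references
# or :subject, or [nil, nil] when the message starts a new thread.
def find_parent(message, earlier)
  known = earlier.map { |m| m[:msg_id] }

  # 1. In-Reply-To, if it points at a message we actually have.
  irt = message[:in_reply_to]
  return [irt, :in_reply_to] if irt && known.include?(irt)

  # 2. References: use the last listed ID that we have, since the
  #    header conventionally runs from thread root to nearest ancestor.
  ref = Array(message[:references]).reverse.find { |id| known.include?(id) }
  return [ref, :references] if ref

  # 3. Oldest earlier message with the same base subject.
  subject = base_subject(message[:subject])
  unless subject.empty?
    match = earlier
      .select { |m| base_subject(m[:subject]) == subject }
      .min_by { |m| m[:date] }
    return [match[:msg_id], :subject] if match
  end

  [nil, nil]
end
```

With something like this, `find_parent(msg, earlier_messages)` returning `[id, :subject]` would mean "attach to that topic, but don't mark it as a direct reply".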

From memory, Mailman 3’s Hyperkitty didn’t consider subject at all, which was not as good.


Pardon me chiming in with a possibly stupid question, but I could not find a clear answer here. I would like to know if the import process creates a new Discourse user for each email, with de-duplication of course, or if they all go in as one system user. I have a mailing list with 20 years of posts and it’s pretty big and hard to experiment with. And also, what about replies in the original list? Do they get threaded in?

Yes, the users get created, one per email address.

I was able to do a Google Takeout of my Google Groups, upload the .mbox files and import.

These steps were helpful to map the data/folder to an existing category, but this needs to be done in the import container, not the app container like in this writeup:

./launcher enter import
rails c

# Use the category ID shown in the URL, for example
# it's 16 when the category's path looks like this: /c/soccer/16
category = Category.find(16)

# Use the directory name where the mbox files are stored. For example,
# when the files are stored in import/data/soccer, use "soccer" as the directory name.
category.custom_fields["import_id"] = "soccer"
category.save!

I already have users in Discourse who self-migrated, so the import script failed to create accounts for them (probably not a bad thing). However, the imported messages that these existing Discourse users were involved with show the sender as system instead of their name.

Is there any way to make it map the existing users to their imported messages?

For now I undid everything by recovering from a recent backup. Ready to try again with some guidance on dealing with existing discourse users and their imported messages.

Update:

Claude helped solve the mapping of existing users. In addition to the category snippet above, you need to run this loop in the Rails console:

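# Skip system accounts (IDs below 1) and tag each remaining user with an
# import_id equal to their lowercased email, unless one is already set.
User.where("id > 0").find_each do |u|
  email = u.email&.downcase
  next if email.blank? || u.custom_fields["import_id"].present?

  u.custom_fields["import_id"] = email
  u.save_custom_fields
end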