Improving Mailman email parsing

supermathie · January 26, 2023, 10:50pm

We’ve noticed that on a couple of forums that use Discourse to mirror a public mailing list, some posts are being attributed to the wrong user:

[Screenshot] from: [ruby-talk:444110] exif - photo metadata - ruby-talk - Ruby Mailing List Mirror

In this case, Discourse first staged a user with the name “Austin Ziegler via ruby-talk” and an email address matching the list submission address, and that user is now shown as the author of every post like this.

[Screenshot] from: txt.att.net outage? - #4 by Mailman - Mailman List Mirror (Read Only) - NANOG

In this case, Discourse first staged a user with the name “Mailman” with an email address matching the list submission address.

Upon investigation, our mail parsing is sometimes incorrect. The cause is that, for DMARC compliance, Mailman will sometimes change the From header to the list’s own address and put the original sender into another header such as Reply-To:

To: Ryan Davis via ruby-talk 
X-MailFrom: tom@tomsdomain.com
X-Mailman-Version: 3.3.3
Reply-To: Ruby users <ruby-talk@ml.ruby-lang.org>
From: Tom Reilly via ruby-talk <ruby-talk@ml.ruby-lang.org>
Cc: Tom Reilly <tom@tomsdomain.com>

To: Jared Mauch <jared@jaredsdomain.com>
X-BeenThere: nanog@nanog.org
X-Mailman-Version: 2.1.39
From: Owen DeLong via NANOG <nanog@nanog.org>
Reply-To: Owen DeLong <owen@owensdomain.com>
Cc: nanog <nanog@nanog.org>

but leave it when it doesn’t need to change:

To: Jon Lewis <jlewis@jonsdomain.org>
X-BeenThere: nanog@nanog.org
X-Mailman-Version: 2.1.39
From: William Herrin <bill@billsdomain.us>
Cc: nanog@nanog.org

There seem to be a lot of different options for this behaviour, so we’d like to come up with an algorithm that correctly parses what Mailman sends out in every case.

There are potentially other options; for instance, Mailman could post the unchanged message directly to a Discourse instance, but those are more complex to set up and may not be available to everyone.

Here’s the start of one:

if mailman-version < 3
- if any of:
  - From address matches List-Id
  - From address matches List-Post
  - From address matches X-BeenThere
- then use Reply-To as From
if mailman-version >= 3
- if X-MailFrom exists
  - Use name from From header, stripping /via .*/
  - Use email from X-MailFrom
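
For anyone who wants to try this heuristic against real messages, here is a rough Python sketch of the rules above. It is not Discourse code: the function names `list_addresses` and `guess_sender` are made up for illustration, and the List-Id comparison is left out because List-Id holds a list identifier rather than an email address, so how to match it is an open question.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def list_addresses(msg):
    """Collect addresses that identify the list itself (hypothetical helper)."""
    found = set()
    # X-BeenThere carries the bare list address, e.g. nanog@nanog.org
    if msg["X-BeenThere"]:
        found.add(msg["X-BeenThere"].strip().lower())
    # List-Post usually looks like <mailto:list@example.org>
    match = re.search(r"mailto:([^>?\s]+)", msg["List-Post"] or "")
    if match:
        found.add(match.group(1).lower())
    return found


def guess_sender(raw):
    """Return (name, email) for the likely author of a Mailman message."""
    msg = message_from_string(raw)
    from_name, from_addr = parseaddr(msg["From"] or "")
    version = re.match(r"\d+", msg["X-Mailman-Version"] or "")
    major = int(version.group()) if version else None

    if major is not None and major < 3:
        # Mailman 2: From was rewritten to the list, sender is in Reply-To
        if from_addr.lower() in list_addresses(msg) and msg["Reply-To"]:
            return parseaddr(msg["Reply-To"])
    elif major is not None and msg["X-MailFrom"]:
        # Mailman 3: keep the display name minus "via <list>",
        # take the address from X-MailFrom
        name = re.sub(r"\s+via\s.*$", "", from_name)
        return name, parseaddr(msg["X-MailFrom"])[1]
    # Header was not rewritten (or we cannot tell): trust From
    return from_name, from_addr


sample = (
    "X-BeenThere: nanog@nanog.org\n"
    "X-Mailman-Version: 2.1.39\n"
    "From: Owen DeLong via NANOG <nanog@nanog.org>\n"
    "Reply-To: Owen DeLong <owen@owensdomain.com>\n"
    "\n"
)
print(guess_sender(sample))
```

The sample is the NANOG header set quoted earlier in this post; a message whose From was not rewritten falls through to the last line and keeps its original From.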

Also, when all this is said and done, is it possible to have a rake task re-process existing posts (probably only the ones matching the erroneous user) with this new logic?

zogstrip · May 17, 2023, 11:29pm

github.com/discourse/discourse

FIX: improve mailman email parsing

discourse:main ← discourse:improve-mailman-email-parsing

opened 11:13PM - 17 May 23 UTC

ZogStriP

+226 -101

https://meta.discourse.org/t/improving-mailman-email-parsing/253041 When mirroring a public mailing list which uses Mailman, there were some cases where the incoming email was not associated with the proper user. As it happens, for various (undetermined) reasons, the email address of the sender is often not in the `From` header but can be in any of the following headers: `Reply-To`, `CC`, `X-Original-From`, `X-MailFrom`. It might be in other headers as well, but those were the ones we found the most reliable. There's also a new `emails:fix_mailman_users` rake task to fix wrongfully associated users.

The gist of it is that I’ve come up with an algorithm that works for all the versions I’ve seen in the wild.

1. Get the mailing list’s email address from either the List-Post or the X-BeenThere header.
2. The sender’s email address will be in one of the following headers: From, Reply-To, X-MailFrom or X-Original-From. Iterate over those in order and return the first address that doesn’t match the mailing list’s address.
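
To make the two steps concrete, here is a small Python sketch of the same idea. It is only an illustration, not the Ruby code from the pull request: `mailing_list_address` and `find_sender` are hypothetical names, and the List-Post header in the sample is assumed to use the usual `<mailto:...>` form (it is not part of the ruby-talk excerpt quoted at the top of this topic).

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Headers checked for the sender, in the order given in step 2
SENDER_HEADERS = ("From", "Reply-To", "X-MailFrom", "X-Original-From")


def mailing_list_address(msg):
    """Step 1: list address from List-Post or X-BeenThere, else None."""
    match = re.search(r"mailto:([^>?\s]+)", msg["List-Post"] or "")
    if match:
        return match.group(1).lower()
    been_there = msg["X-BeenThere"]
    return been_there.strip().lower() if been_there else None


def find_sender(raw):
    """Step 2: first candidate address that is not the list's own."""
    msg = message_from_string(raw)
    list_addr = mailing_list_address(msg)
    for header in SENDER_HEADERS:
        name, addr = parseaddr(msg[header] or "")
        if addr and addr.lower() != list_addr:
            return name, addr
    return None  # every candidate header pointed back at the list


sample = (
    "List-Post: <mailto:ruby-talk@ml.ruby-lang.org>\n"  # assumed header
    "X-MailFrom: tom@tomsdomain.com\n"
    "X-Mailman-Version: 3.3.3\n"
    "Reply-To: Ruby users <ruby-talk@ml.ruby-lang.org>\n"
    "From: Tom Reilly via ruby-talk <ruby-talk@ml.ruby-lang.org>\n"
    "\n"
)
print(find_sender(sample))
```

Here both From and Reply-To point at the list, so the address comes from X-MailFrom, which carries no display name. A real implementation would still need to pick up the name from somewhere else, such as the From display name.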

gerhard · May 22, 2023, 9:24am

This seems to work great!
I used rake emails:fix_mailman_users to fix all posts that were attributed to the wrong user on https://rubytalk.org/

