Apologies in advance for some of the tone below. I sound exasperated,
because I am a little exasperated.
By Michael Brown via Discourse Meta at 27Jul2022 14:06:
Sorry, I’m just catching up now, here are some thoughts, some of which
have already been addressed…
The difficulty here is that what is sent out from Discourse is a different message than the inbound. It has different metadata (for this purpose, To/From/Reply-to/Unsubscribe/etc.) and a different body (it’s customised per user (I think? Does this not happen in mailing list mode?)).
What exactly is the message? Treating 5322 as gospel:
A message consists of header fields, optionally followed by a message body.
The “Message-ID:” field provides a unique message identifier that refers to a particular version of a particular message.
[emphasis mine]
It’s that “particular version” that makes me think it would be inappropriate to re-send an incoming message with a different Message-ID. Though, if you change your point of view from Discourse as “Forum Software” to Discourse being “Mailing List Software” then it kind of makes sense to do so, so I get where you’re coming from.
Well, unfortunately this depends on an overly literal reading, maybe
reading in context which isn’t there.
Every email message gets its headers modified as mail systems pass it
along. If nothing else, Received: headers get added at every step, and
several systems add various headers indicating spam filtering results
and signatures. None of those trigger a message-id modification, and
indeed doing so would make the message-id totally dysfunctional.
Regarding content, as already mentioned, almost every mailing list adds
content to the body text, usually a footer with a link to the list admin
page or an unsubscribe link. These also do not trigger a message-id
change.
In fact, almost nothing which forwards a message changes the message-id.
Because that would break threading and duplicate detection for end user
clients.
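A minimal sketch of that point, using Python's standard `email` package: a relay adds trace and spam-result headers and a footer, and the Message-ID comes out unchanged. The addresses, hostnames, header values and the `relay` helper are all made up for illustration; this is not how any particular MTA or list manager is implemented.

```python
from email import message_from_string

# Hypothetical inbound message from the author.
RAW = (
    "From: author@example.org\n"
    "To: list@example.org\n"
    "Subject: Hello\n"
    "Message-ID: <abc123@author.example.org>\n"
    "\n"
    "Original body text.\n"
)

def relay(raw: str, footer: str) -> str:
    """Pass a message along roughly the way a list or MTA might."""
    msg = message_from_string(raw)
    # Headers a relay typically adds. Real MTAs prepend Received: at the
    # top of the header block; appending keeps this sketch short.
    msg["Received"] = (
        "from mail.example.org by lists.example.org; "
        "Mon, 1 Jan 2024 00:00:00 +0000"
    )
    msg["X-Spam-Status"] = "No"
    # Body rider, e.g. an unsubscribe footer.
    msg.set_payload(msg.get_payload() + footer)
    # The Message-ID header is deliberately left alone.
    return msg.as_string()

relayed = message_from_string(
    relay(RAW, "-- \nUnsubscribe: https://lists.example.org/unsub\n")
)
```

Headers and body both changed in transit, yet `relayed["Message-ID"]` is still the id the author's mail client assigned.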
I see you go on to quote what I was just about to cite.
5322 also says:
There are many instances when messages are “changed”, but those changes do
not constitute a new instantiation of that message, and therefore the message
would not get a new message identifier. For example, when messages are
introduced into the transport system, they are often prepended with
additional header fields such as trace fields (described in section 3.6.7)
and resent fields (described in section 3.6.6). The addition of such header
fields does not change the identity of the message and therefore the original
“Message-ID:” field is retained. In all cases, it is the meaning that the
sender of the message wishes to convey (i.e., whether this is the same
message or a different message) that determines whether or not the
“Message-ID:” field changes, not any particular syntactic difference that
appears (or does not appear) in the message.
I suppose it comes down to, does the sender of the message change when Discourse sends it out?
I think you’ve misread things here. Let me emphasise:
In all cases, it is the meaning that the sender of the message
wishes to convey (i.e., whether this is the same message or a
different message) that determines whether or not the "Message-ID:"
field changes
The sender is the author, not an MTA such as Discourse.
If I post to Discourse via email, I want my message to reach the readers
as it is, semantically speaking. Any riders like unsub links do not change
the semantics of what I have said in my message.
It’s still the same message.
Maybe we should use Resent-Message-ID and friends?
Absolutely not. They are for a user resubmitting a message. For
example, if I forwarded a message on to someone else. They’re not for
mail relays (such as lists and Discourse).
It’s always been there, all the way back to 822. But as you say later, yes it’s been updated.
Ouch. I thought it was USENET only at that point. I stand corrected.
5322 also speaks directly to the way Discourse and Github use it:
The “In-Reply-To:” field may be used to identify the message (or messages) to
which the new message is a reply, while the “References:” field may be used to
identify a “thread” of conversation.
Possibly slightly improperly, likely due to the lack of a suitable “Thread Identifier” header. But this interpretation may not be what the RFC authors intended… it doesn’t address messages with a “References” but without “In-Reply-To”.
It says to me that the two fields cover the same information:
- References shows a linear (usually) thread back to the OP
- In-Reply-To shows the parent, and implies the same thread in
  aggregate with the previous messages back to the OP
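As a rough sketch of how a mail reader can use the two fields interchangeably: RFC 5322 has a reply's References carry the parent's References followed by the parent's Message-ID, so the last entry names the parent and the first names the start of the thread. The helper names and the plain-dict message representation below are hypothetical, chosen only to keep the example short.

```python
import re
from typing import Optional

# Matches one angle-bracketed message-id such as <abc@example.org>.
MSGID = re.compile(r"<[^<>\s]+>")

def parent_id(headers: dict) -> Optional[str]:
    """Parent message-id: In-Reply-To if present, else the last References entry."""
    in_reply_to = MSGID.findall(headers.get("In-Reply-To", ""))
    if in_reply_to:
        return in_reply_to[0]
    refs = MSGID.findall(headers.get("References", ""))
    return refs[-1] if refs else None

def thread_root_id(headers: dict) -> Optional[str]:
    """Best guess at the OP: first References entry, else the parent, else this message."""
    refs = MSGID.findall(headers.get("References", ""))
    if refs:
        return refs[0]
    return parent_id(headers) or headers.get("Message-ID")

reply = {
    "Message-ID": "<c@example.org>",
    "In-Reply-To": "<b@example.org>",
    "References": "<a@example.org> <b@example.org>",
}
# A message with References but no In-Reply-To still threads.
refs_only = {
    "Message-ID": "<d@example.org>",
    "References": "<a@example.org> <b@example.org> <c@example.org>",
}
```

Both messages resolve to the same thread root, and `refs_only` still finds its parent from the tail of References, so the missing In-Reply-To is not a problem for threading.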
The tricky bit of this is that we aren’t sending out one email, we’re sending out N - one per recipient - so that their individual metadata (Unsubscribe, etc.) can be correct.
This isn’t tricky. The meaning of the messages is the same, the
customisations are minor and semantically irrelevant. They do not
warrant new or distinct message-ids.
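To make that concrete, here is a sketch of fanning one post out to N recipients with per-recipient metadata while keeping a single Message-ID. The `copies_for` function, the unsubscribe URL scheme and the use of a List-Unsubscribe header for the per-user link are assumptions for illustration, not a description of Discourse's code.

```python
from email.message import EmailMessage

def copies_for(recipients, message_id, author, subject, body, unsub_base):
    """Build one copy per recipient; only the recipient-specific parts differ."""
    copies = []
    for rcpt in recipients:
        msg = EmailMessage()
        msg["From"] = author
        msg["To"] = rcpt
        msg["Subject"] = subject
        # Same id on every copy: it is the same message, semantically.
        msg["Message-ID"] = message_id
        # Hypothetical per-recipient unsubscribe link.
        link = f"{unsub_base}?user={rcpt}"
        msg["List-Unsubscribe"] = f"<{link}>"
        msg.set_content(f"{body}\n-- \nUnsubscribe: {link}\n")
        copies.append(msg)
    return copies

copies = copies_for(
    ["alice@example.org", "bob@example.org"],
    "<post42@author.example.org>",
    "author@example.org",
    "Hi",
    "Same words for everyone.",
    "https://forum.example/unsub",
)
```

Each copy carries its own unsubscribe data, but a reader who also got the message by CC would see one message-id and could thread or deduplicate accordingly.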
And yes, I did see strong indications during testing that spam determination would be tied to a Message-ID. If it was later seen again (same user or different user) it would be much more likely to be marked spam.
Can you show some of these instances? Because message-ids allow
deduplication at the end user’s end. And bear in mind that many
“antispam” measures are misguided rubbish. The number of things I’ve had
rejected as potential spam for utterly spurious reasons… breaking
email to work around broken spam misfiltering is a poor choice.
To this day I never CC people with GMail addresses because GMail’s spam
filtering knows me and drops things on the floor. If I send only to the
list, they get it. If I CC their GMail address it (a) marks it as spam
and (b) then also marks the mailing list message as spam as well (same
message-id!) The end user doesn’t see my message. This logic is utterly
spurious and unrepairable.
[quote=“Cameron Simpson, post:22, topic:233499, username:cameron-simpson”]
So I’d be entirely ok with you adding your recipient-specific unsub link and preserving the original message-id. The benefits far far outweigh the loss of threading if you gave each message copy an individual message-id.
[/quote]
The benefits here, to be fair, are entirely around threading the emails correctly in certain mail clients at the expense of deliverability.
Sigh. To all email clients. And a major reason people over in
Pythonland are saying they will just not go to Discourse is that the
email side threading is broken. Many people do not use forums, because
each forum requires them to visit it. Email comes to them, they get to
use their preferred reader and their preferred editor, and threading
lets people see the discussion flow clearly. When it works.
The current
topic/#{topic_id}/#{post_id}.s#{sender_user_id}r#{receiver_user_id}
at least makes it consistent for a user in their mailbox. The assumption
My biggest concern is the deliverability - it’s hard enough to get email delivered when there is zero visibility from the major providers.
I would like to see evidence. Mailing lists do this correctly all over
the planet. Discourse definitely and objectively breaks this. I’m trying
to get it fixed.
Let me reiterate the two basic problems here:
- The OP In-Reply-To and References cite a fictitious “pre-OP”
  “topic” message-id, so no email user has a thread with a starting
  message (the OP) - everything including the OP looks like a followup
- The emails received via Discourse and the emails received directly eg
  via CC have different message-ids even though they are the same
  message semantically speaking; this breaks threading and deduplication
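Both problems can be shown from the receiving side with a small sketch. The message-ids below are invented and only loosely shaped like the ones discussed here, and the three helpers are hypothetical stand-ins for what a mail reader does, not any client's real code.

```python
def deduplicate(messages):
    """Keep the first copy of each Message-ID, as a mail reader might."""
    seen, unique = set(), []
    for msg in messages:
        if msg["Message-ID"] not in seen:
            seen.add(msg["Message-ID"])
            unique.append(msg)
    return unique

def roots(messages):
    """Messages that claim no parent, i.e. candidate thread starters."""
    return [m for m in messages if not m.get("In-Reply-To")]

def missing_parents(messages):
    """Parent ids that are cited but never appear in the mailbox."""
    present = {m["Message-ID"] for m in messages}
    cited = {m["In-Reply-To"] for m in messages if m.get("In-Reply-To")}
    return cited - present

# Problem 2: the same post arriving by CC and via the forum.
via_cc = {"Message-ID": "<m1@author.example.org>"}
via_forum_preserved = {"Message-ID": "<m1@author.example.org>"}
via_forum_rewritten = {"Message-ID": "<topic/1/2@forum.example>"}

# Problem 1: the OP itself cites a "topic" id that was never sent.
mailbox = [
    {"Message-ID": "<topic/1/1@forum.example>",
     "In-Reply-To": "<topic/1@forum.example>"},
    {"Message-ID": "<topic/1/2@forum.example>",
     "In-Reply-To": "<topic/1/1@forum.example>"},
]
```

With the id preserved, `deduplicate([via_cc, via_forum_preserved])` collapses to one message; with it rewritten, both copies survive. And `roots(mailbox)` comes back empty, because the only message that could start the thread points at an id that `missing_parents(mailbox)` reports as never delivered.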
But I do see a strong argument for making Discourse behave more like mailing list software in mailing list mode. @martin I believe we don’t customise the message body in mailing list mode? Do you think it makes sense to take a more strict approach around preserving and reusing Message-IDs in mailing list mode?
There are people over in Pythonland who found “mailing list mode” too
much of a firehose. They want to get email for targeted topics but not
everything. The message-id handling should be the same for all of the
email side.
I’m a “mailing list mode” person on discuss.python.org. But I turned it
on here (discourse.org) and immediately turned it off again. I need
targeted mode over here.