Discourse email messages are incorrectly threaded

supermathie · July 29, 2022, 3:22am

It opens the door to all sorts of things we could potentially do, such as rejecting mail that replies to an old version of a message. But that’s imagineering: things that may sound good but aren’t necessarily a good idea.

riking · August 2, 2022, 4:07am

I haven’t seen this mentioned yet, so make sure you handle it:

If the email includes a “Previous Replies” section (do something to make it recipient-specific).

martin · August 2, 2022, 4:13am

I’m not sure what you mean by “previous replies”, or how I would need to make it more recipient-specific. How does it fit in with the broader strategy of consistent Message-IDs we have detailed here?

riking · August 2, 2022, 4:16am

If someone receives a “Previous Replies” section, you need to entirely give up on consistent message IDs: you’re combining multiple posts into a single email! There is no correct single message ID that identifies all the contents consistently across multiple viewpoints.

Edit: actually you can just concatenate them I guess?
topic/1234/post/12345.also-12340-12339

Additionally, this makes the spam triggers mentioned earlier even more severe: you’re not just switching the unsubscribe link under the same message ID, there are actual words included and excluded in different versions delivered with the same message ID.

cameron-simpson · August 2, 2022, 4:39am

By Kane York via Discourse Meta at 02Aug2022 04:26:

If someone receives a “Previous Replies” section, you need to entirely
give up on consistent message IDs: you’re combining multiple posts into
a single email!

Can you elaborate on this a bit? What’s the email look like in this
case? Got an example?

There is no correct single message ID that identifies all the contents
consistently across multiple viewpoints.

It is enough to pick just one, such as the first, for most purposes, if
this example is still a reply.

Edit: actually you can just concatenate them I guess?
topic/1234/post/12345.also-12340-12339

I’m beginning to think you’re conflating some kind of reference to
Discourse forum source posts with the message-id. The message-id
identifies email messages. For a reply post, the message-id in the
In-Reply-To and References should match the message-id of the
relevant antecedent email messages. When someone makes a post to a
topic, every copy of that post sent via email to those requesting email
copies should have the same message-id.
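A minimal Python sketch of that property, assuming the `<discourse/post/ID@host>` format that appears in the test headers further down this thread. `outbound_message_id` and `email_copy` are hypothetical helpers, not Discourse code; the point is only that the Message-ID is computed from the post alone, so every recipient's copy carries the same one.

```python
"""Sketch: one Message-ID per post, shared by every emailed copy.

Hypothetical helpers; the ID format mirrors the test headers in this thread.
"""


def outbound_message_id(post_id: int, host: str) -> str:
    # Depends only on the post, never on who the copy is addressed to.
    return f"<discourse/post/{post_id}@{host}>"


def email_copy(post_id: int, host: str, recipient: str) -> dict:
    # Per-recipient details (To, unsubscribe links, ...) may vary;
    # the Message-ID must not.
    return {"To": recipient, "Message-ID": outbound_message_id(post_id, host)}


if __name__ == "__main__":
    host = "discoursehosted.martin-brennan.com"
    alice = email_copy(92, host, "alice@example.com")
    bob = email_copy(92, host, "bob@example.com")
    print(alice["Message-ID"])
    print(alice["Message-ID"] == bob["Message-ID"])  # True
```

Administrative footers such as unsubscribe links can still differ per copy; they just never feed into the ID.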

Additionally, this makes the spam triggers mentioned earlier even more severe: you’re not just switching the unsubscribe link under the same message ID, there are actual words included and excluded in different versions delivered with the same message ID.

There’s nothing wrong with that, if these are “administrative footers”
at the bottom of the post.

I’m not entirely sure we’re talking about the same thing.

Also: what specific spam triggers are you discussing here? Because
“slightly differing” email messages go out all the time in the real
world.

Cheers,
Cameron Simpson cs@cskk.id.au

sam · August 2, 2022, 5:19am

Kane is talking about this option:

[Screenshot: the “Include previous replies” setting in the user profile (always / unless previously sent / never)]

Users can opt to get previous replies with large amounts of fidelity appended to the end of mails.

I am happy to chalk this up as background radiation for now and treat it similarly to how we treat the unique unsubscribe links.

I certainly don’t want to block any more progress here now that we have consensus.

(as to how it looks, imo it looks quite confusing, but some users like it)

cameron-simpson · August 2, 2022, 5:41am

By Sam Saffron via Discourse Meta at 02Aug2022 05:29:

Cameron Simpson:

Can you elaborate on this a bit? What’s the email look like in this
case? Got an example?

Kane is talking about this option:

Users can opt to get previous replies with large amounts of fidelity
appended to the end of mails.

Ah, thank you.

I am happy to chalk this up as background radiation for now and treat it similarly to how we treat the unique unsubscribe links.

That would be my inclination too. To my mind it fits with “the original
intent of the poster/author” objective from the RFC. Accompanying
baggage, but not a different core message.

Cheers,
Cameron Simpson cs@cskk.id.au

martin · August 2, 2022, 6:56am

Agreed, let’s proceed as planned. I began looking into the changes yesterday, starting with our email receiver.

@cameron-simpson just as an FYI, I am weaving this in with other responsibilities I currently have. I will be away on holiday next week and then again in the first two weeks of September, so it may take a little while to show real forward progress here. Please be assured I am keeping this top of mind and will write regular updates here. Thanks a lot for your participation and contributions so far!

cameron-simpson · August 2, 2022, 7:46am

By Martin Brennan via Discourse Meta at 02Aug2022 07:06:

@cameron-simpson just as an FYI, I am weaving this in with other
responsibilities I currently have. I will be away on holiday next week
and then again in the first two weeks of September, so it may take a
little while to show real forward progress here. Please be assured I am
keeping this top of mind and I will make sure to write regular updates
here, thanks a lot for your participation and contributions so far!

Thank you,
Cameron Simpson cs@cskk.id.au

martin · August 19, 2022, 3:38am

@cameron-simpson I have started work on this again after returning from my trip, and I just wanted to get some further clarification about different References and In-Reply-To scenarios.

Scenario 1: When creating a post inside Discourse that does not directly reply to another post, do we simply use the topic OP’s Message-ID which we have stored in the new outbound_message_id column for both References and In-Reply-To?

Scenario 2: When a post replies to multiple other posts at once (which can happen via quotes), which post do we use for In-Reply-To? And do we use all of them for References or just the single one chosen for In-Reply-To? Do we include the OP post’s Message-ID in References at all?

Scenario 3: Similar to the above, but let’s keep it to a single post that we are replying to. If we are replying to post B, which in turn is a reply to post A, does In-Reply-To simply point to post B, and then References should be in the order post A, post B (of course always referring to Message-IDs via outbound_message_id on the post)? Do we keep going up the reply chain or just stop at the first parent for References?

This mainly comes down to how we are interpreting this quote from the RFC, and mainly affects References – do they just get limited to direct replies via quote or otherwise or do they also always include the OP.

Note: Some implementations parse the “References:” field to
display the “thread of the discussion”. These implementations
assume that each new message is a reply to a single parent and
hence that they can walk backwards through the “References:” field
to find the parent of each message listed there. Therefore,
trying to form a “References:” field for a reply that has multiple
parents is discouraged; how to do so is not defined in this
document.
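To see why the RFC discourages multi-parent References, here is a minimal Python sketch of the inference it describes: a client that reads References as one chain and treats each entry as the parent of the next. `infer_parents` is a hypothetical name, not any particular mail client's code.

```python
from __future__ import annotations


def infer_parents(message_id: str, references: list[str]) -> dict[str, str | None]:
    """Parentage a References-walking client would assume.

    References is read as one chain from the thread root: each ID's parent
    is the ID before it, and the message itself hangs off the last entry.
    """
    chain = references + [message_id]
    parents: dict[str, str | None] = {chain[0]: None}  # root has no parent
    for parent, child in zip(chain, chain[1:]):
        parents[child] = parent
    return parents


if __name__ == "__main__":
    # Single-parent chain: every inferred link is correct.
    print(infer_parents("<C>", ["<A>", "<B>"]))
    # <Y> replies to both <B> (chain A, B) and <X> (chain A, X). Merging the
    # two chains gives A, B, X, and the client now wrongly infers that
    # <X> is a reply to <B>.
    print(infer_parents("<Y>", ["<A>", "<B>", "<X>"]))
```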

Thanks Cameron, I will go ahead with what I think is correct in the meantime and tweak based on your reply.

cameron-simpson · August 19, 2022, 4:22am

By Martin Brennan via Discourse Meta at 19Aug2022 03:48:

@cameron-simpson I have started back work on this again after returning
from my trip, and I just wanted to get some further clarification about
different References and In-Reply-To scenarios.

Scenario 1: When creating a post inside Discourse that does not directly reply to another post, do we simply use the topic OP’s Message-ID which we have stored in the new outbound_message_id column for both References and In-Reply-To?

I may be having terminology issues here. For a new topic’s OP I’d expect
no References or In-Reply-To, being the OP.

For a post in an existing topic not citing a specific previous post,
which I think is what you’re actually describing, the OP’s Message-ID
alone in each of References and In-Reply-To, exactly as you
describe.
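Scenario 1 as described reduces to a two-case rule. A minimal Python sketch; the function name is hypothetical, and the OP's Message-ID is assumed to come from the stored outbound_message_id value Martin mentions.

```python
from __future__ import annotations


def unthreaded_post_headers(op_message_id: str | None) -> dict[str, str]:
    """Threading headers for a post that cites no specific earlier post.

    Pass None when the post *is* the topic's OP; otherwise pass the OP's
    stored Message-ID (for example its outbound_message_id value).
    """
    if op_message_id is None:
        return {}  # the OP starts the thread: no In-Reply-To, no References
    # A plain reply to the topic is treated as a reply to the OP.
    return {"In-Reply-To": op_message_id, "References": op_message_id}


if __name__ == "__main__":
    op = "<discourse/post/91@discoursehosted.martin-brennan.com>"
    print(unthreaded_post_headers(None))
    print(unthreaded_post_headers(op))
```

Posts 2 and 3 in the test headers posted later in this thread have exactly this shape.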

Scenario 2: When a post replies to multiple other posts at once (which can happen via quotes), which post do we use for In-Reply-To?

[Rereading RFC 5322 again…]

In-Reply-To should contain the Message-IDs of each of the posts to
which it replies.
References should be the parent(*)'s References with the parent
Message-ID appended.

So the In-Reply-To goes back just one message in the thread. Multiple
parent posts means multiple message-ids, but they should just be the
message-ids of the immediate parent posts.

The References is meant to trace the whole chain of replies from the
OP to this post’s parent(*). So it is computed as the thread to the
parent, plus the parent’s message-id.

(*) When there’s more than one parent: The RFC says that because clients
(email readers) often expect the References to trace a single thread
of replies from the OP to the post, the RFC explicitly discourages
merging all of the parents’ References. Instead you should pick just
one. My personal inclination would be the first cited parent post, but
that’s clearly a policy decision: choose whichever you think most useful
perhaps.
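A minimal Python sketch of these two rules, assuming each parent post's Message-ID and its own References are available. The `Parent` record and `multi_parent_headers` are hypothetical names, and following the first cited parent is just the policy choice suggested above. Folding of long header lines is left to the mail library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Parent:
    """A post being replied to: its Message-ID and its own References."""
    message_id: str
    references: list[str] = field(default_factory=list)


def multi_parent_headers(parents: list[Parent]) -> dict[str, str]:
    """In-Reply-To names every immediate parent; References follows one."""
    if not parents:
        raise ValueError("a reply needs at least one parent")
    in_reply_to = " ".join(p.message_id for p in parents)
    # Policy choice: follow only the first cited parent, because merged
    # chains mislead clients that walk References to rebuild the thread.
    chosen = parents[0]
    references = chosen.references + [chosen.message_id]
    return {"In-Reply-To": in_reply_to, "References": " ".join(references)}


if __name__ == "__main__":
    b = Parent("<B>", ["<A>"])  # B replied to the OP, A
    c = Parent("<C>", ["<A>"])  # so did C
    print(multi_parent_headers([b, c]))
```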

Scenario 3: Similar to the above, but let’s keep it to a single
post that we are replying to. If we are replying to post B which in
turn is a reply to post A, does In-Reply-To simply point to post B
and then references should be in the order of post A, post B (of
course always referring to Message-IDs via outbound_message_id on
the post)? Do we keep going up the reply chain or just stop at the
first parent for References?

The In-Reply-To goes back exactly one layer. So in this scenario it
contains only the parent post’s Message-ID.

The References is a chain from the OP to this message.

This mainly comes down to how we are interpreting this quote from the RFC, and mainly affects References – do they just get limited to direct replies via quote or otherwise or do they also always include the OP.

They should always start at the OP. If all the preceding posts have
done this, you can just glom the parent message-id onto the parent’s
References and get the whole chain for free.

If you’re dealing with a “legacy” message you could walk back up the
tree. Or you could decide to do that every time anyway. Or you could say
things are good going forward, we’ll just grab the References from the
parent if it’s there. Depends what you’re deciding to store in your db I
think.
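For the “walk back up the tree” option, a minimal Python sketch. It assumes the database can give each post's reply-to parent (a post citing nothing is treated as replying to the OP, per Scenario 1) and its Message-ID; the two dicts stand in for those lookups, and all names are hypothetical.

```python
from __future__ import annotations


def references_by_walking(post_id: int,
                          parent_of: dict[int, int | None],
                          message_id_of: dict[int, str]) -> list[str]:
    """Rebuild References for a post by walking parent links to the OP.

    parent_of maps each post to the post it replies to (None for the OP);
    message_id_of maps posts to their Message-IDs.
    """
    chain: list[str] = []
    seen: set[int] = set()
    current = parent_of[post_id]
    while current is not None:
        if current in seen:  # guard against corrupt reply data
            raise ValueError(f"reply cycle at post {current}")
        seen.add(current)
        chain.append(message_id_of[current])
        current = parent_of[current]
    chain.reverse()  # References runs from the OP down to the parent
    return chain


if __name__ == "__main__":
    parent_of = {1: None, 2: 1, 3: 2}  # OP A, B replies to A, C replies to B
    message_id_of = {1: "<A>", 2: "<B>", 3: "<C>"}
    print(references_by_walking(3, parent_of, message_id_of))  # ['<A>', '<B>']
```

The incremental option (append the parent's Message-ID to the parent's stored References) yields the same list in one step, provided every ancestor's References was stored; the walk is only needed when one wasn't.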

I think as long as you’re aiming for In-Reply-To being the immediate
parentage and References being a line back to the OP, you should be
good.

Cheers,
Cameron Simpson cs@cskk.id.au

martin · August 19, 2022, 4:29am

Yes, this is what I was describing, thanks for that.

Cameron Simpson:

So the In-Reply-To goes back just one message in the thread. Multiple
parent posts means multiple message-ids, but they should just be the
message-ids of the immediate parent posts.

The References is meant to trace the whole chain of replies from the
OP to this post’s parent(*). So it is computed as the thread to the
parent, plus the parent’s message-id.

(*) When there’s more than one parent: The RFC says that because clients
(email readers) often expect the References to trace a single thread
of replies from the OP to the post, the RFC explicitly discourages
merging all of the parents’ References. Instead you should pick just
one. My personal inclination would be the first cited parent post, but
that’s clearly a policy decision: choose whichever you think most useful
perhaps.

Thank you. So essentially the answer is: pick a single quoted post to use as the parent (whether it’s the first cited or the most recently created post, the important thing is to pick one), use that for In-Reply-To, and use it and its ancestors all the way back to the OP for References.

Got it, makes sense.

I think this clarifies everything, and your answers were what I was expecting; I just wanted to double-check before beginning manual testing. Thanks for the speedy reply!

martin · August 22, 2022, 5:25am

@cameron-simpson I think I’ve got this working as described. I have mutt set up and things seem to be threading correctly (though I am not sure why the subject line is omitted in the thread, and I am also not sure how to set it up to see my own sent replies inline within the thread):

And here is how Thunderbird represents the same thread (actually I realised Thunderbird does not show my replies inline either):

Here is what it looks like in Gmail:

Headers are below. The post IDs start at 91, so post 1 == post ID 91.

Post/Email 1

(No References or In-Reply-To since it’s the first email)

From: Martin Brennan via The Email Threading Sandbox <notifications@cdckmartintesting.discoursemail.com>
Reply-To: The Email Threading Sandbox <incoming+3706c086cd36c6e37550c24f4e25c9b8@cdckmartintesting.discoursemail.com>
To: imaptest2@discourse.org
Message-ID: <discourse/post/91@discoursehosted.martin-brennan.com>
Subject: [The Email Threading Sandbox] [Royal Court] Threading topic 1 for

Post/Email 2

From: Bizarro Martin via The Email Threading Sandbox <notifications@cdckmartintesting.discoursemail.com>
Reply-To: The Email Threading Sandbox <incoming+9ea955b74a04dc85f5504ad245636824@cdckmartintesting.discoursemail.com>
To: imaptest2@discourse.org
Message-ID: <discourse/post/92@discoursehosted.martin-brennan.com>
In-Reply-To: <discourse/post/91@discoursehosted.martin-brennan.com>
References: <discourse/post/91@discoursehosted.martin-brennan.com>
Subject: [The Email Threading Sandbox] [Royal Court] Threading topic 1 for

Post/Email 3

From: Martin Brennan via The Email Threading Sandbox <notifications@cdckmartintesting.discoursemail.com>
Reply-To: The Email Threading Sandbox <incoming+410877b7f868b59945f3e3ea16570fc4@cdckmartintesting.discoursemail.com>
To: imaptest2@discourse.org
Message-ID: <discourse/post/93@discoursehosted.martin-brennan.com>
In-Reply-To: <discourse/post/91@discoursehosted.martin-brennan.com>
References: <discourse/post/91@discoursehosted.martin-brennan.com>

Post/Email 4

This replies to Post 2 and Post 3, but we use Post 3 as the parent for the References chain, since we need to choose only one.

Date: Mon, 22 Aug 2022 04:05:45 +0000
From: Bizarro Martin via The Email Threading Sandbox <notifications@cdckmartintesting.discoursemail.com>
Reply-To: The Email Threading Sandbox <incoming+0a63eba3765f58e709a2ca538ca2b926@cdckmartintesting.discoursemail.com>
To: imaptest2@discourse.org
Message-ID: <discourse/post/94@discoursehosted.martin-brennan.com>
In-Reply-To: <discourse/post/93@discoursehosted.martin-brennan.com>
References: <discourse/post/91@discoursehosted.martin-brennan.com>
 <discourse/post/93@discoursehosted.martin-brennan.com>
Subject: [The Email Threading Sandbox] [Royal Court] Threading topic 1 for

Post/Email 5

This replies directly to Post 4, which in turn replies directly to Post 3.

Date: Mon, 22 Aug 2022 05:05:06 +0000
From: Martin Brennan via The Email Threading Sandbox <notifications@cdckmartintesting.discoursemail.com>
Reply-To: The Email Threading Sandbox <incoming+d66f675a0ce64fcaa2ba6b91e3112b05@cdckmartintesting.discoursemail.com>
To: imaptest2@discourse.org
Message-ID: <discourse/post/95@discoursehosted.martin-brennan.com>
In-Reply-To: <discourse/post/94@discoursehosted.martin-brennan.com>
References: <discourse/post/91@discoursehosted.martin-brennan.com>
 <discourse/post/93@discoursehosted.martin-brennan.com>
 <discourse/post/94@discoursehosted.martin-brennan.com>
Subject: [The Email Threading Sandbox] [Royal Court] Threading topic 1 for

Post/Email 6

This is a reply to a reply I sent in via email. Note that I have kept the (odd) Message-ID generated by Thunderbird, 12d1ec8f-859c-2339-2c7d-9cb3310756a2@discourse.org.

Date: Mon, 22 Aug 2022 05:16:31 +0000
From: Martin Brennan via The Email Threading Sandbox <notifications@cdckmartintesting.discoursemail.com>
Reply-To: The Email Threading Sandbox <incoming+fb424977c7bd0c8146bdd7302dc35933@cdckmartintesting.discoursemail.com>
To: imaptest2@discourse.org
Message-ID: <discourse/post/97@discoursehosted.martin-brennan.com>
In-Reply-To: <12d1ec8f-859c-2339-2c7d-9cb3310756a2@discourse.org>
References: <discourse/post/91@discoursehosted.martin-brennan.com>
 <discourse/post/93@discoursehosted.martin-brennan.com>
 <discourse/post/94@discoursehosted.martin-brennan.com>
 <discourse/post/95@discoursehosted.martin-brennan.com>
 <12d1ec8f-859c-2339-2c7d-9cb3310756a2@discourse.org>
Subject: [The Email Threading Sandbox] [Royal Court] Threading topic 1 for
 2022-08-22
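The headers above can be cross-checked mechanically. A minimal Python sketch with the parent table transcribed from the descriptions above. Post 96 does not appear in the posted headers, so treating it as the emailed Thunderbird reply to post 95 is an assumption read off post 97's References.

```python
"""Replay the test thread and rebuild each post's threading headers.

Assumption: post 96 is the reply sent in from Thunderbird (it keeps its
own Message-ID) and it replied to post 95, as post 97's References implies.
"""

HOST = "discoursehosted.martin-brennan.com"
EMAIL_REPLY_ID = "<12d1ec8f-859c-2339-2c7d-9cb3310756a2@discourse.org>"

# Chosen parent per post (None for the OP). Post 94 cited posts 92 and 93;
# 93 was picked as its single parent.
PARENT = {91: None, 92: 91, 93: 91, 94: 93, 95: 94, 96: 95, 97: 96}
MESSAGE_ID = {post: f"<discourse/post/{post}@{HOST}>" for post in PARENT}
MESSAGE_ID[96] = EMAIL_REPLY_ID


def build_headers() -> dict:
    references: dict = {}
    headers: dict = {}
    for post in sorted(PARENT):  # every parent has a lower ID than its child
        parent = PARENT[post]
        if parent is None:
            references[post] = []
            headers[post] = {}
            continue
        # The parent's References plus the parent itself.
        references[post] = references[parent] + [MESSAGE_ID[parent]]
        headers[post] = {
            "In-Reply-To": MESSAGE_ID[parent],
            "References": references[post],
        }
    return headers


if __name__ == "__main__":
    for post, hdrs in build_headers().items():
        print(post, hdrs)
```

For posts 2 to 6 (IDs 92 to 95 and 97) the rebuilt In-Reply-To and References match the headers shown above.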

Could I send you a PM now to get you set up with an account on my testing site, and we can do some back and forth emailing/replying to see if this matches what you are expecting?

cameron-simpson · August 22, 2022, 9:41am

By Martin Brennan via Discourse Meta at 22Aug2022 05:36:

@cameron-simpson I think I’ve got this working as described, I have
mutt set up and things seem to be threading correctly (though I am not
sure why the subject line is omitted in the thread, and also not sure
how to set it up to see my own sent replies inline within the thread):


Looks good to my eye.

The subject’s omitted on replies (unless it changes), which makes it
easier to see where the next thread begins. You can fold threads up if
you’d rather.

Seeing your own replies requires having a copy of the reply in that
folder. The $record setting controls that.

And here is how Thunderbird represents the same thread (actually I realised Thunderbird does not show my replies inline either):


Also good to my eye.

Here is what it looks like in Gmail:


That’s … quite uncompact

Headers are below, the post IDs start at 91, so post 1 == post ID 91.
[…]

These headers all seem correct according to your descriptions of the
message relationships.

I notice that Discourse uses Reply-To with a distinctive id,
presumably to stitch together email replies based on the target email
address. Clearly this works. If Discourse derived that from the reply’s
In-Reply-To header, you could use a more stable address.
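Cameron's alternative, as a minimal Python sketch: route an incoming reply by matching its In-Reply-To (then References) against stored Message-IDs, instead of relying on a per-recipient Reply-To address. `post_by_message_id` stands in for a lookup over stored outbound_message_id values, and all names are hypothetical. Per Martin's reply, Discourse also uses the Reply-To address to decide between a category and a topic, which this sketch does not cover.

```python
from __future__ import annotations

import re
from email.message import EmailMessage

# Rough msg-id matcher, not a full RFC 5322 parser.
MSG_ID = re.compile(r"<[^<>\s]+>")


def resolve_reply_target(msg: EmailMessage,
                         post_by_message_id: dict[str, int]) -> int | None:
    """Find the post an incoming reply answers using only standard headers."""
    # In-Reply-To names the immediate parent(s): try those first.
    for mid in MSG_ID.findall(str(msg.get("In-Reply-To", ""))):
        if mid in post_by_message_id:
            return post_by_message_id[mid]
    # Otherwise fall back to References, nearest ancestor first.
    for mid in reversed(MSG_ID.findall(str(msg.get("References", "")))):
        if mid in post_by_message_id:
            return post_by_message_id[mid]
    return None  # no known Message-ID in either header


if __name__ == "__main__":
    known = {"<discourse/post/94@example.com>": 94}
    reply = EmailMessage()
    reply["In-Reply-To"] = "<not-from-discourse@mail.example>"
    reply["References"] = "<discourse/post/91@example.com> <discourse/post/94@example.com>"
    print(resolve_reply_target(reply, known))  # 94
```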

Could I send you a PM now to get you set up with an account on my
testing site, and we can do some back and forth emailing/replying to
see if this matches what you are expecting?

Certainly!

Cheers,
Cameron Simpson cs@cskk.id.au

martin · August 22, 2022, 11:57pm

That Reply-To address is actually used to determine whether we are sending into a Category or a Topic, and is not relied on as much as the Message-IDs in In-Reply-To and References.

Thanks, I’ll send the PM and an email invite from my site.

martin · August 23, 2022, 4:44am

We’ve done a fair bit of back and forth now and it seems to be working as expected. Here is an example of the thread in Thunderbird:

@cameron-simpson are you happy for me to go ahead with getting this into Discourse core now? Thanks again for doing some testing.

cameron-simpson · August 23, 2022, 5:47am

I think I want to review the References a little more closely: I thought I saw some oddities on a message, and I did a multi-reply by email and it didn’t seem to be recognised as such? I’ll try to have a look tonight.

martin · August 23, 2022, 5:57am

Ah yes, I see that latest post you made now. What are you expecting within Discourse when you do a reply to 2 posts? I am not sure we support parsing out quotes and attributing them to multiple posts as replies from incoming emails. Thanks for taking a further look at References too. If you had your own Discourse instance and wanted to test this, or were just curious about the logic, the code is in the branch feature/the-phantom-email-thread; see FEATURE: Overhaul email threading by martin-brennan · Pull Request #17996 · discourse/discourse · GitHub. It still needs a little clean-up too.

Edit: Found the reply issue, responding on the test forum.

cameron-simpson · August 23, 2022, 9:12am

By Martin Brennan via Discourse Meta at 23Aug2022 06:16:

Ah yes I see that latest post you made now. What are you expecting
within Discourse when you do a reply to 2 posts? I am not sure we
support parsing out quotes and attributing them to multiple posts as
replies from incoming emails.

I wasn’t expecting that. I was expecting Discourse to look at the
In-Reply-To message-ids, associate those with posts where they match,
and derive the “multiple reply” from that.
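Cameron's expectation, sketched minimally in Python: collect every Message-ID in In-Reply-To and map the ones Discourse recognises back to posts. The lookup dict and function name are hypothetical, and whether Discourse can record several reply targets for one post is not settled in this thread; this only covers the header side.

```python
from __future__ import annotations

import re

# Rough msg-id matcher, not a full RFC 5322 parser.
MSG_ID = re.compile(r"<[^<>\s]+>")


def reply_parents(in_reply_to: str, post_by_message_id: dict[str, int]) -> list[int]:
    """Posts named in an In-Reply-To header, in header order, deduplicated.

    IDs that match no known post (e.g. mail that never went through the
    forum) are skipped.
    """
    parents: list[int] = []
    for mid in MSG_ID.findall(in_reply_to):
        post = post_by_message_id.get(mid)
        if post is not None and post not in parents:
            parents.append(post)
    return parents


if __name__ == "__main__":
    known = {
        "<discourse/post/92@example.com>": 92,
        "<discourse/post/93@example.com>": 93,
    }
    # A folded header, as a mail client might send it for a two-post reply.
    header = "<discourse/post/92@example.com>\r\n <discourse/post/93@example.com>"
    print(reply_parents(header, known))  # [92, 93]
```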

That said, I don’t even know how to do a multireply on the web (in email
it’s pretty easy, with mutt at least). Nor do I know how you represent
parent posts in your db. Surely you don’t parse the message text itself?

Thanks for taking a further look at References too. If you had your
own Discourse instance and wanted to test this or were just curious of
the logic the code is in this branch
feature/the-phantom-email-thread, see
FEATURE: Overhaul email threading by martin-brennan · Pull Request #17996 · discourse/discourse · GitHub . It still needs a
little clean up too.

Thanks, I’ll have a look. I need to sit and draw a picture of our test
discussion and check the various headers against it; I was too
distracted today.

Cheers,
Cameron Simpson cs@cskk.id.au

cameron-simpson · August 23, 2022, 11:16am

I stuck some comments in here, but my brain is shutting down. In particular, the comments on add_identification_field_headers are probably misguided - is this the fallback/original code for when this new experimental mode is not enabled? The comments on add_experimental_identification_field_headers are more salient.

Related topics

Topic	Category	Replies	Views	Activity
Discourse Emails not threaded properly in some Email clients	Support	13	4926	June 16, 2022
Emails are not threaded in Outlook 2013	Bug	31	14427	January 9, 2015
Threading for email-only topics seems broken	Support	7	1226	October 24, 2023
Email-in replies thread wrongly	Bug	18	6476	June 23, 2017
Email threading broken	Bug	8	758	July 29, 2022