Discourse email messages are incorrectly threaded

There are a few different ways we link to the parent post(s), and the association is stored via the PostReply table, with the reply_post_id representing the post that is making the reply and the post_id referencing the parent. Incoming emails use In-Reply-To to link these, and from the Discourse UI if you quote multiple posts then multiple PostReply records are made, and if you use the “Reply” button on a single post then it is used to create a PostReply record.

Yes sorry I should have noted that, add_identification_field_headers will no longer be used, it’s all in add_experimental_identification_field_headers the new code. Thanks for making comments on the code itself, I was not expecting that! Will work through them today.

@cameron-simpson my apologies, I think you may have reviewed an earlier commit in that PR. I’ve now rebased the code for it and pushed again to have a single commit with all the latest code that we have been testing on that test site. Hopefully the commit message there is helpful and logical.

For References, if the user is replying to a post then I use the full chain of Message-IDs from the OP down to the parent post in order. For example, if all these posts are a direct chain of replies:

  • Post 1 – discourse/post/500@test.site
  • Post 2 – discourse/post/501@test.site
  • Post 3 – discourse/post/502@test.site
  • Post 4 – discourse/post/503@test.site
  • Post 5 – discourse/post/504@test.site

Then the In-Reply-To header for the last post will be:

  • In-Reply-To: <discourse/post/503@test.site>

And References will be

  • References: <discourse/post/500@test.site> <discourse/post/501@test.site> <discourse/post/502@test.site> <discourse/post/503@test.site>

If a new post is created that does not reply directly to the post above, e.g. Post 6 – discourse/post/505@test.site, then its In-Reply-To Message-ID will be <discourse/post/500@test.site> (the OP), and References will be <discourse/post/500@test.site> which is the OP as well.

Please let me know if this is incorrect and I can revise.

3 Likes

By Martin Brennan via Discourse Meta at 23Aug2022 23:59:

@cameron-simpson my apologies, I think you may have reviewed an earlier
commit in that PR. I’ve now rebased the code for it and pushed again to
have a single commit with all the latest code that we have been testing
on that test site. Hopefully the commit message there is helpful and
logical.

I’ll have another look, thanks.

For References, if the user is replying to a post then I use the full chain of Message-IDs from the OP down to the parent post in order. For example, if all these posts are a direct chain of replies:

  • Post 1 – discourse/post/500@test.site
  • Post 2 – discourse/post/501@test.site
  • Post 3 – discourse/post/502@test.site
  • Post 4 – discourse/post/503@test.site
  • Post 5 – discourse/post/504@test.site

Sounds correct.

Then the In-Reply-To header for the last post will be:

  • In-Reply-To: <discourse/post/503@test.site>
    And References will be
  • References: <discourse/post/500@test.site> <discourse/post/501@test.site> <discourse/post/502@test.site> <discourse/post/503@test.site>

Also correct.

If a new post is created that does not reply directly to the post
above, e.g. Post 6 – discourse/post/505@test.site, then its
In-Reply-To Message-ID will be <discourse/post/500@test.site> (the
OP), and References will be <discourse/post/500@test.site> which is
the OP as well.

This is all correct.

Cheers,
Cameron Simpson cs@cskk.id.au

1 Like

Thank you. I also forgot to mention that we indeed do have to go to the database to recreate the References chain based on the PostReply records, you will see that in the latest commit.

@cameron-simpson I would like to get the internal review process going with this code next week (as well as just general polish of what I have done), since I am going on holiday on the 3rd. That way if the review is completed I can merge it and deploy it out once I get back. If you have any further feedback please let me know before then, otherwise I will assume all is good (which it did seem to be during our testing yesterday :slight_smile: ) .

1 Like

I think things are mostly good. My notes from hand sweeping the email thread at my end, which I’ve mostly hand swept for headers:

Topic for testing threading 2022-08-23

post   msg-id  detail
59/1   98 OP new topics for testing email threading
59/11 109 reply-to-topic in-reply-to 98 ref 98
59/2  100 reply-to-topic in-reply-to 98 ref 98
59/3  via-email uuid@discourse yes welcome in-reply-to 100 ref 98,100
        not noted as a reply in the web ui
???   me-via-email ...kr@cskk glad to be here in-reply-to uuid@discourse no refs
        not noted as a reply in the web ui
59/10 108 reply to earlier post in-reply-to ...kr@cskk ref 98,100,0aa@discourse,kr@cskk
59/5  103 thanks cameron in-reply-to kr@cskk refs 98,100,0aa@discourse,kr@cskk
???104   me-via-email ...zp@cskk in-reply-to 103 no refs
        not noted as a reply in the web ui
59/7 105 no problem in-reply-to zp@cskk refs 98,100,00a@discourse,kr@cskk,103,zp@cskk
        posted on web, reply to 104? aka zp@@cskk
        not noted as a reply in the web UI (so, new-top-topic?)
        quotes kr@cskk "glad to be here"
        NEEDS REVIEW
59/9 107 expected or a bug in-reply-to 106 ref 98,100,0aa@discourse,kr@cskk,103,zp@cskk,105,106
        quotes 59/8

Notes:
- email replies are not shown as replies in the web ui
- web multireplies only get one msg-id in the in-reply-to
- users do not get email copies of their own posts (email or web), would be nice to have an option in prefs for this
- web msg-ids seem to be forum post.id, would be nicer if topic.id/in-topic.id for easier tracing in headers

In summary: I did not find any incorrect headers, but note some shortcomings above.

I haven’t had a chance to review your updated code, but functionally things seem correct.

Thank you,
Cameron

1 Like

Thanks Cameron!

Can you elaborate on this a little? Do you mean you can’t see this?

Or that you cannot see this part? For the latter, we only show the Reply with the arrow if the post you are replying to is further up the topic, not just one previous:

Oh I did not think this is required, so if you reply to N posts via the web, all of those post’s Message-IDs should show in the In-Reply-To header, and then References follows the current logic of a thread from OP down to a single parent (in our case I choose the most recently created post as the single parent)?

Yes this is by design to not send you things that you have “seen” already, we could spin that out in a different TODO to see if there is a want for this from more users.

The problem with topic ID is that it is too brittle and not specific/unique enough, also it will seem a little confusing when a post gets moved between topics. Maybe we can include a custom header in the email, e.g. X-Discourse-Topic-ID or something (if that is allowed) to allow for easier visual tracking?

2 Likes

No, I see that icon.

Ah. Whereas I, as an email nonforum person expected that indicator on every reply because I’m not thinking about this as an instant messaging layout (maybe). So my expectations are at variance with what you’ve chosen to do.

It isn’t required. Think of it as “quality of service”. You explicitly go:

@message.header['In-Reply-To'] = referenced_post_message_ids[0] || topic_canonical_reference_id

and you could just drop the [0] there. Clients could then use just one message-id or do something very funky at their whim and it would all be valid.

“Should” is a strong word. You could include all the message-ids if they’re easily to hand. You’re not obliged, and the code is valid as is.

Aye. I know I like it myself so that I know my post made it to the list/forum - email being very queue based and some (cough, large Australian telco, cough) ISP mail handlers being very… unreliable, slow, etc etc. Occasionally I have seen other people want this (in lists, but that’s the mode we’re effectively talking about here). The default for such an option should probably be false.

As a nerd, I like being at least able to get an unfiltered feed so that I can make my own policy decisions.

Given that the Message-ID is effectively opaque/set-once I don’t view that as a problem unless there’s scope for reissuing the same message-id - if all your counters are strictly monotonic I would not expect that to happen. I just found it very tedious to match up post.id eg 98 to topic/post eg 59/1. It would have been handy to have something like category.id/topic.id/post-in-topic.id there instead of the 98.

That would also be sufficient. This is just convenience on the debugging headers side.

Cheers,
Cameron

4 Likes

That is still the old code you are looking at there, you only need to look at add_experimental_identification_field_headers . But your point still stands.

There is someone on our infrastructure team who would be agreeing firmly with this statement, they share a similar workflow and outlook to you :laughing:

That’s fair enough, perhaps I will re-introduce topic ID as well, since outbound_message_id is indeed set once and not changed (based on whether the post is generated in the Discourse web UI or via an incoming email).

Probably won’t need this now since I am adding the topic ID into the Message-ID once more.

Ah, should have seen this I guess:

 most_recent_post = referenced_posts.first
      most_recent_post_message_id = Email::MessageIdService.generate_or_use_existing(most_recent_post)
      @message.header["In-Reply-To"] = most_recent_post_message_id

Anyway, the existing stuff is already valid. Entirely your call.

2 Likes

I’m actually not sure about this, I’ve got a gut feeling we don’t really want the topic ID in those Message-IDs, just having the post ID makes it super simple. Will try this instead:

Actually we have them already and remove them here:

We can just keep them in, doesn’t hurt and helps with visual debugging!

That’s fine by me. I just found it particularly hard to tie email messages headers to posts because the post.id isn’t evident in the UI or the URL for the particular post on the web. Having that present in an additional context header would be just as good.

Thanks,
Cameron

1 Like

Just doing a quick ping - the PR seems to have been quiet for quite a while. - Cameron

Hi Cameron, I’ve been at our company meetup then on vacation for the last 2 weeks, just getting back into the swing of things today. If it’s not merged this week then it will definitely be merged next week. Apologies for the delays!

By Martin Brennan via Discourse Meta at 20Sep2022 00:17:

Hi Cameron, I’ve been at our company meetup then on vacation for the
last 2 weeks, just getting back into the swing of things today. If it’s
not merged this week then it will definitely be merged next week.
Apologies for the delays!

No need for apologies. I knew you’d been away, wasn’t sure about your
timing or return. And I guess it might still be Monday where you are :frowning:

Thank you,
Cameron Simpson cs@cskk.id.au

2 Likes

I just merged the PR, I will try to get the Python forum deployed later today so you can start using it in earnest.

1 Like

By Martin Brennan via Discourse Meta at 25Sep2022 23:29:

I just merged the PR, I will try to get the Python forum deployed later
today so you can start using it in earnest.

That would be amazing. Thanks, Cameron

2 Likes

That’s been done now, please let me know here if you experience any issues :+1:

2 Likes

Thank you. I’ll let you know how it looks. Cameron