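Discourse email messages are incorrectly threaded

Over on discuss.python.org we are having a discussion about the email side of Discourse. The biggest complaint is the lack of threading. I did some digging in the headers and found that:

  • The Message-ID headers are at least unique
  • The In-Reply-To and References headers do not point at the Message-ID of any other message, let alone the Message-ID of the message being replied to
  • Instead they point at a fictitious message ID based on the topic number

This means that people reading by email see (a) a completely flat, unthreaded discussion and (b) a root message that appears to be missing, because the In-Reply-To and References headers point at a message ID which never actually occurs in any message.

This is bad, and it violates RFC 5322. It makes the email experience far worse than it should be.

For example, there is a topic there whose first message has these headers: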

Message-ID: <topic/17208.dc83577b18fc3ecc438ed42a@discuss.python.org>
References: <topic/17208@discuss.python.org>

This is the first message. It should not have a References header, because no message has that ID.

The second message has these headers:

Message-ID: <topic/17208/60568.898edf234f56cf6f3a661c1a@discuss.python.org>
In-Reply-To: <topic/17208@discuss.python.org>
References: <topic/17208@discuss.python.org>

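Again, the Message-ID is fine, but the In-Reply-To and References make no sense at all.

This should be easy to fix. The first message should have neither an In-Reply-To nor a References header. The second message should carry the first message's Message-ID in its In-Reply-To and References headers.

See RFC 5322 section 3.6.4 for details.

At present, email users see a flat, unstructured discussion. With these fixes they would get a meaningful, easy-to-follow threaded display.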
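
As a rough illustration of the fix described above, here is a minimal Python sketch built on the standard library's `email.message.Message`. The helper name `make_message` is hypothetical and the Message-IDs are the ones quoted in the example; this is not Discourse's code, only the header rule applied to two messages.

```python
from email.message import Message


def make_message(message_id, parent=None):
    """Build a message; only replies get In-Reply-To and References."""
    msg = Message()  # legacy Message keeps header values as plain strings
    msg["Message-ID"] = message_id
    if parent is not None:
        parent_id = parent["Message-ID"]
        # References = the parent's own References chain plus the parent's id.
        chain = parent.get("References", "").split() + [parent_id]
        msg["In-Reply-To"] = parent_id
        msg["References"] = " ".join(chain)
    return msg


first = make_message("<topic/17208.dc83577b18fc3ecc438ed42a@discuss.python.org>")
second = make_message(
    "<topic/17208/60568.898edf234f56cf6f3a661c1a@discuss.python.org>",
    parent=first,
)
```

With headers like these, the first message carries no threading headers at all and the second one points at a Message-ID that really exists in the reader's mailbox.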

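9 Likes

In case anyone is interested, the archive of the discussion Cameron mentioned can be found at https://mail.python.org/archives/list/python-dev@python.org/message/VHFLDK43DSSLHACT67X4QA3UZU73WYYJ/.

2 Likes

This seems to be a regression; see the older topic and fix.

1 Like

I have just been looking at the diff between HEAD and that fix.

It looks to me as though the current version always sets References, using topic_canonical_reference_id as the fallback even when there is no antecedent. I still think that is wrong, because no email message has that ID.

In-Reply-To is slightly more correct, since it is only set when post.post_number != 1, but it still falls back to topic_canonical_reference_id: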

@message.header['In-Reply-To'] = referenced_post_message_ids[0] || topic_canonical_reference_id

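That looks like two problems to me:

  • If there are no referenced_post_message_ids, the fallback should be the Message-ID of post #1, not topic_canonical_reference_id
  • Something in the receipt-of-reply-emails code must be dropping the In-Reply-To header from reply emails, since those should have populated the referenced_post_message_ids array properly (list? I'm new to Ruby)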
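
To make the first point concrete, here is a hypothetical Python paraphrase of the fallback logic. The names `post_number` and `referenced_post_message_ids` mirror the Ruby quoted above; `op_message_id` and the function itself are assumptions made for illustration, not part of Discourse.

```python
def in_reply_to_header(post_number, referenced_post_message_ids, op_message_id):
    """Return the In-Reply-To value for a post, or None for the first post."""
    if post_number == 1:
        return None  # the OP replies to nothing, so it gets no header
    if referenced_post_message_ids:
        return referenced_post_message_ids[0]
    # Fall back to the real Message-ID of post #1, not a made-up topic id.
    return op_message_id
```

The only change from the quoted line is the last fallback: it names a message that was actually sent.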
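
3 Likes

Cameron, thank you for starting this topic and for the great level of detail in your post. I am responsible for this "mess", stemming from these two commits:

We have been aware of some threading issues in mail clients such as Thunderbird for a while, but they do not represent a large share of the people consuming Discourse email threads, so we kept putting it off. Now that it has come to a head, we need to spend some time revisiting the issue and sorting it out.

Interestingly, we added this References header to the first email sent, and to every subsequent email, at the time because it made threading work correctly in Gmail. I agree it is not ideal, though, and it may well be contributing to the threading problems, along with the original Message-ID not being used in the In-Reply-To and References headers of later emails.

Please bear with me while I look at the old discussions and code and sort this out. In the meantime, are you aware of other mail clients in use that are affected? I know it is a problem in Thunderbird, for example, but what about other clients? Thanks.

7 Likes

I wrote a long reply, but got this back:

We're sorry, but your email message to
["incoming+8349bd9eb1f2b582df4f32dbe85c3363@meta.discoursemail.com"]
(titled Re: [Discourse Meta] [bug] Discourse email messages are
incorrectly threaded) didn't work.

Reason:
Sorry, new users can only put 2 links in a post.
If you can correct the problem, please try again.

I'll post it on the forum instead, so that I can catch and fix things...

1 Like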

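Cameron, thank you for starting this topic and for the great level of
detail in your post. I am responsible for this "mess",
stemming from these two commits:

3b13f1146b2a406238c50d6b45bc9aa721094f46

This looks fine. Does it save this id with the db record so that inbound replies can be tied to the antecedent forum message?

Also, would you like me to verify that the suffix is syntactically valid per RFC 5322, in terms of the characters allowed?

82cb67e67b83c444f068fd6b3006d8396803454f

This second commit seems to touch on another problem we are seeing: if a post originated from email, the message ID of the outbound message sent to email users is not the message ID of the author's source message. This makes mail clients think there are two distinct messages, and it can break replies made to the original message rather than to the copy sent by the forum. For example:

To: the forum
CC: one of the participants

That participant will (well, may) receive one copy via the forum and one directly from the author, and because the two have different message IDs they will be distinct messages at their end.

I was going to file a second bug report about that once the In-Reply-To and References header issue was sorted, since that one matters more.

We have been aware of some threading issues in mail clients such as Thunderbird for a while, but they do not represent a large share of the people consuming Discourse email threads, so we kept putting it off. Now that it has come to a head, we need to spend some time revisiting the issue and sorting it out.

I and several others use mutt. I'm happy to help with debugging and code review in whatever way I can. I also worked as a mail system administrator for many years.

[quote=“Cameron Simpson, post:1, topic:233499,
username:cameron-simpson”]
This is the first message. It should not have a References header, because no message has that ID.
[/quote]

Interestingly, we added this References header to the first email sent, and to every subsequent email, at the time because it made threading work correctly in Gmail,

I would expect a correct References header (absent on the first post, as with In-Reply-To) to work as well. But Gmail sometimes has a rather loose relationship with the mail standards. I have a Gmail account, so I can do some debugging there too. In principle we could use this very discussion as a test bed, perhaps.

but I agree it is not ideal, and it may well be contributing to the threading problems,
along with the original Message-ID not being used in the
In-Reply-To and References headers of later emails.

Please bear with me while I look at the old discussions and code and sort this out.

No worries.

In the meantime, are you aware of other mail clients in use that are affected? I know it is a problem in Thunderbird, for example, but what about other clients? Thanks.

Definitely mutt. At least with mutt it is easy to look at the headers, and also to see the reply tree, which is often hidden in other clients.

Email threading is defined entirely by the Message-ID and In-Reply-To headers. The References header originates from USENET follow-up posts and (there) supports multiple message IDs; In-Reply-To supports only one. It looks like References is now in RFC 5322 as well; I will check its semantics.

5 Likes

I'm just collecting my thoughts and will write a longer post about this later. Thanks for the extra information so far!

1 Like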

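Okay, this is kind of huge, please bear with me. First, thanks for another detailed reply and the offer of debugging / review, it is really helpful :+1: I've actually been looking into this this morning and, surprisingly, the threading in a unified view works in Thunderbird for most cases, and I think the References header consistently pointing to the OP helps with that (for example, the topic Reference in this chain, which is always present, is <topic/53@discoursehosted.martin-brennan.com>).

The scenario where threading does not work as expected is:

  1. A post is created in Discourse, which sends an email to the people watching the topic, then
  2. Another person replies to that post, which sends an email to the people watching the topic

The second email gets incorrect In-Reply-To and References headers, because a Message-ID is generated for it here: discourse/lib/email/sender.rb at 98bacbd2c6b9fe57167cd32af5eb4839b4a5d1f6 · discourse/discourse · GitHub. In the screenshot, the messages of the following pattern should have been placed here:

[screenshot]

The answer is: it depends. If a post is created in Discourse from an inbound email, such as this one of yours, we use that post's original inbound Message-ID when someone replies to it for the In-Reply-To and References headers, as per:

discourse/lib/email/sender.rb at 98bacbd2c6b9fe57167cd32af5eb4839b4a5d1f6 · discourse/discourse · GitHub

Otherwise we are just using the topic OP reference and generating a new reference, which obviously is what is causing all the issues. In all cases we generate a new Message-ID every time an outbound email is sent, which seems correct and on par with other mail clients.

I think I see what you mean, does it go like this:

  1. cameron sends an email to Discourse from mutt, which gets Message-ID: 74398756983476983@mail.com
  2. Discourse creates a post and stores the Message-ID against the post with an IncomingEmail record
  3. johndoe is watching the topic, so they get sent an email from Discourse with a Message-ID: topic/222/44@discourse.com and no reference to the original Message-ID: 74398756983476983@mail.com

Does that sound correct, that we should just "pass on" that Message-ID to those watching the topic instead of generating our own, since it's already unique? What then happens in johndoe's mail client if cameron also CC'd him on that original outbound message? This does sound like a separate issue, so it would be good to open another bug topic for it.

I will set up a mutt client locally to see what you are seeing. I have never tested this functionality in a text-based client (only Gmail and Thunderbird), so I am keen to see how it looks anyway.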


My line of thinking to address these issues this morning was to dispense with the randomly generated suffixes generated when we send Message-ID headers in emails and instead change to a scheme where we use the user_id of both the sending and receiving user. The benefit of this is that there is no need to store the Message-ID anywhere (apart from when an inbound email creates a post) and so References and In-Reply-To headers will always be consistent. Let me give an example. Say we have these users:

  • martin - user_id 25
  • cameron - user_id 44
  • sam - user_id 78
  • bob - user_id 999

And then we have this topic, topic_id 233499, with posts starting from post_id 100 as the OP. The format would become topic/#{topic_id}/#{post_id}.s#{sender_user_id}r#{receiver_user_id}. The order of operations would look like this:

  1. martin creates the OP
  • cameron is sent an email with these headers:
    • Message-ID: topic/233499.s25r44@meta.discourse.org
    • References: topic/233499@meta.discourse.org
  • sam is sent an email with these headers:
    • Message-ID: topic/233499.s25r78@meta.discourse.org
    • References: topic/233499@meta.discourse.org
  2. cameron replies via email
  • discourse is sent an email with these headers from mutt:
    • Message-ID: 43585349859734@test.com
    • References: topic/233499@meta.discourse.org topic/233499.s25r44@meta.discourse.org
    • In-Reply-To: topic/233499.s25r44@meta.discourse.org
  3. discourse (as cameron, from the above email) creates post 101
  • sam is sent an email from discourse with these headers:
    • Message-ID: topic/233499/101.s44r78@meta.discourse.org
    • References: 43585349859734@test.com topic/233499@meta.discourse.org
    • In-Reply-To: 43585349859734@test.com
  4. sam replies to cameron via email
  • discourse is sent an email with these headers from gmail:
    • Message-ID: 5346564746574@gmail.com
    • References: topic/233499/101.s44r78@meta.discourse.org topic/233499@meta.discourse.org
    • In-Reply-To: topic/233499/101.s44r78@meta.discourse.org
  5. discourse (as sam, from the above email) creates post 102
  • cameron is sent an email from discourse with these headers:
    • Message-ID: topic/233499/102.s78r44@meta.discourse.org
    • References: 5346564746574@gmail.com topic/233499@meta.discourse.org
    • In-Reply-To: 5346564746574@gmail.com
  6. bob creates post 103 in the topic, not as a reply to anyone (note that the References here include the Message-ID of the OP email sent to each of the two users)
  • cameron is sent an email with these headers:
    • Message-ID: topic/233499/103.s999r44@meta.discourse.org
    • References: topic/233499@meta.discourse.org topic/233499.s25r44@meta.discourse.org
  • sam is sent an email with these headers:
    • Message-ID: topic/233499/103.s999r78@meta.discourse.org
    • References: topic/233499@meta.discourse.org topic/233499.s25r78@meta.discourse.org
  7. cameron replies via email
  • discourse is sent an email with these headers from mutt:
    • Message-ID: 6759850728742572@test.com
    • References: topic/233499@meta.discourse.org topic/233499/103.s999r44@meta.discourse.org
    • In-Reply-To: topic/233499/103.s999r44@meta.discourse.org
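cameron's inbox

  • martin - topic OP
    • SENT -> to: discourse, re: topic OP
      • sam - reply to second post
    • bob - reply in topic, not to any specific post
      • SENT -> to: discourse, re: bob's post

sam's inbox

  • martin - topic OP
    • cameron - second post
      • SENT -> to: discourse, re: second post
    • bob - reply in topic, not to any specific post

I think this is correct. Could you check what I have written in these headers and confirm that it is what you would expect in this scenario? The only thing I am slightly unsure about is whether I have covered all of the References. Naturally I will test with real emails on a development branch before rolling anything out. I also haven't tested anything in mutt yet.


As a side note, I also looked into what GitHub do with their notification emails, and noticed they do a similar thing where they have an ever-present Reference (discourse/discourse/pull/252@github.com) that is used in all the emails related to that "topic", which in this case is a GitHub pull request:

References: <discourse/discourse/pull/252@github.com> <discourse/discourse/pull/252/issue_event/7042100517@github.com>
In-Reply-To: <discourse/discourse/pull/252/issue_event/7042100517@github.com>

6 Likes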

By Martin Brennan via Discourse Meta at 22Jul2022 06:34:

Okay this is kind of huge, please bear with me. First, thanks for
another detailed reply and the offer of debugging / review, it is
really helpful :+1: I’ve actually been looking into this this morning
and, surprisingly, the threading in a unified view works in Thunderbird
for most cases, and I think the References header consistently
pointing to the OP helps with that (for example the topic Reference
in this chain which is always present is
<topic/53@discoursehosted.martin-brennan.com>).

I’ve just reread RFC5322 section 3.6.4 closely. It has moved on from
earlier versions (822 and 2822), and has merged the email In-Reply-To
headers, USENET References headers and modern
reply-citing-more-than-one previous messages.

The short summary:

  • The Message-ID is a single persistent identifier for a message
  • The In-Reply-To contains all the message-ids of which this message
    is a direct reply, so if I reply to a pair of messages it will have
    those 2 message-ids
  • The References is a reply chain of antecedent message-ids from the
    OP to the preceding message. So indeed it should always start with
    the OP message-id.

So for a discussion like this, pretending that labels are message-ids:

OP
  -> reply1
    -> reply2 ---+
  -> reply3      |
    -> reply4    |
      -> reply5 <+

The reply5 would have:

  • message-id=reply5
  • in-reply-to=“reply2 reply4”
  • references=“OP reply3 reply4”

It is also legal to include “reply1 reply2” in the references (the
other chain to reply5) but the RFC explicitly recommends against that
because some clients expect the references to be a single linear chain
of replies, not some flattened digraph.

So my recommendation for constructing the references is to use the
references of the “primary” antecedent message with the primary
antecedent message’s message-id appended. That way you always get a
linear chain in the correct order.
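
The recommendation above can be sketched in a few lines of Python. The function name `build_references` and the choice of reply4 as the “primary” antecedent of reply5 are assumptions made for illustration; the labels stand in for message-ids exactly as in the diagram.

```python
def build_references(primary_references, primary_message_id):
    """References for a reply: the primary parent's chain plus the parent itself."""
    return list(primary_references) + [primary_message_id]


# (message, primary antecedent) pairs for the tree above; labels stand in for ids.
replies = [
    ("reply1", "OP"),
    ("reply2", "reply1"),
    ("reply3", "OP"),
    ("reply4", "reply3"),
    ("reply5", "reply4"),  # also replies to reply2, but reply4 is taken as primary
]

references = {"OP": []}  # the OP has no antecedent, so no References at all
for message, primary in replies:
    references[message] = build_references(references[primary], primary)

# In-Reply-To may list every direct parent; References stays one linear chain.
reply5_in_reply_to = "reply2 reply4"
reply5_references = " ".join(references["reply5"])
```

Because each reply only extends its primary parent's chain, the result is always linear and always begins with the OP.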

Interestingly there seems to be some threading there.

But notice: the top post has a little “is a reply” arrow. Even though it
is post 1. I expect that is because of the “topic” references entry,
which makes TB think there was an earlier message (which of course there
was not).

In mutt-land we see almost no threading at all:

23Jul2022 06:24 Olha via Discus - ┌>[Py] [Users] I need an advise  discuss-users 5.7K
22Jul2022 17:12 Paul Jurczak vi - ├>[Py] [Users] I need an advise  discuss-users 5.5K
22Jul2022 13:21 Rob via Discuss - ├>[Py] [Users] I need an advise  discuss-users 6.8K
22Jul2022 12:53 vasi-h via Disc - ├>[Py] [Users] I need an advise  discuss-users 5.5K
22Jul2022 11:38 Cameron Simpson - ├>[Py] [Users] I need an advise  discuss-users  14K
22Jul2022 10:27 Rob via Discuss - ├>[Py] [Users] I need an advise  discuss-users 6.6K
22Jul2022 06:14 vasi-h via Disc r ┴>[Py] [Users] I need an advise  discuss-users 6.5K

which is because every message’s In-Reply-To points directly at the
fictitious “topic” message-id. Mutt probably ignores the References
because it is a mail reader, and References originates in USENET news.
Maybe Thunderbird is using the references or augmenting the in-reply-to
with references information.

You only need to consult one of In-Reply-To or References to do
threading; the former comes from email and the latter from USENET.
You’re supporting both (which is great!) so we need to make them
consistent.
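
To show why a fictitious parent id leaves the root looking “missing”, here is a small Python sketch of a threader. It is an assumption about how a client might pick a parent (In-Reply-To first, then the last References entry), not the actual algorithm of mutt or Thunderbird, and the `forum.example` ids are made up.

```python
def parent_of(headers):
    """Guess the parent id: In-Reply-To first, else the last References entry."""
    in_reply_to = headers.get("In-Reply-To", "").split()
    if in_reply_to:
        return in_reply_to[0]
    references = headers.get("References", "").split()
    return references[-1] if references else None


def missing_parents(mailbox):
    """Return parent ids that are cited but never appear as a Message-ID."""
    present = {headers["Message-ID"] for headers in mailbox}
    cited = {parent_of(headers) for headers in mailbox} - {None}
    return cited - present


# Every message cites a made-up topic id, so the root always looks missing.
current = [
    {"Message-ID": "<post1@forum.example>", "References": "<topic@forum.example>"},
    {"Message-ID": "<post2@forum.example>", "In-Reply-To": "<topic@forum.example>",
     "References": "<topic@forum.example>"},
]
# Consistent headers: the OP cites nothing and the reply cites the OP.
fixed = [
    {"Message-ID": "<post1@forum.example>"},
    {"Message-ID": "<post2@forum.example>", "In-Reply-To": "<post1@forum.example>",
     "References": "<post1@forum.example>"},
]
```

Calling `missing_parents(current)` reports the topic pseudo-id as a parent nobody ever received, while `missing_parents(fixed)` reports nothing.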

(Aside: there’s also discussion about USENET mirroring, because several
python people consume the lists via a USENET interface. Again, a
separate topic.)

[…]

[quote=“Cameron Simpson, post:8, topic:233499,
username:cameron-simpson”]
This looks fine. Does it save this id with the db record so that inbound
replies can be tied to the antecedent forum message?
[/quote]

The answer is – it depends. If a post is created in Discourse from an inbound email, such as this one of yours, we use that post’s original inbound Message-ID when someone replies to it for the In-Reply-To and References headers as per:

discourse/lib/email/sender.rb at 98bacbd2c6b9fe57167cd32af5eb4839b4a5d1f6 · discourse/discourse · GitHub

Otherwise we are just using the topic OP reference and just generating a new reference, which obviously is what is causing all the issues. In all cases we generate a new Message-ID every time an outbound email is sent, which seems correct and on par with other mail clients.

Alas, not quite. If you’re the origin of the message (i.e. authored in
Discourse), generating the message-id is fine. If there’s no message-id
(illegal) generating one is standard practice (usually by MTAs). But if
you’re passing a message on (authored in email), the existing message-id
should be preserved.

To my mind you need to be doing 3 things:

  1. having a stable message-id and not replacing the message-id from an
    inbound message
  2. generating correct In-Reply-To, which is easily computed from the
    immediate antecedent message(s) i.e. antecedent(s)-Message-ID
  3. generating correct References, which is easily computed as
    antecedent-References + antecedent-Message-ID

For point 1, looking at the code you cite, you probably want the email
message id to be (Pythonish syntax, sorry):

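def message_id(post):
    # Prefer the inbound email's Message-ID; not every post has an incoming_email.
    incoming = post.incoming_email
    return (incoming and incoming.message_id) or discourse_message_id(post)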

i.e. to be the post’s email message-id if it originated from email,
otherwise the Discourse message-id using something like the algorithm
you outline later in this message: anything (a) stable and (b)
syntactically valid.

Then computing the In-Reply-To and References fields is simple
mechanical stuff as in points 2 and 3.

I think I see what you mean, does it go like this:

  1. cameron sends email to Discourse from mutt which gets Message-ID: 74398756983476983@mail.com
  2. Discourse creates a post and stores the Message-ID against the post with an IncomingEmail record

Correct.

  1. johndoe is watching the topic, so they get sent an email from Discourse with a Message-ID: topic/222/44@discourse.com and no reference to the original Message-ID: 74398756983476983@mail.com

No. You really want to pass through IncomingEmail.message_id as the
Message-ID in the email to johndoe. It’s the same message.

Does that sound correct, that we should just “pass on” that Message-ID to those watching the topic instead of generating our own since it’s already unique? What then happens in johndoe’s mail client if
cameron also CC’d him on that original outbound message? This does sound like a separate issue so it would be good to open another bug topic for it.

By passing it on, the original message (cameron->cc:johndoe) and the
Discourse forwarded message (cameron->Discourse->johndoe) have the same
message-id and the same message contents. The receiving mail system
stores both. The mail reader sees both, and either presents both or
keeps just one (this is a policy decision of the mail reader - keeping
just one is common). Because they’re the same message, in general it
does not matter which is kept.

It is just as if we ignored Discourse and considered an ordinary mailing
list message with a direct CC: the recipient gets a copy of the message
via the list and also via direct email. They’re
the same message, with the same message-id.
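
The “keep just one” policy mentioned above is easy to picture in code. This is a hypothetical Python sketch of what a mail reader might do, not the behaviour of any particular client; messages are modelled as plain dicts and the `via` key exists only for the example.

```python
def dedupe_by_message_id(messages):
    """Keep the first copy seen of each Message-ID, preserving arrival order."""
    seen = set()
    kept = []
    for message in messages:
        if message["Message-ID"] not in seen:
            seen.add(message["Message-ID"])
            kept.append(message)
    return kept


inbox = [
    {"Message-ID": "<74398756983476983@mail.com>", "via": "direct CC"},
    {"Message-ID": "<74398756983476983@mail.com>", "via": "forum"},
    {"Message-ID": "<other@mail.com>", "via": "forum"},
]
unique = dedupe_by_message_id(inbox)
```

This only works when both routes deliver the same Message-ID; if the forum rewrites the id, the reader has no way to tell that the two copies are one message.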

I will set up a mutt client locally to see what you are also seeing, I have never tested this functionality in a text-based client (only Gmail and Thunderbird) so I am keen to see how it looks anyway.

Happy to help with settings. For threaded view you need to set the
sorting to threaded. Mutt is very configurable.

My line of thinking to address these issues this morning was to dispense
with the randomly generated suffixes generated when we send
Message-ID headers in emails and instead change to a scheme where we
use the user_id of both the sending and receiving user. The benefit
of this is that there is no need to store the Message-ID anywhere
(apart from when an inbound email creates a post) and so References
and In-Reply-To headers will always be consistent.

Yes, that is much better. Noting that the inbound email message-id
should override the Discourse derived message-id for the outbound email.

(Most mail systems use random strings because there’s no surrounding
context such as the discourse topic message structure - messages are
considered alone; but the only real requirement is persistent
uniqueness.)

Let me give an example. Say we have these users:

  • martin - user_id 25
  • cameron - user_id 44
  • sam - user_id 78
  • bob - user_id 999

And then we have this topic, topic_id 233499, with posts starting from post_id 100 as the OP. The format would become topic/#{topic_id}/#{post_id}.s#{sender_user_id}r#{receiver_user_id}.

The order of operations would look like this:

  1. martin creates the OP
  • cameron is sent an email with these headers:
    • Message-ID: topic/233499.s25r44@meta.discourse.org
    • References: topic/233499@meta.discourse.org
  • sam is sent an email with these headers:
    • Message-ID: topic/233499.s25r78@meta.discourse.org
    • References: topic/233499@meta.discourse.org
  1. There should not be a References header in the OP. It isn’t
    needed for threading and effectively pretends there’s some “post 0”
    which doesn’t exist. It means every OP (a) looks like a reply, which it
    is not and (b) looks like the thing to which it is a reply is missing
    from the reader’s mailbox.

  2. This makes different message-ids for each outbound copy of the OP.
    That’s bad. They need to be the same. Suppose sam CCs cameron
    directly in a reply. The In-Reply-To will cite a message-id cameron
    has never received.

You can just drop the sender_user_id and receiver_user_id from the
message-id field and get a single unique id which every receiver sees.

The uniqueness constraint is the post itself, not the individual
email-level “message” object.

Re the References, the OP should not have one. TB and everything else
will be fine. If they’re threading using References instead of
In-Reply-To, the References in the reply messages are enough.

Here’s the start of a mailing list discussion thread in Mutt:

16Jul2022 01:09 Rob Boehne      - │├>[Python-Dev] Re: [SPAM] Re: Swit python-dev 9.2K
16Jul2022 01:33 Peter Wang      - │├>                                 python-dev 3.0K
16Jul2022 00:24 Skip Montanaro  - ├>[Python-Dev] Re: Switching to Dis python-dev 4.2K
16Jul2022 04:49 Erlend Egeberg  - ├>[Python-Dev] Re: Switching to Dis python-dev  10K
16Jul2022 04:20 Mariatta        - ├>[Python-Dev] Re: Switching to Dis python-dev  10K
15Jul2022 21:18 Petr Viktorin   - [Python-Dev] Switching to Discourse python-dev 4.2K

Ignore that I sort my email newest-on-top. See that there’s no arrow on
the initial post (at the bottom). That message has no References and
no In-Reply-To. All the others have In-Reply-To (and possibly
References, but this is an email mailing list so not necessarily; as I
mentioned before they’re complementary.)

If I repeat my Discourse example from earlier:

23Jul2022 06:24 Olha via Discus - ┌>[Py] [Users] I need an advise  discuss-users 5.7K
22Jul2022 17:12 Paul Jurczak vi - ├>[Py] [Users] I need an advise  discuss-users 5.5K
22Jul2022 13:21 Rob via Discuss - ├>[Py] [Users] I need an advise  discuss-users 6.8K
22Jul2022 12:53 vasi-h via Disc - ├>[Py] [Users] I need an advise  discuss-users 5.5K
22Jul2022 11:38 Cameron Simpson - ├>[Py] [Users] I need an advise  discuss-users  14K
22Jul2022 10:27 Rob via Discuss - ├>[Py] [Users] I need an advise  discuss-users 6.6K
22Jul2022 06:14 vasi-h via Disc r ┴>[Py] [Users] I need an advise  discuss-users 6.5K

See they all have a leading arrow? That is because the mail client
believes they are all replies to a common (and missing) root message,
which is because of the “topic” message-id in the References header.
Whereas post 1 is actually the bottom message displayed above.

Summary:

  • your plan is good, provided you drop the sender and receiver from the
    message-id - they’re unnecessary and in fact the receiver will cause
    trouble (the sender is just redundant).
  • drop the “topic” pseudo-message-id from the References - it misleads
    email clients (including TB, even if it isn’t visually evident)

  1. cameron replies via email
  • discourse is sent an email with these headers from mutt:
    • Message-ID: 43585349859734@test.com
    • References: topic/233499@meta.discourse.org topic/233499.s25r44@meta.discourse.org
    • In-Reply-To: topic/233499.s25r44@meta.discourse.org

Yes, again with the caveat that there should not be a “topic” reference.
As expected, there is a reference to the OP message-id. Though it should
be the same message-id that sam sees for the OP.

  1. discourse (as cameron, from the above email) creates post 101
  • sam is sent an email from discourse with these headers:
    • Message-ID: topic/233499/101.s44r78@meta.discourse.org
    • References: 43585349859734@test.com topic/233499@meta.discourse.org
    • In-Reply-To: 43585349859734@test.com

And here it goes wrong. The Message-ID should be
43585349859734@test.com from the .incoming_post.message_id field.
(Well, in my mind this is post.message_id(), which returns
post.incoming_post.message_id for an email generated post and your
Discourse generated one otherwise).

Consider: I compose and send my reply with message-id
43585349859734@test.com. For continuity reasons, I keep a copy of that
in my local folder, where it shows as a reply to the OP. Ideally
Discourse also sends me a copy of my own post (this is a policy setting
on many mailing lists), so I get Discourse’s version also. That should
have the same message-id, because it is the same message, just via a
different route.

Discourse’s message is not “in reply to” my message. It is my
message, just forwarded.

This effect cascades through your following examples. The actual process
should be simpler than you’ve made it.

Think of it this way. If I reply to a post from email, it effectively is
like me emailing sam (and the others) via Discourse. Discourse
forwards my message to the email-receiving subscribers, and
“incidentally” keeps a copy on the forum :slight_smile:

As a side note, I also looked into what GitHub do with their
notification emails, and noticed they do a similar thing where they
have an ever-present Reference
(discourse/discourse/pull/252@github.com) that is used in all the
emails related to that “topic” which in this case is a GitHub pull
request:

References: <discourse/discourse/pull/252@github.com> <discourse/discourse/pull/252/issue_event/7042100517@github.com>
In-Reply-To: <discourse/discourse/pull/252/issue_event/7042100517@github.com>

Hoo, github. What a disaster their issue emails are :slight_smile:

However, in their scenario, the PR is the OP. So a reference directly
to the pull is sane. You could use the “topic” message-id for post 1,
provided you didn’t also use the “topic/1” id as well. But there seems
little point - it is extra effort to special case post 1 - I’d just use
“topic/1” myself.

To add some complication. As I understand it, an admin can move a post
or topic. Doesn’t that break the “generate the message-id” scheme,
particularly if they move just a post? I’m somewhat of the opinion that
every post should have a _message_id field, filled in from the
incoming message (from email) or generated (posting via Discourse). Then
it is persistent and stable and robust against any shuffling of posts or
changes of algorithm.

Finally, there’s a small security consideration: you should ignore the
inbound email message-id (and potentially bounce the message) if it
claims the message-id of an existing post. Since as an author, I can put
anything I like in that header :slight_smile: I’d go with just dropping the
message-id - accept the post, but don’t let it lie about being some
other post - give your copy the Discourse-generated id and then proceed
as normal.
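
A minimal Python sketch of that policy follows. The function name, the `existing_ids` set and the `generate_id` callback are all hypothetical; the point is only the decision rule of trusting the inbound id unless it is absent or already belongs to another post.

```python
def message_id_for_inbound(claimed_id, existing_ids, generate_id):
    """Trust the inbound Message-ID unless it is absent or already taken."""
    if claimed_id and claimed_id not in existing_ids:
        return claimed_id
    # Missing or colliding id: keep the post but give it a fresh forum id.
    return generate_id()


existing = {"<topic/1/10@forum.example>"}


def fresh_id():
    # Stand-in for whatever id generator the forum uses.
    return "<topic/1/11@forum.example>"


honest = message_id_for_inbound("<abc@mail.example>", existing, fresh_id)
spoofed = message_id_for_inbound("<topic/1/10@forum.example>", existing, fresh_id)
```

Whichever id is chosen would then be stored with the post once and reused for every outbound copy.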

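7 Likes

Thank you very much for another in-depth reply. It may take me some time to digest this and turn it into actionable items, so please bear with me (I also have some other high-priority internal projects on the go at the moment). I think with this information we will be able to make our threading system more robust and spec-compliant. I may have more questions as I dig into your post, thanks Cameron.

2 Likes

By Martin Brennan via Discourse Meta at 25Jul2022 00:28:

Thank you very much for another in-depth reply. It may take me some time to digest this and turn it into actionable items, so please bear with me (I also have some other high-priority internal projects on the go at the moment). I think with this information we will be able to make our threading system more robust and spec-compliant. I may have more questions as I dig into your post, thanks Cameron.

OK. Cheers, Cameron Simpson

1 Like

By the way, I notice that this follow-up post of yours has these headers: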

Message-ID: <topic/233499/1137586.d14eea2849d76c355ec214fb@meta.discourse.org>
In-Reply-To: <YttEVzlTh/ymDSPT@cskk.homeip.net>
References: <topic/233499@meta.discourse.org>
      <YttEVzlTh/ymDSPT@cskk.homeip.net>

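That is, it retains the message ID of my original email. So the In-Reply-To is correct, and the References at least includes my email's message ID.

That is not what we are seeing on discuss.python.org.

Cheers,
Cameron Simpson

1 Like

Ah, that is an interesting observation, I hadn't noticed the little arrow.

This is also super interesting. I believe (without examining the source) Thunderbird does do that, and likely the Gmail UI as well since it does the same thing.

We do seem to be doing this but I guess not consistently? Basically we need to make sure that:

  • TODO #1 - If a post has an associated IncomingEmail record, we always use that Message-ID when sending email.
  • TODO #2 - Do not use a References when sending out emails related to the OP of the topic. @cameron-simpson one question though: if the OP was created via an inbound email, would we use that Message-ID in References for the OP or still exclude it?

This is interesting, I thought every recipient of the email had to have a unique Message-ID? In fact I believe this is why we went down the path of adding uniqueness to each recipient's Message-ID, to avoid spam behaviours, looking back on our internal topic. Perhaps @supermathie, who is on our infra team and was doing a bunch of testing with email earlier in the year, could weigh in here too?

What you are saying is that it's more that the post should be the thing determining a single Message-ID for all recipients. So perhaps we just generate one for each post that generates an email? Then we could also move the IncomingEmail.message_id to here as well. Tentatively, the change we would need to make is:

  • TODO #3 - Add an outbound_message_id to the Post table. Generate it once when an email is first sent in relation to the post. Use it for subsequent References and In-Reply-To headers. Set its value when a post is created from an IncomingEmail. Format should be topic/:topic_id/:post_id/:random_alphanumeric_string@host e.g. topic/233499/33545/gvy8475y7c45y87554c@meta.discourse.org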

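After this change my first example would become this:

  1. martin creates the OP
  • cameron is sent an email with these headers:
    • Message-ID: topic/233499/33545/gvy8475y7c45y87554c@meta.discourse.org
  • sam is sent an email with these headers:
    • Message-ID: topic/233499/33545/gvy8475y7c45y87554c@meta.discourse.org

With the consideration also that the OP does not have special handling, it will no longer be in the format topic/:topic_id@hostname.

  • TODO #4 - Ensure that correct In-Reply-To and References headers are generated based on PostReply records and the new outbound_message_id column on the Post table

I think we have some consideration for this, I will double-check.

It definitely seems that way :sweat_smile:


Can you confirm the TODOs here sound reasonable, Cameron? It really doesn't seem like much now that I look at it. I also wonder, when I get to this work, would you be open to joining a testing Discourse instance with me that will have the WIP changes deployed to it, so we can email back and forth and test that things are working correctly? I will of course do testing of my own before I involve you.

If not, that's fine too. I have Thunderbird and will be setting up mutt, and I can test it all out there :slight_smile:

1 Like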

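@cameron-simpson one thing I want to clarify here is the scope of "message_id".
What started this debate was @supermathie strongly suspecting that our non-unique message_ids were causing problems.
Discourse generates a unique, per-user email for every email it sends. So, for example, say 2 users are watching this topic:

  • User 1 receives payload 1, which contains an unsubscribe link unique to user 1
  • User 2 receives payload 2, which contains an unsubscribe link unique to user 2

If in both cases our message id is discourse_topic_100/23 (i.e. topic_id/post_number), then we are telling MTAs (mail transfer agents) that discourse_topic_100/23 can be 2 different payloads, and the assumption is that they treat that as a spam signal.

Hey Discourse... you just sent two different emails both called discourse_topic_100/23, what's going on?

Because Discourse controls all of the email delivery, and emails are not added to BCC or CC lists as on a traditional mailing list, we can have clean, per-user unsubscribe links.
What are your thoughts on this? Using discourse_topic_100/23/7333 (i.e. topic_id, post_number, user_id) as the unique identifier for the email is a simple change, it is guaranteed to be a unique payload, and we can easily reference it when generating mail for users.

1 Like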

By Martin Brennan via Discourse Meta at 26Jul2022 00:27:

[quote=“Cameron Simpson, post:11, topic:233499,
username:cameron-simpson”]
Mutt probably ignores the References
because it is a mail reader, and References originates in USENET news.
Maybe Thunderbird is using the references or augmenting the in-reply-to
with references information.

You only need to consult one of In-Reply-To or References to do
threading; the former comes from email and the latter from USENET.
You’re supporting both (which is great!) so we need to make them
consistent.
[/quote]

This is also super interesting. I believe (without examining the source) Thunderbird does do that, and likely the Gmail UI as well since it does the same thing.

I think mutt will use both, but probably just In-Reply-To if present,
falling back to References. I’d need to check the source.

With References you do at least know the full chain to the OP; with
In-Reply-To you more or less need the antecedent messages around to
stitch things together. For mailing lists I usually keep the whole
thread locally until it’s done anyway, and I expect that is common.

We do seem to be doing this but I guess not consistently? Basically we need to make sure that:

  • TODO #1 - If a post has an associated IncomingEmail record, we always use that Message-ID when sending email.

Yes. This is why I was thinking it might be sanest to have an explicit
field for the message-id, and to fill it in once. Then use that from
then on always, regardless of any changes to the process in which the
message-id is manufactured in the code later.

  • TODO #2 - Do not use a References when sending out emails related to the OP of the topic .

Yes. The OP has no antecedent, so there’s no References or
In-Reply-To.

@cameron-simpson one question though – if the OP was created via an
inbound email, would we use that Message-ID in References for the
OP or still exclude it?

Still exclude. But use it as the persistent message-id for the OP.

So a message authored by email (OP or reply) gets its message-id from
the email. One authored on the web gets one when the user presses
Submit, generated by Discourse. From then on, that’s the message-id,
however created.

[quote=“Cameron Simpson, post:11, topic:233499,
username:cameron-simpson”]
You can just drop the sender_user_id and receiver_user_id from the
message-id field and get a single unique id which every receiver sees.

The uniqueness constraint is the post itself, not the individual
email-level “message” object.
[/quote]

This is interesting, I thought every recipient of the email had to have a unique Message-ID?

No. The message-id identifies the “message”. Not the individual copy. I
might post to the forum and CC someone directly. If that someone gets a
copy direct from me and also via the forum, they should have the same
message-id.

In fact I believe this is why we went down the path of adding
uniqueness to each recipient’s Message-ID, to avoid spam behaviours,
looking back on our internal topic. Perhaps @supermathie , who is on
our infra team and was doing a bunch of testing with email earlier in
the year, could weigh in here too?

Maybe. But on the face of it, threading is indeed broken. Certainly
sending the same message to many people should have the same message-id,
and generally, as a forwarder (email->discourse->email-recipients)
discourse should not be modifying the message-ids.

What you are saying is that it’s more that the post should be the thing determining a single Message-ID for all recipients. So perhaps we just generate one for each post that generates an email?

Every post should have one stable unique message-id for use in the email
side. If the post originated from an email, that original message-id
should be used. Otherwise (via the web interface) Discourse should be
generating a message-id and storing it with the post.

Then we could also move the IncomingEmail.message_id to here as well.

Sure. Having a distinct set of fields (message-id seems enough)
containing the email-side state should do it.

Tentatively, the change we would need to make is:

  • TODO #3 - Add an outbound_message_id to the Post table. Generate
    it once when an email is first sent in relation to the post.

If you got the post from an email, you should be using that, not
generating a new one.

Use it for subsequent References and In-Reply-To headers. Set its
value when a post is created from an IncomingEmail.

Yes. To the message-id from the email.

Format should be
topic/:topic_id/:post_id/:random_alphanumeric_string@host e.g.
topic/233499/33545/gvy8475y7c45y87554c@meta.discourse.org

For ones you generate yourselves, this looks good to me.

After this change my first example would become this:

  1. martin creates the OP
  • cameron is sent an email with these headers:
  • Message-ID: topic/233499/33545/gvy8475y7c45y87554c@meta.discourse.org
  • sam is sent an email with these headers:
  • Message-ID: topic/233499/33545/gvy8475y7c45y87554c@meta.discourse.org

Yes.

But note: the message-id only needs to be stable and unique. If the
topic/:topic_id/:post_id@host is stable and will never be regenerated,
that will do. But if you’re concerned about that (e.g. db restores or
migrations or imports bringing those same numbers) then the random
string will make it robust against collision.

Note that the message-id left part is dot-atom-text, defined here:

which is alphas and digits and a limited set of punctuation characters
(which includes “/”).

Um, your headers. They should have:

Message-ID: <topic/233499/33545/gvy8475y7c45y87554c@meta.discourse.org>

Note the angle brackets. The message-id is formally the bit between the
angle brackets, and the angle brackets are mandatory. Syntax here:

With the consideration also that the OP does not have special handling, it will no longer be in the format topic/:topic_id@hostname.

Sounds good.
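
The scheme agreed above can be sketched in Ruby roughly as follows. This is only an illustration under stated assumptions: `PostRecord`, `generate_message_id`, `outbound_message_id_for` and `message_id_header` are hypothetical names rather than Discourse's actual code, and the 24-character hex suffix is an arbitrary choice for the random part.

```ruby
require "securerandom"

# Hypothetical stand-in for a row in the Post table.
PostRecord = Struct.new(:id, :topic_id, :incoming_message_id, :outbound_message_id,
                        keyword_init: true)

# Build an id in the proposed topic/:topic_id/:post_id/:random@host shape.
def generate_message_id(topic_id, post_id, host)
  "topic/#{topic_id}/#{post_id}/#{SecureRandom.hex(12)}@#{host}"
end

# Return the post's single stable id, storing it on first use.
# A post that arrived by email keeps the id from that email.
def outbound_message_id_for(post, host)
  post.outbound_message_id ||=
    post.incoming_message_id || generate_message_id(post.topic_id, post.id, host)
end

# The header value is the id wrapped in the mandatory angle brackets.
def message_id_header(message_id)
  "<#{message_id}>"
end
```

Because the id is stored on the post, every recipient's copy of that post would carry the same Message-ID, and later replies have a real id to point at.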

  • TODO #4 - Ensure that correct In-Reply-To and References headers are generated based on PostReply records and the new outbound_message_id column on the Post table

Thanks.
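
One way TODO #4 could work is sketched below. It assumes each post's stored id is already wrapped in angle brackets and that the reply relationships are reduced to a single `reply_to_post_number` per post (a simplification of PostReply records). `ThreadPost` and `threading_headers` are hypothetical names. Following the suggestion earlier in the thread, a reply with no known parent falls back to the first post's Message-ID rather than to an id that no message carries.

```ruby
# Hypothetical in-memory stand-in for a post plus its reply relationship.
ThreadPost = Struct.new(:post_number, :message_id, :reply_to_post_number,
                        keyword_init: true)

# Returns the threading headers for one post.
# posts_by_number maps post_number => ThreadPost for the whole topic.
def threading_headers(post, posts_by_number)
  # The first post replies to nothing, so it gets neither header.
  return {} if post.post_number == 1

  op = posts_by_number.fetch(1)
  chain = []
  seen = {}
  current = posts_by_number[post.reply_to_post_number]

  # Walk up the reply chain, oldest ancestor first; `seen` guards against cycles.
  while current && !seen[current.post_number]
    seen[current.post_number] = true
    chain.unshift(current.message_id)
    current = posts_by_number[current.reply_to_post_number]
  end

  # Root every thread at the first post's real Message-ID.
  chain.unshift(op.message_id) unless chain.first == op.message_id

  { "In-Reply-To" => chain.last, "References" => chain.join(" ") }
end
```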

I think we have some consideration for this, I will double-check.

+1

It definitely seems that way :sweat_smile:

Can you confirm the TODOs here sound reasonable Cameron?

They seem correct to me.

It really doesn’t seem like much now that I look at it. I also wonder,
when I get to this work, would you be open to joining a testing
Discourse instance with me that will have the WIP changes deployed to
it, so we can email back and forth and test that things are working
correctly? I will of course do testing of my own before I involve you.

Certainly. Happy to help in whatever way.

If not, that’s fine too – I have Thunderbird and will be setting up
mutt and I can test it all out there :slight_smile:

I can help you with mutt if you want it too.

3 likes

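I think you can still send differing messages with the same message-id, even when there are minor differences between them.

Ordinary mailing lists do this to varying degrees. At the very least, some header processing always happens. But the message body is sometimes modified too. A typical example is python-list, which discards non-text attachments. Nonetheless, the message still goes through with the same message-id. And almost every list appends a footer at the bottom, such as a link to the list management page or to unsubscribe. That was not there when the message arrived.

There have also been long-running discussions about content signatures, around what a signature should cover.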

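So I am entirely in favour of you adding a recipient-specific unsubscribe link and keeping the original message-id. The benefit far outweighs the loss of threading you get if you give each copy of a message its own message-id.

Consider the email user again. I can reply to a Discourse message and add a CC to an interested outside party. Maybe they will get a copy from Discourse, maybe not. But if they do, it should carry the source message-id, even with your appended footer. Otherwise they receive 2 copies of my message, and their mail system does not know they are copies of the same message. That causes problems.

In short: I do not think your trivial appended unsubscribe text warrants a different message-id. Keep just the one.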
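
To make the CC scenario concrete, here is a small sketch. `Email`, `copies_for` and `dedupe_by_message_id` are hypothetical names, the unsubscribe URL is a placeholder, and real mail systems differ in whether and how they collapse duplicates. Each recipient's copy gets its own footer but shares one Message-ID, so a reader that collapses duplicates by Message-ID keeps only one of the direct and forwarded copies.

```ruby
# Hypothetical minimal representation of a delivered email.
Email = Struct.new(:message_id, :to, :body, keyword_init: true)

# One copy per recipient: the footer differs, the Message-ID does not.
def copies_for(message_id, body, recipients)
  recipients.map do |address|
    footer = "Unsubscribe: https://forum.example/unsubscribe?u=#{address}"
    Email.new(message_id: message_id, to: address, body: "#{body}\n-- \n#{footer}")
  end
end

# What a deduplicating mail reader might do: keep the first copy of each Message-ID.
def dedupe_by_message_id(inbox)
  inbox.uniq(&:message_id)
end
```

If each copy instead carried its own Message-ID, `dedupe_by_message_id` would have nothing to match on and the CC'd person would see the message twice.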

4 likes

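Sorry, I’m only just catching up here; some thoughts follow, a few of which have already been addressed…

The difficulty here is that the message Discourse sends out is different from the message that came in. It has different metadata (for this purpose: To/From/Reply-To/unsubscribe/etc.) and a different body (customized per user, I think? Or is that not the case in mailing list mode?).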

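What exactly is a message? Per 5322:

A message consists of header fields, optionally followed by a message body.

The "Message-ID:" field provides a unique message identifier that refers to a particular version of a particular message.

[emphasis mine]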

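It’s the "particular version" that makes me think re-sending an incoming message with a different message ID is not appropriate. Though if you shift from the viewpoint of Discourse being "forum software" to Discourse being "mailing list software", then it makes some sense to do that, so I see where you’re coming from. 5322 also says: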

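There are many instances when messages are "changed", but those changes do not constitute a new instantiation of that message, and therefore the message would not get a new message identifier. For example, when messages are introduced into the transport system, they are often prepended with additional header fields such as trace fields (described in section 3.6.7) and resent fields (described in section 3.6.6). The addition of such header fields does not change the identity of the message and therefore the original "Message-ID:" field is retained. In all cases, it is the meaning that the sender of the message wishes to convey (i.e., whether this is the same message or a different message) that determines whether or not the "Message-ID:" field changes, not any particular syntactic difference that appears (or does not appear) in the message.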

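I guess it comes down to this: when Discourse sends the message, has the sender of the message changed?

Maybe we should be using Resent-Message-ID and the related fields?

It has been there all along, all the way back to 822. But as you said later, yes, it has been updated.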

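5322 also speaks directly to the way Discourse and GitHub use these fields:

The "In-Reply-To:" field may be used to identify the message (or messages) to which the new message is a reply, while the "References:" field may be used to identify a "thread" of conversation.

That usage is possibly slightly incorrect, probably because there is no proper "thread identifier" header. But it may not be what the RFC authors intended… the RFC does not deal with messages that have "References" but no "In-Reply-To".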

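The tricky part of this is that we are not sending one email, we are sending N of them, one per recipient, so that each recipient’s metadata (unsubscribe etc.) can be correct.

Yes, in my testing I did see strong indications that spam verdicts are tied to the Message-ID. If the same Message-ID is seen again later (by the same user or a different user), the message is more likely to be flagged as spam.

Honestly, the biggest benefit here is being able to thread emails correctly in some mail clients, but it comes at the cost of deliverability.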

The current topic/#{topic_id}/#{post_id}.s#{sender_user_id}r#{receiver_user_id} at least keeps it consistent for a user within their own inbox. Assuming

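The biggest concern is deliverability: getting email delivered is already hard enough when the major providers give you no visibility.

But I do see a strong argument for making Discourse behave more like mailing list software when it is in mailing list mode. @martin, I believe we don’t customize the message body in mailing list mode? Do you think it would be wise to take a stricter approach to preserving and reusing the Message-ID?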

5 likes

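I don’t want us to land in a "perfect is the enemy of good enough" situation.

We are now using a "random suffix" in message IDs, which is certainly causing pain.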

We have 3 options:

  1. Random message IDs that cannot be traced back
  2. A stable message ID per topic/post/user
  3. A stable message ID per topic/post pair

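We are currently at (1), which is causing chaos.

I worry we will get stuck in decision paralysis between (2) and (3).

Maybe we start with (2), accept that adding extra CCs to an email sent to Discourse may lead to unexpected behaviour, and at least stop most of the pain here?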
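
For comparison, options (2) and (3) might look like this. The method names are hypothetical, and the user ids used when trying it out are made up; option (2) reuses the `.s…r…` shape quoted earlier in the thread.

```ruby
# Option (2): stable per topic/post/user. Each recipient sees a different id,
# but the same recipient always sees the same id for a given post.
def per_recipient_message_id(topic_id, post_id, sender_user_id, receiver_user_id, host)
  "<topic/#{topic_id}/#{post_id}.s#{sender_user_id}r#{receiver_user_id}@#{host}>"
end

# Option (3): stable per topic/post pair. Every recipient sees the same id.
def per_post_message_id(topic_id, post_id, host)
  "<topic/#{topic_id}/#{post_id}@#{host}>"
end
```

Under (2), a reply's `In-Reply-To` has to be built for the same recipient as the message it points at, otherwise it names an id that recipient never received. Under (3), one value works for everyone.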

4 likes

Ah! I thought we were already doing: topic/#{topic_id}/#{post_id}.s#{sender_user_id}r#{receiver_user_id}

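To balance email uniqueness against deliverability and the mailing list mode concerns, I’m leaning towards doing (2) when mailing list mode is disabled and (3) when mailing list mode is enabled.

Likewise, for the References header, I’m leaning towards omitting it on post #1 of the topic, and otherwise referencing the topic (so topic/#{topic_id}) as well as the post being replied to (if any).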
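
A minimal sketch of that References proposal, with `references_header` as a hypothetical name and the replied-to post's id passed in already wrapped in angle brackets:

```ruby
# Returns the References value for a post, or nil when the header is omitted.
def references_header(topic_id:, post_number:, host:, reply_to_message_id: nil)
  # Post #1 starts the thread, so it carries no References header.
  return nil if post_number == 1

  # Every later post references the topic-level id ...
  refs = ["<topic/#{topic_id}@#{host}>"]
  # ... plus the post it replies to, when there is one.
  refs << reply_to_message_id if reply_to_message_id
  refs.join(" ")
end
```

Note that the topic-level id is still one that no delivered message carries as its Message-ID, which is the objection raised at the start of this topic.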

3 likes