SMTP failure not reported and too many login attempts on backlog clearing

I noticed that for the past few days no emails were being sent for posts on the forum. I checked the dashboard and found 0 emails sent in the past 3 days. I have no idea why. I had to reboot the server before it started sending emails again, but here are a few issues I am noticing:

  1. No notification was shown on the admin account that it was having trouble sending emails, and even the logs didn’t show any errors.
  2. After the reboot it started sending the backlog of emails, but after sending the initial batch of 100 emails I’m now getting the error Job exception: 454 4.7.0 Too many login attempts, please try again later. It’s stuck in a loop, continuously trying to log in to send the remaining emails while the server rejects it.
    It looks like it’s trying to log in to the SMTP server for EACH email. This IMHO is a bug: it should not log in to the server for each email when there is more than one email to send, it should reuse the existing connection.
  3. Now it’s stuck in this login loop, retrying every few seconds. How do I stop it and ask it to back off for 10-15 minutes and try again?

My guess is that your login credentials are wrong and that’s why it’s continuing to try to log in.

No, it’s right. It had been working for the past few months until 3 days ago, and it started working again after I restarted the server (just rebooted Linux). I haven’t made a single change.

The bugs, however, remain; see my points 2 and 3 above. The problem is that it’s trying to log in before sending each email (and there are now hundreds of backlogged emails, because for some reason emails stopped being sent a few days ago). So after sending about 30 emails or so, the SMTP server blocks it because it logs in again before sending each email. Now I have to manually stop the Discourse server, wait for 10 minutes and then restart it. Then it again sends 30 emails, logs in 30 times, and the SMTP server again blocks it for too many logins.
This isn’t correct. It should reuse an existing login to send emails, and it should back off if the SMTP server responds with too many logins and report that to the administrator.

Plus there is the other issue of why it didn’t inform the admin that emails were not being sent. I think the email component/engine may have just stopped after a web upgrade, causing all the emails to queue up, and when I rebooted the machine it started sending them. There are no errors in the log files at all until after the reboot, when the SMTP server pushed back after too many logins.

@tgxworld @eviltrout @codinghorror - any thoughts on why Discourse is trying to authenticate for every email in the backlog queue, and how to have it back off when the server errors out?

I don’t see this as a bug; it’s either a configuration issue or a feature request for better logging.

We are only doing what you tell us per your SMTP settings: if you set the password, we try to authenticate.

How is this a logging issue?

If there are 200 emails in the backlog (why the email module stopped working and created a backlog is a whole different issue), it shouldn’t authenticate 200 times. It should authenticate once and then send the 200 emails in a single authenticated session. It would be inefficient to authenticate, send one email, disconnect, and repeat this 200 times.
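To illustrate what "authenticate once, send many" could look like, here is a minimal Ruby sketch. It is not Discourse’s actual code. `deliver_batch` is a hypothetical helper, and it assumes `smtp` behaves like a `Net::SMTP` object from Ruby’s net/smtp library, whose `start` yields one authenticated session to a block and closes it when the block ends.

```ruby
# Hypothetical helper (not part of Discourse): sends every message over a
# single authenticated SMTP session instead of logging in once per message.
#
# smtp     - an object with the Net::SMTP interface (#start, #send_message)
# messages - array of hashes with :from, :to and :body (a full RFC 5322 string)
#
# Returns the number of messages handed to the server.
def deliver_batch(smtp, messages, user:, password:)
  sent = 0
  # One login for the whole batch; the session is closed when the block ends.
  smtp.start("localhost", user, password, :plain) do |session|
    messages.each do |msg|
      session.send_message(msg[:body], msg[:from], msg[:to])
      sent += 1
    end
  end
  sent
end

# Possible usage with net/smtp (host and credentials are placeholders):
#   require "net/smtp"
#   smtp = Net::SMTP.new("smtp.example.com", 587)
#   smtp.enable_starttls
#   deliver_batch(smtp, backlog, user: "forum@example.com", password: "secret")
```

With 200 queued messages this would perform one login rather than 200, which is the behaviour being asked for here.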

The other issue is that if the SMTP server is asking it to back off, it doesn’t; it keeps hammering away at it. There should be a backoff algorithm to wait and then retry. I fail to see how these are logging issues.

Have you tried sending a test message from the admin /email page since this problem started?

Yes, that’s how I started to debug the issue; it didn’t send the email. Then I rebooted the server and email started working again (again, no changes to any configuration). But then it started sending the 266 backlogged emails, and after the initial batch the SMTP server started throwing an error about too many logins. That’s where I figured out that Discourse was authenticating with the SMTP server for each individual email separately, causing it to push back. I had to manually stop the server for 10 minutes and then start it; it would send the next batch of 30-50 emails and then the SMTP server would push back again. Then I’d stop the server, wait 10 minutes and start it again, until the entire backlog was cleared.

I still fail to see how this has anything to do with logging. It’s an inefficient and possibly incorrect way to implement sending multiple emails.

That is indeed very strange. What mail server is it?

GSuite - Google Business

I tried to take a look at the code to see how emails are built and delivered. I didn’t find any specific place where the emails are “queued” up into a backlog and then delivered. It looks like each service/module sends emails independently.

However I couldn’t see any “queue”, so I’m just left wondering why discourse decided not to send any emails for 3 days and then to send 266 emails after rebooting the server. It’s almost like the notification system just went offline and then came back online after a reboot and iterated through all pending notifications from each module. Again I couldn’t find any “single” piece of code that does this.

Since it appears that each notification is sent independently of the others, I guess there’s no way to “reuse” an authenticated SMTP connection. The crux of the code appears to be:

begin
  @message.deliver_now
rescue *SMTP_CLIENT_ERRORS => e
  return skip(e.message)
end

Given my rudimentary understanding here, there doesn’t appear to be any control over the SMTP connection.

If so, as there’s no way to control the SMTP connection,

  1. Could a rate limit option be provided in the email settings instead, so that no more than X messages are delivered per minute or second? This would help where SMTP servers have rate limits, and in situations like this one where messages get backlogged (for whatever reason) and are then suddenly cleared.
  2. Alternatively, could the email sender module handle the SMTP error when the server pushes back for too many logins, and back off for X minutes before trying again? (Again, this could be a configurable parameter in the settings.)
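As a rough illustration of option 1, a sliding-window throttle could look something like the Ruby sketch below. This is only a sketch and not an existing Discourse setting or class: `DeliveryThrottle`, `limit`, `period`, `clock` and `sleeper` are all made-up names, and the clock and sleeper are injectable purely so the behaviour can be checked without real waiting.

```ruby
# Hypothetical throttle (not part of Discourse): allows at most `limit`
# deliveries in any window of `period` seconds, sleeping when the window is full.
class DeliveryThrottle
  def initialize(limit:, period:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @limit = limit
    @period = period
    @clock = clock
    @sleeper = sleeper
    @stamps = [] # start times of deliveries still inside the window
  end

  # Waits until a delivery slot is free, then runs the given block.
  def throttle
    loop do
      now = @clock.call
      # Forget deliveries that have aged out of the window.
      @stamps.reject! { |stamp| now - stamp >= @period }
      break if @stamps.size < @limit
      # Window is full: sleep until the oldest delivery expires.
      @sleeper.call(@period - (now - @stamps.first))
    end
    @stamps << @clock.call
    yield
  end
end

# Possible usage: at most 20 deliveries per minute.
#   throttle = DeliveryThrottle.new(limit: 20, period: 60)
#   backlog.each { |message| throttle.throttle { message.deliver_now } }
```

A throttle like this spreads a suddenly released backlog over time, but it does not by itself reduce the number of logins per message.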

@eviltrout it looks like you have done a fair amount of work on the email notifications. Any thoughts on the above?

We have no support for email rate limiting and no plans to add it.

Okay so how do you handle this situation when discourse is sending hundreds of emails in a short span and the SMTP server is rejecting the multiple logins for each email?

Get a better SMTP server?

Google business is a perfectly legitimate service. It isn’t unreasonable to ask a client not to log in 100 times in a second to send 100 emails.

Sounds like you’re blaming Google for implementing reasonable DDoS protection mechanisms.

Isn’t there a way for the email sender to process the SMTP error through the skip method and handle it more gracefully?

10,000 sites are using this software and not having the trouble that you describe. My best guess is that you’re using a Google SMTP server that’s intended for a single user and not for delivering hundreds of messages per hour.

Fair enough. I guess when more folks start having the issue it can be looked into.

That only brings me back to why the messages weren’t sent for 3 days and I had to reboot the server to get it started again. I’ll keep an eye out if it happens again.

Even GSuite for business allows only 2000 emails per day (see Email sending limits - G Suite Admin Help), which is probably not suitable for sending digest emails to a forum with many users.

The alternatives are:

  • getting a paid SMTP server
  • disabling digests

Are there any others?

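I’d like to jump in here... I ran into the same problem, but my situation is more infuriating than what the original poster went through. I’m also using GSuite, with a separate “app password” configured for Discourse. However, I noticed that my problem isn’t caused by daily email volume or SMTP rate limits. The forum where this happened has almost no activity in terms of posts, digest emails and so on. What actually happens is that SMTP authentication occasionally fails... This comes down to a flaw on Google’s side: every so often (roughly once every 3 to 5 months), SMTP authentication fails for a few minutes. Usually, when my mail client hits this, I just wait 5 minutes and try again, and everything magically works again. But in Discourse, a single failure sends it into a frenzy: even with only a few messages in the queue, it overreacts and repeatedly tries to log in for every message in the queue. The result is an SMTP authentication “blitz”, which in turn trips another brute-force protection mechanism inside Gmail, and soon Gmail stops processing login attempts and returns errors. At that point it doesn’t matter whether the password is correct or whether Gmail’s authentication problem has been resolved, because Discourse has gone completely out of control and gotten you locked out.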

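Stop focusing only on rate limiting and focus on the following instead:

  • Improve efficiency: for example, send multiple emails over a single TCP connection. This is standard practice and reduces the number of active sockets and the network overhead.
  • Implement a sensible escalating retry delay: if SMTP authentication fails, there is no need at all to try again within such a short time. When SMTP authentication fails, either disable email sending and show a warning in the admin interface that SMTP authentication failed, or apply a serious delay, for example the second attempt after 60 seconds, the third after 5 minutes, the fourth after 30 minutes, and so on.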
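The escalating delays suggested above (60 seconds, then 5 minutes, then 30 minutes) could be sketched in Ruby roughly as follows. This is an illustration under assumptions, not Discourse’s implementation: `with_backoff`, `RETRY_DELAYS` and `AuthFailure` are made-up names, and `AuthFailure` merely stands in for whatever error the SMTP client raises (net/smtp defines classes such as `Net::SMTPAuthenticationError` and `Net::SMTPServerBusy`).

```ruby
# Delays in seconds before the 2nd, 3rd and 4th attempts, as suggested above.
RETRY_DELAYS = [60, 5 * 60, 30 * 60].freeze

# Hypothetical error class standing in for an SMTP authentication failure.
class AuthFailure < StandardError; end

# Runs the block and, on AuthFailure, waits for the next delay in the
# schedule before retrying. Once the schedule is exhausted the error is
# re-raised, so the caller can stop sending and alert an administrator.
# `sleeper` is injectable only to make the sketch testable without waiting.
def with_backoff(delays: RETRY_DELAYS, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
  rescue AuthFailure
    raise if attempt >= delays.length
    sleeper.call(delays[attempt])
    attempt += 1
    retry
  end
end

# Possible usage:
#   with_backoff { deliver_pending_emails }   # deliver_pending_emails is hypothetical
```

The point of the design is that a single failed login costs at most a handful of further attempts spread over more than half an hour, instead of one attempt per queued message every few seconds.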

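Right now I basically have to change the SMTP settings to disable email, rebuild the app, wait 24 hours, restore the settings, and then rebuild again. Even a simple checkbox in the admin settings to disable email sending would be a thousand times better than the current situation.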

There is such a setting. Search for “disable emails”.

See also Troubleshoot email on a new Discourse install - #362