I noticed that for the past few days no emails were being sent for posts on the forum. I checked the dashboard and found 0 emails for the past 3 days. No idea why. I had to reboot the server and then it started sending emails, but here are a few issues I am noticing:
There was no notification provided on the admin account that it’s having issues sending emails, even the logs didn’t show any errors
After the reboot it started sending the backlog of emails, but after sending the initial batch of 100 emails I’m now getting the error “Job exception: 454 4.7.0 Too many login attempts, please try again later”. It’s stuck in a loop, continuously trying to log in to send the remaining emails, but the server is rejecting it.
It looks like for EACH email it’s trying to login to the SMTP server. This IMHO is a bug, it should not login to the server for each email when there is more than one email to send, it should reuse the existing connection.
Now it’s stuck in this login loop, retrying every few seconds. How do I stop it and ask it to back off for 10-15 minutes and try again?
No, it’s right. It had been working for the past few months until 3 days ago, and it started working again after I restarted the server (just rebooted Linux). I haven’t made a single change.
The bugs however remain; see my points 2 and 3 above. The problem is that it’s trying to log in before sending each email (and there are now hundreds of backlogged emails because for some reason the emails stopped being sent a few days ago). So after sending about 30 emails or so the SMTP server blocks it because it logs in again before sending each email. Now I have to manually stop the discourse server, wait for 10 minutes and then restart it, and then again it sends 30 emails and logs in 30 times and then again the SMTP server blocks it for too many logins.
This isn’t correct, it should reuse an existing login to send emails and it should backoff if the smtp server responds with too many logins and report it to the administrator.
Plus there is the other issue of why it didn’t inform the admin that emails are not being sent. I think the email component/engine may have just stopped after a web upgrade, causing all the emails to queue up, and when I rebooted the machine it started sending them. There are no errors in the log files at all until after the reboot, when the SMTP server pushed back after too many logins.
@tgxworld @eviltrout @codinghorror - any thoughts on why discourse is trying to authenticate for every eMail in the backlog queue and how to have it back off when the server errors out?
If there are 200 emails in the backlog (a whole different issue as to why the email module stopped working creating a backlog), it shouldn’t authenticate 200 times. It should authenticate once and then send the 200 emails in a single authenticated session. It would be inefficient to authenticate, send one email, disconnect and do this over 200 times.
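To illustrate what I mean by a single authenticated session, here is a minimal sketch. It is not how Discourse delivers mail today; `deliver_backlog` and `open_session` are names I made up, and the host and credentials in the comment are placeholders. The only real calls assumed are `Net::SMTP.start` yielding a session and `send_message` on that session.

```ruby
# Sketch only: deliver a whole backlog over ONE authenticated SMTP session.
# `open_session` is a hypothetical callable that yields an object responding
# to send_message(body, from, to), for example a Net::SMTP session.
def deliver_backlog(messages, open_session)
  sent = 0
  open_session.call do |smtp|
    # One login, many messages: the session stays open for the whole loop.
    messages.each do |msg|
      smtp.send_message(msg[:body], msg[:from], msg[:to])
      sent += 1
    end
  end
  sent
end

# With the standard library this could be wired up roughly like this
# (host, port, and credentials are placeholders):
#
#   require 'net/smtp'
#   open_session = lambda do |&block|
#     Net::SMTP.start('smtp.example.com', 587, 'forum.example.com',
#                     'user', 'app-password', :plain, &block)
#   end
#   deliver_backlog(backlog, open_session)
```

With 200 queued messages this would authenticate once instead of 200 times.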
The other issue is that if the SMTP server is asking it to back off, it doesn’t, but keeps hammering away at it. There should be a backoff algorithm to wait and then retry. I fail to see how these are logging issues.
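For what it’s worth, the server is already saying the failure is temporary. SMTP reply codes in the 4xx range are transient (RFC 5321), while 5xx codes are permanent. A naive sketch of how the error text could be classified is below; both helper names are hypothetical, and it assumes the reply code is the first standalone three-digit number in the message.

```ruby
# Sketch: pull the SMTP reply code out of an error message.
# Assumes the code is the first standalone three-digit number (2xx-5xx).
def smtp_reply_code(error_message)
  match = error_message.match(/\b([2-5]\d{2})\b/)
  match && match[1].to_i
end

# 4xx replies are transient ("try again later"); 5xx are permanent.
def transient_smtp_error?(error_message)
  code = smtp_reply_code(error_message)
  !code.nil? && code.between?(400, 499)
end

# Example with the error quoted earlier in this topic:
#   transient_smtp_error?("Job exception: 454 4.7.0 Too many login attempts, please try again later")
```

A sender that sees a transient reply could pause the whole queue rather than retrying each message immediately.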
Yes, that’s how I started to debug the issue: it didn’t send the eMail. Then I rebooted the server and email started working again (again, no changes to any configuration), but now it started sending the 266 backlogged emails, and after the initial batch the SMTP server started throwing an error about too many logins. That’s where I figured out that Discourse was authenticating with the SMTP server for each individual email separately, causing it to push back. I had to manually stop the server for 10 minutes, then start it; it would send the next batch of 30-50 emails and then the SMTP server would push back again. Then I’d stop the server, wait 10 minutes and start it again, until the entire backlog was cleared.
I still fail to see how this has anything to do with logging. It’s an inefficient and possibly incorrect way to implement sending multiple emails.
I tried to take a look at the code to see how emails are built and delivered. I didn’t find any specific place where the emails are “queued” up into a backlog and then delivered. It looks like each service/module sends emails independently.
However I couldn’t see any “queue”, so I’m just left wondering why discourse decided not to send any emails for 3 days and then to send 266 emails after rebooting the server. It’s almost like the notification system just went offline and then came back online after a reboot and iterated through all pending notifications from each module. Again I couldn’t find any “single” piece of code that does this.
Since it appears that each notification is sent independently of the others, I guess there’s no way to “reuse” an authenticated SMTP connection. The crux of the code appears to be:
```ruby
begin
  @message.deliver_now
rescue *SMTP_CLIENT_ERRORS => e
  return skip(e.message)
end
```
Given my rudimentary understanding here, there doesn’t appear to be any control over the SMTP connection.
If so, as there’s no way to control the SMTP connection,
instead can there be a rate limit option provided in the eMail settings to not deliver more than X messages per minute or second? This would help where SMTP servers have rate limits and would help in situations like this when messages get backlogged (for whatever reason) and then suddenly cleared.
alternatively, can the email sender module process the SMTP error when the server pushes back saying that it’s logged in too many times, and back off for X minutes before sending again? (Again, this could be a configurable parameter in the settings.)
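To make the first option concrete, here is a rough sketch of an “X messages per period” limiter. `SendRateLimiter` is a name I made up and nothing like it exists in the settings today, as far as I can tell. The clock is injectable only so the logic can be checked without waiting.

```ruby
# Sketch of a sliding-window limiter: at most `limit` sends per `period` seconds.
class SendRateLimiter
  def initialize(limit:, period:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @limit = limit
    @period = period
    @clock = clock
    @timestamps = [] # send times still inside the current window
  end

  # Returns 0 if a message may be sent now (and records the send),
  # otherwise the number of seconds to wait before trying again.
  def try_acquire
    now = @clock.call
    @timestamps.reject! { |t| now - t >= @period }
    if @timestamps.length < @limit
      @timestamps << now
      0
    else
      @period - (now - @timestamps.first)
    end
  end
end

# Possible use in a delivery loop (20 per minute is an arbitrary example):
#   limiter = SendRateLimiter.new(limit: 20, period: 60)
#   while (wait = limiter.try_acquire) > 0
#     sleep(wait)
#   end
#   # ...deliver one message here...
```

A backlog of a few hundred messages would then drain over several minutes instead of hitting the SMTP server all at once.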
@eviltrout it looks like you have done a fair amount of work on the eMail notifications. Any thoughts on the above?
Okay so how do you handle this situation when discourse is sending hundreds of emails in a short span and the SMTP server is rejecting the multiple logins for each email?
10,000 sites are using this software and not having the trouble that you describe. My best guess is that you’re using a Google SMTP server that’s intended for a single user and not for delivering hundreds of messages per hour.
Fair enough. I guess when more folks start having the issue it can be looked into.
That only brings me back to why the messages weren’t sent for 3 days and I had to reboot the server to get it started again. I’ll keep an eye out if it happens again.
Even GSuite for business gives 2000 emails per day (see “Email sending limits - G Suite Admin Help”), which is probably not suitable for sending digest emails to a forum with many users.
I just wanted to chime in here… I have the same issue, but it’s more annoying than the OP’s experience. I also use GSuite and have a separate “app password” configured for Discourse. I have noticed, however, that my problem is not due to email counts per day or SMTP rate limits. The forum this happened on was doing hardly anything in terms of posts or digest emails or anything like that. What happens is, from time to time, an SMTP authentication fails… this has something to do with a bug on Google’s side of things… once in a while (once every 3-5 months or so), SMTP authentications just fail for a few minutes. Normally, when I see this in my email client, I simply wait 5 minutes and try again and it’s magically working again. Well, with Discourse, a failure just makes it mad… even with just a few messages queued up, it goes into overdrive, attempting to log in over and over and over for each message in the queue. The result is an SMTP authentication blitz that triggers a different brute force attack protection mechanism within Gmail, and very quickly it simply stops processing login attempts and returns errors instead. So, at this point it really doesn’t matter if the password is correct and the Gmail authentication problem resolves… by this time you’re locked out because Discourse went bugnutz.
Forget rate limiting; instead focus on:
improving efficiency; send multiple emails on a single TCP connection for starters… this is standard and reduces active socket counts and network overhead.
implementing a sensible incrementing retry delay. If an SMTP authentication fails, there is really no reason to try again after such a short time. When SMTP authentication fails, either disable email sending and put an on-screen alert to the admin that SMTP authentication has failed, or put a serious delay mechanism in place such that try 2 occurs 60 seconds later, try 3 in 5 minutes, try 4 in 30 minutes, etc.
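The retry delay in the second point could look something like the sketch below. This is only an illustration of the schedule I described, not existing Discourse code: `with_auth_backoff`, `RETRY_DELAYS`, and `AuthFailure` are all made-up names, and `AuthFailure` stands in for whatever exception the SMTP library raises on a failed login.

```ruby
# Delays before try 2, try 3, and try 4, in seconds (60 s, 5 min, 30 min).
# Any later tries reuse the last delay.
RETRY_DELAYS = [60, 300, 1800].freeze

# Stand-in for the real SMTP authentication error class.
class AuthFailure < StandardError; end

# Runs the block, retrying on AuthFailure with increasing delays.
# `sleeper` is injectable so the schedule can be exercised without waiting.
def with_auth_backoff(max_attempts: 4, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 1
  begin
    yield attempt
  rescue AuthFailure
    # Give up after the last attempt; this is the point to alert the admin.
    raise if attempt >= max_attempts
    sleeper.call(RETRY_DELAYS[[attempt - 1, RETRY_DELAYS.length - 1].min])
    attempt += 1
    retry
  end
end

# Possible use, where `deliver` is whatever actually sends the mail:
#   with_auth_backoff { |attempt| deliver(message) }
```

Once the final attempt fails the error propagates, so the caller can disable sending and put up the on-screen alert instead of continuing to hammer the server.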
So now I have to basically change my SMTP settings to disable email, then rebuild the app, then wait 24 hours, revert my settings, then rebuild again. Even a simple checkbox in admin settings to disable email sending would be 1000% better than this.