I noticed that for the past few days no emails were being sent for posts on the forum. I checked the dashboard and found 0 emails sent over the past 3 days. I have no idea why. I had to reboot the server, and then it started sending emails again, but here are a few issues I am noticing:
1. There was no notification on the admin account that it was having trouble sending emails; even the logs didn’t show any errors.
2. After the reboot it started sending the backlog of emails, but after the initial batch of 100 emails I’m now getting the error “Job exception: 454 4.7.0 Too many login attempts, please try again later”, and it’s stuck in a loop, continuously trying to log in to send the remaining emails while the server rejects it.
3. It looks like it’s trying to log in to the SMTP server for EACH email. This IMHO is a bug: it should not log in to the server for each email when there is more than one email to send, it should reuse the existing connection.
4. Now it’s stuck in this login loop, retrying every few seconds. How do I stop it and ask it to back off for 10-15 minutes and try again?
No, it’s right. It had been working for the past few months until 3 days ago, and it started working again after I restarted the server (just rebooted Linux). I haven’t made a single change.
The bugs, however, remain; see my points 2 and 3 above. The problem is that it logs in before sending each email, and there are now hundreds of backlogged emails because for some reason emails stopped being sent a few days ago. So after sending about 30 emails the SMTP server blocks it, because it logs in again before sending each one. Now I have to manually stop the Discourse server, wait 10 minutes and restart it; then it sends another 30 emails, logs in 30 times, and the SMTP server blocks it again for too many logins.
This isn’t correct. It should reuse an existing login to send emails, and it should back off if the SMTP server responds with too many logins and report that to the administrator.
Plus there is the other issue of why it didn’t inform the admin that emails were not being sent. I think the email component/engine may have just stopped after a web upgrade, causing all the emails to queue up, and when I rebooted the machine it started sending them. There are no errors in the log files at all until after the reboot, when the SMTP server pushed back after too many logins.
@tgxworld @eviltrout @codinghorror - any thoughts on why Discourse is trying to authenticate for every email in the backlog queue, and how to have it back off when the server errors out?
If there are 200 emails in the backlog (a whole different issue as to why the email module stopped working creating a backlog), it shouldn’t authenticate 200 times. It should authenticate once and then send the 200 emails in a single authenticated session. It would be inefficient to authenticate, send one email, disconnect and do this over 200 times.
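To illustrate the "authenticate once, send many" idea, here is a minimal sketch using Ruby's standard Net::SMTP library. It is not how Discourse delivers mail; the `deliver_batch` and `open_smtp_session` helpers, the settings keys and the message hash layout are all hypothetical names made up for this example.

```ruby
# Hypothetical helper: opens ONE real SMTP session and yields it to the block.
# Only used when the caller does not inject its own opener.
def open_smtp_session(settings, &block)
  require "net/smtp" # loaded lazily so the batching logic has no hard dependency
  smtp = Net::SMTP.new(settings[:host], settings[:port])
  smtp.enable_starttls_auto
  smtp.start(settings.fetch(:helo, "localhost"),
             settings[:user], settings[:password], :login, &block)
end

# Sends every message over a single authenticated session, so a backlog of
# 200 emails costs one login instead of 200. Returns the number sent.
def deliver_batch(messages, settings, opener: method(:open_smtp_session))
  sent = 0
  opener.call(settings) do |smtp|
    messages.each do |m|
      smtp.send_message(m[:body], m[:from], m[:to])
      sent += 1
    end
  end
  sent
end
```

The opener is passed in as a parameter so the batching logic can be tried against a fake session without contacting a real mail server.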
The other issue is that if the SMTP server is asking it to back off, it doesn’t; it keeps hammering away. There should be a backoff algorithm that waits and then retries. I fail to see how these are logging issues.
Yes, that’s how I started to debug the issue: it didn’t send the email. Then I rebooted the server and email started working again (again, no changes to any configuration), but now it started sending the 266 backlogged emails, and after the initial batch the SMTP server started throwing an error about too many logins. That is where I figured out that Discourse was authenticating with the SMTP server for each individual email separately, causing it to push back. I had to manually stop the server for 10 minutes and then start it; it would send the next batch of 30-50 emails and then the SMTP server would push back again, so I’d stop the server, wait 10 minutes and start it, until the entire backlog was cleared.
I still fail to see how this has anything to do with logging. It’s an inefficient and possibly incorrect way to implement sending multiple emails.
I tried to take a look at the code to see how emails are built and delivered. I didn’t find any specific place where the emails are “queued” up into a backlog and then delivered. It looks like each service/module sends emails independently.
However I couldn’t see any “queue”, so I’m just left wondering why discourse decided not to send any emails for 3 days and then to send 266 emails after rebooting the server. It’s almost like the notification system just went offline and then came back online after a reboot and iterated through all pending notifications from each module. Again I couldn’t find any “single” piece of code that does this.
Since it appears that each notification is sent independently of the others, I guess there’s no way to “reuse” an authenticated SMTP connection. The crux of the code appears to be:
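    begin
      # Deliver this one message immediately
      @message.deliver_now
    rescue *SMTP_CLIENT_ERRORS => e
      # On an SMTP client error, skip this message, passing along the error text
      return skip(e.message)
    end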
Given my rudimentary understanding here, there doesn’t appear to be any control over the SMTP connection.
If so, as there’s no way to control the SMTP connection:
Could a rate limit option be provided in the email settings instead, so that no more than X messages are delivered per minute or second? This would help where SMTP servers have rate limits, and in situations like this one where messages get backlogged (for whatever reason) and are then suddenly cleared.
Alternatively, could the email sender module handle the SMTP error when the server pushes back saying it has logged in too many times, and back off for X minutes before sending again? (Again, this could be a configurable parameter in the settings.)
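As a rough sketch of the first suggestion (no more than X messages per minute), a small sliding-window throttle could sit in front of whatever performs the delivery. `SendThrottle` and its parameters are hypothetical; nothing like this is claimed to exist in Discourse's settings.

```ruby
# Hypothetical throttle: allows at most `limit` sends per `period` seconds.
# The clock and sleeper are injectable so the logic can run without real waiting.
class SendThrottle
  def initialize(limit:, period: 60,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @limit = limit
    @period = period
    @clock = clock
    @sleeper = sleeper
    @stamps = [] # times of the sends still inside the current window
  end

  # Blocks until another send is allowed, records it, and returns its timestamp.
  def wait_turn
    loop do
      now = @clock.call
      # Forget sends that have aged out of the window
      @stamps.reject! { |t| now - t >= @period }
      if @stamps.size < @limit
        @stamps << now
        return now
      end
      # Window is full: sleep until the oldest recorded send expires
      @sleeper.call(@stamps.first + @period - now)
    end
  end
end
```

A caller would invoke `wait_turn` immediately before each delivery, so a suddenly released backlog is spread out instead of hitting the SMTP server all at once.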
@eviltrout it looks like you have done a fair amount of work on the email notifications. Any thoughts on the above?
Okay so how do you handle this situation when discourse is sending hundreds of emails in a short span and the SMTP server is rejecting the multiple logins for each email?
10,000 sites are using this software and not having the trouble that you describe. My best guess is that you’re using a Google SMTP server that’s intended for a single user and not for delivering hundreds of messages per hour.
Fair enough. I guess when more folks start having the issue it can be looked into.
That only brings me back to why the messages weren’t sent for 3 days and I had to reboot the server to get it started again. I’ll keep an eye out if it happens again.
Even GSuite for business allows only 2000 emails per day (see “Email sending limits - G Suite Admin Help”), which is probably not suitable for sending digest emails to a forum with many users.
Just wanted to chime in here... I have the same problem, but it’s even more annoying than what the OP experienced. I also use GSuite and have set up a separate “app password” for Discourse. However, I’ve noticed that my problem doesn’t come from the number of emails per day or from SMTP rate limits. The forum where this happened was doing practically nothing in terms of posts, digest emails, or anything else. What happens is that, every once in a while, an SMTP authentication fails... it has something to do with a bug on Google’s side... every so often (roughly once every 3 to 5 months), SMTP authentications simply fail for a few minutes. Normally, when I see this in my mail client, I just wait 5 minutes and try again, and everything magically works. Well, with Discourse, one failure sets it off... even with only a few messages in the queue it starts going haywire, trying to log in over and over and over again for every message in the queue. The result is a burst of SMTP authentications that triggers another brute-force protection mechanism in Gmail, and very quickly it simply stops processing login attempts and returns errors. So at that point it doesn’t matter that the password is correct and that the Gmail authentication glitch has cleared... by then you are locked out because Discourse has lost its mind.
Forget rate limiting; focus instead on:
improving efficiency; send multiple emails over a single TCP connection for a start... this is standard, and it reduces the number of active sockets and the network overhead.
implementing a sensible incremental retry delay: if an SMTP authentication fails, there is really no reason to retry so soon. When an SMTP authentication fails, either disable email sending and show the admin an on-screen alert that SMTP authentication failed, or put a serious delay mechanism in place so that attempt 2 happens 60 seconds later, attempt 3 in 5 minutes, attempt 4 in 30 minutes, and so on.
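The incremental delay described in the second point could look roughly like the sketch below. The 60 second, 5 minute and 30 minute steps come from the post; `with_stepped_retry`, `AuthFailed` and the `on_give_up` hook are hypothetical names, and `AuthFailed` merely stands in for whatever error the mail library raises on a failed login (with Ruby's Net::SMTP that would presumably be `Net::SMTPAuthenticationError`).

```ruby
# Delays in seconds before attempts 2, 3 and 4, as suggested in the post.
RETRY_DELAYS = [60, 5 * 60, 30 * 60].freeze

# Hypothetical stand-in for an SMTP authentication failure.
class AuthFailed < StandardError; end

# Runs the block. On AuthFailed it waits for the next delay and retries.
# Once every delay has been used it calls `on_give_up` (for example, to
# alert the admin) and returns false instead of retrying forever.
def with_stepped_retry(delays: RETRY_DELAYS,
                       sleeper: ->(seconds) { sleep(seconds) },
                       on_give_up: ->(error) {})
  attempt = 0
  begin
    yield
    true
  rescue AuthFailed => e
    if attempt < delays.size
      sleeper.call(delays[attempt])
      attempt += 1
      retry
    end
    on_give_up.call(e)
    false
  end
end
```

With the default schedule the block is attempted at most four times, and the give-up hook is the natural place to pause sending and raise an admin alert rather than keep hammering the server.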
So now I basically have to change my SMTP settings to disable email sending, then rebuild the app, wait 24 hours, restore my settings, and rebuild again. Even a simple checkbox in the admin settings to disable email sending would be 1000% better than this.