SMTP failure not reported and too many login attempts on backlog clearing

No, the configuration is right. It had been working for the past few months until 3 days ago, and it started working again after I restarted the server (just rebooted Linux). I haven’t made a single change.

The bugs, however, remain; see my points 2 and 3 above. The problem is that it logs in before sending each email (and there are now hundreds of backlogged emails, because for some reason emails stopped being sent a few days ago). So after sending about 30 emails, the SMTP server blocks it, because it logs in again before every single email. Now I have to manually stop the Discourse server, wait 10 minutes and restart it; it then sends another 30 emails, logs in 30 times, and the SMTP server blocks it again for too many logins.
This isn’t correct. It should reuse an existing login to send emails, and it should back off if the SMTP server responds with “too many logins” and report the problem to the administrator.

Plus there is the other issue of why it didn’t inform the admin that emails were not being sent. I think the email component/engine simply stopped after a web upgrade, causing all the emails to queue up, and when I rebooted the machine it started sending them. There are no errors in the log files at all until after the reboot, when the SMTP server pushed back after too many logins.

@tgxworld @eviltrout @codinghorror - any thoughts on why Discourse authenticates for every email in the backlog queue, and how to make it back off when the server errors out?

I don’t see this as a bug; it’s either a configuration issue or a feature request for better logging.

We are only doing what your SMTP settings tell us to do: if you set a password, we try to authenticate.


How is this a logging issue?

If there are 200 emails in the backlog (why the email module stopped working and created a backlog is a whole different issue), it shouldn’t authenticate 200 times. It should authenticate once and then send the 200 emails in a single authenticated session. It would be inefficient to authenticate, send one email, disconnect, and repeat this over 200 times.
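For illustration, the “authenticate once, send many” approach can be sketched with Ruby’s net-smtp library, whose block form of `Net::SMTP#start` opens one connection, authenticates once, and closes it when the block returns. The `deliver_batch` and `smtp_connector` helpers are hypothetical (not Discourse code), and the `connect` callable is injectable only so the batching logic can be exercised without a real server.

```ruby
# Hypothetical helper (not Discourse code): deliver many messages over ONE
# authenticated SMTP session instead of logging in once per message.
# `connect` is any callable that yields a session responding to
# send_message(body, from, to), e.g. the Net::SMTP connector defined next.
def deliver_batch(messages, connect:)
  sent = 0
  connect.call do |session|
    # One connection and one AUTH exchange cover the whole batch.
    messages.each do |msg|
      session.send_message(msg[:body], msg[:from], msg[:to])
      sent += 1
    end
  end
  sent
end

# Assumed production connector built on net-smtp (a bundled gem since
# Ruby 3.1). STARTTLS on port 587 with AUTH LOGIN is a common setup for
# GSuite/Gmail, but check your provider's settings.
def smtp_connector(address:, port:, user:, password:)
  lambda do |&block|
    require "net/smtp"
    smtp = Net::SMTP.new(address, port)
    smtp.enable_starttls
    # The block form of #start authenticates once and closes the
    # connection when the block returns.
    smtp.start("localhost", user, password, :login, &block)
  end
end
```

In production the call would look like `deliver_batch(queued, connect: smtp_connector(address: "smtp.gmail.com", port: 587, user: ..., password: ...))`. A real sender would also need to decide what to do when a failure happens partway through a batch.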

The other issue is that when the SMTP server asks it to back off, it doesn’t; it keeps hammering away. There should be a backoff algorithm that waits and then retries. I fail to see how these are logging issues.
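A backoff algorithm of the kind asked for here could look like the following minimal sketch: retry with exponentially growing waits, then give up and re-raise so the failure can be reported. `TransientSmtpError`, the 60-second base delay, the one-hour cap and the five-attempt limit are all illustrative assumptions, not Discourse settings; `sleeper` is injectable so the schedule can be checked without actually waiting.

```ruby
# Hypothetical stand-in for a transient ("4xx, try again later") SMTP reply,
# such as a server refusing further logins for a while.
class TransientSmtpError < StandardError; end

# Runs the block, retrying on TransientSmtpError with exponentially growing
# waits (base, 2*base, 4*base, ... capped at max_delay). After max_attempts
# the error is re-raised so it can be reported to an administrator instead
# of hammering the server indefinitely.
def with_backoff(max_attempts: 5, base_delay: 60, max_delay: 3600,
                 sleeper: ->(secs) { sleep(secs) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientSmtpError
    raise if attempt >= max_attempts
    delay = [base_delay * 2**(attempt - 1), max_delay].min
    sleeper.call(delay)
    retry
  end
end
```

With the defaults, a block that fails three times and then succeeds waits 60, 120 and 240 seconds between attempts. Adding random jitter to each delay is a common refinement when many workers might retry at the same moment.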

Have you tried sending a test email from the admin /email page since this problem started?

Yes, that’s how I started to debug the issue: it didn’t send the email. Then I rebooted the server and email started working again (again, with no configuration changes), but it then started sending the 266 backlogged emails. After the initial batch, the SMTP server started throwing an error about too many logins, and that’s where I figured out that Discourse was authenticating with the SMTP server separately for each individual email, causing it to push back. I had to manually stop the server for 10 minutes and then start it; it would send the next batch of 30-50 emails, the SMTP server would push back again, and I’d stop the server, wait 10 minutes and start it again, until the entire backlog was cleared.

I still fail to see how this has anything to do with logging. It’s an inefficient and possibly incorrect way to implement sending multiple emails.

That is indeed very strange. What mail server is it?

GSuite - Google Business

I tried to take a look at the code to see how emails are built and delivered. I didn’t find any specific place where the emails are “queued” up into a backlog and then delivered. It looks like each service/module sends emails independently.

However, I couldn’t see any “queue”, so I’m left wondering why Discourse decided not to send any emails for 3 days and then sent 266 emails after the server was rebooted. It’s almost as if the notification system went offline, came back online after the reboot, and iterated through all pending notifications from each module. Again, I couldn’t find any single piece of code that does this.

Since it appears that each notification is sent independently of the others, I guess there’s no way to “reuse” an authenticated SMTP connection. The crux of the code appears to be:

begin
  @message.deliver_now
rescue *SMTP_CLIENT_ERRORS => e
  return skip(e.message)
end

Given my rudimentary understanding here, there doesn’t appear to be any control over the SMTP connection.

If so, since there’s no way to control the SMTP connection:

  1. Instead, could a rate-limit option be added to the email settings so that no more than X messages are delivered per minute or second? This would help where SMTP servers have rate limits, and in situations like this one, where messages get backlogged (for whatever reason) and then suddenly cleared.
  2. Alternatively, could the email sender process the SMTP error when the server pushes back saying it has been logged into too many times, and back off for X minutes before sending again? (Again, this could be a configurable parameter in the settings.)
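Option 1 could be sketched as a simple throttle that spaces deliveries at least `60 / X` seconds apart. `SendThrottle` and its `per_minute` parameter are hypothetical names for illustration (Discourse has no such setting); the clock and sleeper are injectable so the spacing can be checked without real waiting.

```ruby
# Hypothetical throttle: allow at most `per_minute` deliveries per minute by
# spacing sends at least 60/per_minute seconds apart. `clock` and `sleeper`
# are injectable so the behaviour can be checked without real waiting.
class SendThrottle
  def initialize(per_minute:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(secs) { sleep(secs) })
    @interval = 60.0 / per_minute
    @clock = clock
    @sleeper = sleeper
    @next_allowed = nil
  end

  # Waits until the next send is allowed, then runs the given block.
  def throttle
    now = @clock.call
    if @next_allowed && now < @next_allowed
      @sleeper.call(@next_allowed - now)
      now = @next_allowed
    end
    @next_allowed = now + @interval
    yield
  end
end
```

For example, `SendThrottle.new(per_minute: 30)` lets the first message through immediately and then waits 2 seconds before each following one. This only limits a single process; several worker processes sending in parallel would need a shared counter instead.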

@eviltrout it looks like you’ve worked a fair amount on the email notifications. Any thoughts on the above?

We have no support for email rate limiting and no plans to add it.

Okay so how do you handle this situation when discourse is sending hundreds of emails in a short span and the SMTP server is rejecting the multiple logins for each email?

Get a better SMTP server?

Google Business is a perfectly legitimate service. It isn’t unreasonable to ask a client not to log in 100 times in a second to send 100 emails.

Sounds like you’re blaming google for implementing reasonable DDOS protection mechanisms.

Isn’t there a way for the email sender to process the SMTP error through the skip method and handle it more gracefully?
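One way to handle the error more gracefully is to look at the SMTP reply code before deciding what to do: RFC 5321 defines 4yz replies as transient (“try again later”) and 5yz replies as permanent. The sketch assumes the exception message starts with the server’s three-digit reply code, which is how net-smtp errors usually read; `classify_smtp_reply` and `handle_smtp_error` are hypothetical helpers, not part of Discourse.

```ruby
# Classifies an SMTP reply by the three-digit code at the start of its
# message. Per RFC 5321, 4yz replies are transient and 5yz are permanent.
# Assumes the message begins with the code; anything else is :unknown.
def classify_smtp_reply(message)
  code = message.to_s[/\A\s*(\d{3})/, 1]
  case code&.chr
  when "2", "3" then :success
  when "4" then :transient
  when "5" then :permanent
  else :unknown
  end
end

# Hypothetical alternative to a bare `skip(e.message)`: back off on
# transient errors, and surface permanent or unrecognized ones to an admin.
def handle_smtp_error(error, back_off:, alert_admin:)
  if classify_smtp_reply(error.message) == :transient
    back_off.call(error)
  else
    alert_admin.call(error)
  end
end
```

Treating unrecognized errors like permanent ones errs on the side of telling the administrator, which is the missing piece raised earlier in this thread.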

10,000 sites are using this software and not having the trouble that you describe. My best guess is that you’re using a Google SMTP server that’s intended for a single user and not for delivering hundreds of messages per hour.


Fair enough. I guess when more folks start having the issue it can be looked into.

That only brings me back to why the messages weren’t sent for 3 days and I had to reboot the server to get it started again. I’ll keep an eye out if it happens again.


Even G Suite for Business allows only 2,000 emails per day (see “Email sending limits” in the G Suite Admin Help), which is probably not suitable for sending digest emails to a forum with many users.
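A quick back-of-the-envelope check shows what a fixed daily cap means in practice. The 2,000/day figure is the one quoted in the post; the helper names and the 5,000-subscriber example are hypothetical.

```ruby
# Back-of-the-envelope numbers for a fixed daily sending cap. 2,000/day is
# the figure quoted above; check the current limit for your own plan.
DAILY_LIMIT = 2000

# Days of full quota needed to reach `recipients` addresses (ceiling division).
def days_needed(recipients, daily_limit = DAILY_LIMIT)
  (recipients + daily_limit - 1) / daily_limit
end

# Average hourly rate if the daily cap is spread evenly over 24 hours.
def hourly_rate(daily_limit = DAILY_LIMIT)
  daily_limit / 24.0
end
```

For a hypothetical forum with 5,000 digest subscribers, `days_needed(5000)` is 3, and `hourly_rate` is about 83 messages per hour. By contrast, the 266-message backlog described earlier fits within a single day’s cap, which is consistent with the posts here attributing the lockouts to login frequency rather than daily volume.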

The alternatives:

  • getting a paid SMTP server
  • disabling digests

Are there any others?

I just wanted to chime in here… I have the same issue, but it’s more annoying than the OP’s experience. I also use GSuite and have a separate “app password” configured for Discourse. I have noticed, however, that my problem is not due to email counts per day or SMTP rate limits; the forum this happened on was hardly doing anything in terms of posts, digest emails or anything like that. What happens is that, from time to time, SMTP authentication fails. This has something to do with a bug on Google’s side: once in a while (once every 3-5 months or so), SMTP authentication just fails for a few minutes. Normally, when I see this in my email client, I simply wait 5 minutes, try again, and it’s magically working again. With Discourse, a failure just makes it mad… even with just a few messages queued up, it goes into overdrive, attempting to log in over and over and over for each message in the queue. The result is an SMTP authentication blitz that triggers a different brute-force protection mechanism within Gmail, which very quickly stops processing login attempts and returns errors instead. So at this point it really doesn’t matter whether the password is correct or whether the Gmail authentication problem has resolved… by then you’re locked out because Discourse went bugnutz.

Forget rate limiting; instead, focus on:

  • improving efficiency: send multiple emails over a single TCP connection, for starters… this is standard practice and reduces active socket counts and network overhead.

  • implementing a sensible, incrementing retry delay: if SMTP authentication fails, there is really no reason to try again within such a short time period. When SMTP authentication fails, either disable email sending and show an on-screen alert telling the admin that SMTP authentication has failed, or put a serious delay mechanism in place, such that try 2 occurs 60 seconds later, try 3 in 5 minutes, try 4 in 30 minutes, etc.
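The incrementing retry delay from that second bullet can be sketched as a queue-wide “circuit breaker”: after an authentication failure the whole queue pauses, so each queued message does not attempt its own login. This sketch combines both suggestions, pausing on the 60 s / 5 min / 30 min schedule and, once that is exhausted, disabling sending and alerting the admin. `SmtpAuthBreaker`, the `alert_admin` callback and the final “give up” step are assumptions for illustration, not existing Discourse behaviour.

```ruby
# Hypothetical queue-wide "auth circuit breaker". After an SMTP auth failure
# the WHOLE queue pauses (so each queued message does not try its own login),
# following the escalating schedule suggested above: 60 s, 5 min, 30 min.
# Once the schedule is exhausted, sending is disabled and an admin is alerted.
class SmtpAuthBreaker
  SCHEDULE = [60, 300, 1800].freeze # seconds to wait after failures 1, 2, 3

  def initialize(clock:, alert_admin:)
    @clock = clock
    @alert_admin = alert_admin
    @failures = 0
    @paused_until = nil
    @disabled = false
  end

  def disabled?
    @disabled
  end

  # Workers check this before opening any SMTP connection.
  def may_send?
    return false if @disabled
    @paused_until.nil? || @clock.call >= @paused_until
  end

  # A successful login resets the schedule.
  def record_success
    @failures = 0
    @paused_until = nil
  end

  def record_auth_failure(error_message)
    delay = SCHEDULE[@failures]
    @failures += 1
    if delay
      @paused_until = @clock.call + delay
    else
      @disabled = true
      @alert_admin.call("SMTP authentication keeps failing: #{error_message}")
    end
  end
end
```

The key design choice is that the pause is shared state checked by every worker, rather than a per-message retry, which is what prevents the login blitz described in the post above.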

So now I have to basically change my SMTP settings to disable email, then rebuild the app, then wait 24 hours, revert my settings, then rebuild again. Even a simple checkbox in admin settings to disable email sending would be 1000% better than this.

There is such a setting. Search ‘disable email’.

See also Troubleshooting email on a new Discourse install

Great! I guess I expected to see it under the Email tab in the admin area, but in the Email section of the Settings tab makes sense also.

When I ran into this issue I had a look at the code, and from what I could make out, the way Discourse is designed there isn’t any central queue or email service. Each job just uses an email object to send an email, so there’s no way to reuse existing connections. If I’m correct, this would require redesigning Discourse’s email handling around a centralized email system/queue manager to make things more efficient, with a retry mechanism, backoff delays, etc.

It would be nice to have those features, and in fact I think a centralized email service manager with queue management would be a huge plus, but that would require a redesign and some work.
