Emails have stopped sending - end of file reached

Hi all, and apologies if this is similar to some of the other posts that mention this error.

For the last four days no emails have been sent, and the test email also fails.

I have browsed through existing topics that are similar, but in my case nothing (that I am aware of) has changed, and email has stopped working after previously working for months without issue.

We host on DigitalOcean and use the G Suite SMTP relay, configured to relay emails from the droplet’s IP address.

The exact error listed in Sidekiq is a bit more verbose than what I get from the discourse-doctor.

Jobs::HandledExceptionWrapper: Wrapped EOFError: end of file reached

discourse-doctor merely says: UNEXPECTED ERROR: end of file reached

I was also able to confirm connecting to the server with:

telnet smtp-relay.gmail.com 587

I believe there was one other short lapse where e-mails stopped sending many months ago, but I can’t remember the error (at that time I was able to retry from sidekiq without any issues).

Has anyone experienced anything similar, or does anyone have a similar configuration that is still functional? Thanks in advance!

I have no useful advice yet, but have run into exactly the same issue, with exactly the same setup - DigitalOcean droplet, sending email via smtp-relay.gmail.com, getting EOFErrors.

Sidekiq reports the following:

Jobs::HandledExceptionWrapper: Wrapped EOFError: end of file reached

Looking in /logs, I get a traceback on the failure, but nothing immediately stands out as useful.

Info:

Job exception: end of file reached

Backtrace:

/usr/local/lib/ruby/2.7.0/net/protocol.rb:225:in `rbuf_fill'
/usr/local/lib/ruby/2.7.0/net/protocol.rb:191:in `readuntil'
/usr/local/lib/ruby/2.7.0/net/protocol.rb:201:in `readline'
/usr/local/lib/ruby/2.7.0/net/smtp.rb:944:in `recv_response'
/usr/local/lib/ruby/2.7.0/net/smtp.rb:929:in `block in getok'
/usr/local/lib/ruby/2.7.0/net/smtp.rb:954:in `critical'
/usr/local/lib/ruby/2.7.0/net/smtp.rb:927:in `getok'
/usr/local/lib/ruby/2.7.0/net/smtp.rb:826:in `helo'
/usr/local/lib/ruby/2.7.0/net/smtp.rb:600:in `do_helo'
/usr/local/lib/ruby/2.7.0/net/smtp.rb:554:in `do_start'
/usr/local/lib/ruby/2.7.0/net/smtp.rb:518:in `start'
mail-2.7.1/lib/mail/network/delivery_methods/smtp.rb:109:in `start_smtp_session'
mail-2.7.1/lib/mail/network/delivery_methods/smtp.rb:100:in `deliver!'
mail-2.7.1/lib/mail/message.rb:2159:in `do_delivery'
mail-2.7.1/lib/mail/message.rb:260:in `block in deliver'
actionmailer-6.0.3.3/lib/action_mailer/base.rb:589:in `block in deliver_mail'
activesupport-6.0.3.3/lib/active_support/notifications.rb:180:in `block in instrument'
activesupport-6.0.3.3/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
activesupport-6.0.3.3/lib/active_support/notifications.rb:180:in `instrument'
actionmailer-6.0.3.3/lib/action_mailer/base.rb:587:in `deliver_mail'
mail-2.7.1/lib/mail/message.rb:260:in `deliver'
actionmailer-6.0.3.3/lib/action_mailer/message_delivery.rb:115:in `block in deliver_now'
actionmailer-6.0.3.3/lib/action_mailer/rescuable.rb:17:in `handle_exceptions'
actionmailer-6.0.3.3/lib/action_mailer/message_delivery.rb:114:in `deliver_now'
/var/www/discourse/lib/email/sender.rb:234:in `send'
/var/www/discourse/app/jobs/regular/user_email.rb:70:in `send_user_email'
/var/www/discourse/app/jobs/regular/user_email.rb:25:in `execute'
/var/www/discourse/app/jobs/base.rb:232:in `block (2 levels) in perform'
rails_multisite-2.5.0/lib/rails_multisite/connection_management.rb:76:in `with_connection'
/var/www/discourse/app/jobs/base.rb:221:in `block in perform'
/var/www/discourse/app/jobs/base.rb:217:in `each'
/var/www/discourse/app/jobs/base.rb:217:in `perform'
sidekiq-6.1.2/lib/sidekiq/processor.rb:196:in `execute_job'
sidekiq-6.1.2/lib/sidekiq/processor.rb:164:in `block (2 levels) in process'
sidekiq-6.1.2/lib/sidekiq/middleware/chain.rb:138:in `block in invoke'
/var/www/discourse/lib/sidekiq/pausable.rb:138:in `call'
sidekiq-6.1.2/lib/sidekiq/middleware/chain.rb:140:in `block in invoke'
sidekiq-6.1.2/lib/sidekiq/middleware/chain.rb:143:in `invoke'
sidekiq-6.1.2/lib/sidekiq/processor.rb:163:in `block in process'
sidekiq-6.1.2/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq/job_retry.rb:111:in `local'
sidekiq-6.1.2/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq.rb:38:in `block in <module:Sidekiq>'
sidekiq-6.1.2/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq/processor.rb:257:in `stats'
sidekiq-6.1.2/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq/job_logger.rb:13:in `call'
sidekiq-6.1.2/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq/job_retry.rb:78:in `global'
sidekiq-6.1.2/lib/sidekiq/processor.rb:124:in `block in dispatch'
sidekiq-6.1.2/lib/sidekiq/logger.rb:10:in `with'
sidekiq-6.1.2/lib/sidekiq/job_logger.rb:33:in `prepare'
sidekiq-6.1.2/lib/sidekiq/processor.rb:123:in `dispatch'
sidekiq-6.1.2/lib/sidekiq/processor.rb:162:in `process'
sidekiq-6.1.2/lib/sidekiq/processor.rb:78:in `process_one'
sidekiq-6.1.2/lib/sidekiq/processor.rb:68:in `run'
sidekiq-6.1.2/lib/sidekiq/util.rb:15:in `watchdog'
sidekiq-6.1.2/lib/sidekiq/util.rb:24:in `block in safe_thread'

Env:

hostname	conversation-app
process_id	736
application_version	e6bbe9b5df4d86fe711aa8b1d886489d30875633
current_db	default
current_hostname	conversation.sevarg.net
job	Jobs::UserEmail
problem_db	default
time	12:42 pm
opts	
type	digest
user_id	30
current_site_id	default

discourse-doctor has the same general output:

==================== MAIL TEST ====================
For a robust test, get an address from http://www.mail-tester.com/
Or just send a test message to yourself.
Email address for mail test? ('n' to skip) [[my email]]: 
Sending mail to [my email]. . . 
Testing sending to [my email] using smtp-relay.gmail.com:587.
======================================== ERROR ========================================
                                    UNEXPECTED ERROR

end of file reached

====================================== SOLUTION =======================================
This is not a common error. No recommended solution exists!

Please report the exact error message above to https://meta.discourse.org/
(And a solution, if you find one!)
=======================================================================================

I also can telnet to the relay on port 587 (and send a test message by hand - haven’t done that in a decade…), and I’ve not changed anything I can think of in recent history that would have impacted mail.

I’m pretty well dead in the water in terms of new users and such, which is a bit of a problem as I’m using it for blog comments as well. I’ve found nothing in the Google logs that is terribly helpful either, and I’m well and truly out of ideas to continue troubleshooting. Everything seems to be configured properly, but things just no longer work.

Well, it’s definitely a comfort to know that my setup is not too uncommon, and that I am not alone in my woes. Out of curiosity, did the problem start about 5 days ago for you as well? Maybe there was an update to something common in our pipelines.

Thanks for sharing the details and backtrace. Mine was very similar to yours, and the errors were identical.

I didn’t attempt to manually send an email from telnet, but I suspect it would work as it did for you…

I’m in the same boat, and we’re manually activating new users for the time being (thankfully that’s only a few each day). Considering I hadn’t changed anything in G Suite, DigitalOcean, or the Discourse configuration, I’m hesitant to start changing anything without being able to narrow down what’s actually causing the issue. :confused:

The first real spike in failures in Sidekiq was Jan 14, so… 5 days ago. Before that, I had some random failures related to bad emails or such, but nothing ticking up rapidly.

I tried recreating the relay settings in the Google Admin Console and fiddling with those (including what should be wide open) with no changes. I tried some different ports for sending mail with no changes either.

I also didn’t change anything I’m aware of 5 days ago. :confused:

Another report of issues, DigitalOcean → smtp-relay.gmail.com

Is anyone easily able to test from a non-DigitalOcean VM? GCE or something?

I just fired up a Discourse install on GCE, with my credentials, and got the same error (having set up my relay to just rely on authentication).

======================================== ERROR ========================================
                                    UNEXPECTED ERROR

end of file reached

====================================== SOLUTION =======================================
This is not a common error. No recommended solution exists!

Please report the exact error message above to https://meta.discourse.org/
(And a solution, if you find one!)
=======================================================================================

Setting up IP-based authentication for the relay gave the same results. So I don’t think it’s a DigitalOcean-specific issue…

Unfortunately, “Troubleshooting Ruby/Rails email issues” is beyond my current skillset… any suggestions?

Is there any chance this is a Gmail SMTP issue?

Seems like it. I don’t know how to troubleshoot it, and my attempts to fix it so far have gone nowhere. They likely changed something, Discourse can’t handle it, and there’s of course no support.

I’ve had good luck on these forums before helping track down and solve issues. Not sure why this one is so quiet.

It’s possible that it’s a Gmail/G Suite SMTP issue, but @Syonyk mentioned he was able to manually send an email through telnet on his droplet.

I’m not nearly experienced enough to know how G Suite might treat traffic dispatched from the site differently from a manually sent message, but that suggests the issue is with the service sending the email to smtp-relay.gmail.com, and not with the relay itself.

Fwiw, I also have the droplet’s IP address specifically allowed in the G Suite admin settings, and I had not (and still have not) changed any settings in any of the services for several months.

The one time I saw something similar happen it was only a day (maybe two – it wasn’t a very busy page at the time so I probably wouldn’t have noticed if it was longer), but it seemed to have resolved itself pretty quickly.

Without a good trace of the SMTP conversation from Discourse, I don’t know how to troubleshoot any further - and I don’t know how to get those traces.

Is there any way to confirm the number of emails I’m sending out of discourse per month? If I need to go to another smtp-relay I’d need to know what sort of budget I’d need. This is super frustrating.

Under /admin/email/sent on your instance, you should be able to see what’s been sent and estimate usage from there.

Hm…

I tossed a tcpdump on the server and ran discourse-doctor. And found this in the output…

...
	0x0030:  d10f f8e4 4548 4c4f 206c 6f63 616c 686f  ....EHLO.localho
	0x0040:  7374 0d0a                                st..
...
	0x0030:  de62 f0c3 3432 3120 342e 372e 3020 5472  .b..421.4.7.0.Tr
	0x0040:  7920 6167 6169 6e20 6c61 7465 722c 2063  y.again.later,.c
	0x0050:  6c6f 7369 6e67 2063 6f6e 6e65 6374 696f  losing.connectio
	0x0060:  6e2e 2028 4548 4c4f 2920 6a31 3673 6d34  n..(EHLO).j16sm4
	0x0070:  3831 3932 3976 736d 2e31 202d 2067 736d  81929vsm.1.-.gsm
	0x0080:  7470 0d0a                                tp..

And, importantly, I CAN reproduce this failure with telnet.

root@conversation:~# telnet smtp-relay.gmail.com 587
Trying 74.125.137.28...
Connected to smtp-relay.gmail.com.
Escape character is '^]'.
220 smtp-relay.gmail.com ESMTP ls8sm507258pjb.6 - gsmtp
ehlo localhost.localdomain
421 4.7.0 Try again later, closing connection. (EHLO) ls8sm507258pjb.6 - gsmtp
Connection closed by foreign host.

If I send an actual domain, I get the expected response.

root@conversation:~# telnet smtp-relay.gmail.com 587
Trying 74.125.137.28...
Connected to smtp-relay.gmail.com.
Escape character is '^]'.
220 smtp-relay.gmail.com ESMTP p10sm668563uaw.3 - gsmtp
ehlo conversation.sevarg.net
250-smtp-relay.gmail.com at your service, [64.227.96.27]
250-SIZE 157286400
250-8BITMIME
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-CHUNKING
250 SMTPUTF8
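
For anyone who would rather check these replies in a script than by eye, here is a minimal Ruby sketch that parses a multi-line EHLO reply like the one above. The method name `parse_ehlo_reply` is made up for illustration and is not part of Discourse or Net::SMTP. It relies only on the standard reply layout: a three-digit code, then `-` on continuation lines or a space on the last line.

```ruby
# Hypothetical helper (not part of Discourse or Net::SMTP): split a raw
# EHLO reply into its status code, greeting text, and advertised extensions.
def parse_ehlo_reply(raw)
  lines = raw.split(/\r?\n/).reject(&:empty?)
  raise ArgumentError, "empty reply" if lines.empty?

  parsed = lines.map do |line|
    m = line.match(/\A(\d{3})(?:([ -])(.*))?\z/)
    raise ArgumentError, "not an SMTP reply line: #{line.inspect}" unless m
    # A "-" after the code means more lines follow; anything else ends the reply.
    { code: m[1].to_i, last: m[2] != "-", text: m[3].to_s }
  end

  {
    code: parsed.first[:code],
    greeting: parsed.first[:text],
    # Each line after the greeting starts with one extension keyword.
    extensions: parsed.drop(1).map { |l| l[:text].split.first.to_s.upcase },
    complete: parsed.last[:last]
  }
end

reply = parse_ehlo_reply(<<~REPLY)
  250-smtp-relay.gmail.com at your service, [64.227.96.27]
  250-SIZE 157286400
  250-8BITMIME
  250-STARTTLS
  250-ENHANCEDSTATUSCODES
  250-PIPELINING
  250-CHUNKING
  250 SMTPUTF8
REPLY

puts reply[:code]                             # 250
puts reply[:extensions].include?("STARTTLS")  # true
```

Fed the single-line 421 reply from the failed attempt, the same helper returns code 421 and an empty extension list, which makes the two cases easy to tell apart.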

So, now the question is, how does one get Discourse to send a proper domain string in the ehlo?

I don’t know if it’s the only issue, but it sure looks promising to run down.
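
A guess at why Sidekiq shows EOFError instead of the 421: the failing frame in the backtrace posted earlier is `helo`, not `ehlo`. As far as I can tell, Ruby's Net::SMTP retries with HELO when EHLO is rejected, and by then Google has already closed the connection, so the next read hits end of file. The sketch below only simulates that sequence, with a StringIO standing in for the socket. `greet` is a made-up name, not the Net::SMTP implementation.

```ruby
require "stringio"

# Hypothetical stand-in for the client side of the handshake (not Net::SMTP):
# try EHLO first, then fall back to HELO if the reply is not 2xx.
# A real client would write the verb to the socket; this simulation only reads replies.
def greet(io)
  %w[EHLO HELO].each do |verb|
    reply = io.readline # raises EOFError once the peer has closed the connection
    return "#{verb} accepted: #{reply.strip}" if reply.start_with?("2")
  end
  raise "server rejected both EHLO and HELO"
end

# Simulated server: a single 421 reply, after which the connection is closed.
closed_after_421 = StringIO.new("421 4.7.0 Try again later, closing connection. (EHLO)\r\n")

begin
  greet(closed_after_421)
rescue EOFError => e
  # The 421 was consumed by the EHLO attempt; the HELO retry found nothing left to read.
  puts "#{e.class}: #{e.message}"
end
```

If that reading is right, the 421 never reaches the logs because the fallback attempt fails first, which would explain why the error looked so unhelpful.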

This is so strange. Where would this have suddenly crept in? I haven’t done any updates.

It hasn’t crept in, it’s been like this. Google changed something.

discourse-doctor calls the test in /var/www/discourse/lib/tasks/emails.rake - if you’re in the image.

I changed:

Net::SMTP.start(smtp[:address], smtp[:port], 'localhost', smtp[:user_name], smtp[:password], smtp[:authentication])

to

Net::SMTP.start(smtp[:address], smtp[:port], 'conversation.sevarg.net', smtp[:user_name], smtp[:password], smtp[:authentication])

Now I get a different error.

======================================== ERROR ========================================
                                    UNEXPECTED ERROR

503 5.5.1 bad sequence of commands e190sm562849qkd.9 - gsmtp


====================================== SOLUTION =======================================
This is not a common error. No recommended solution exists!

Please report the exact error message above to https://meta.discourse.org/
(And a solution, if you find one!)
=======================================================================================

BUT: importantly, the tcpdump shows something resembling a sane(ish) flow.

22:33:48.393862 IP 64.227.96.27.54610 > 74.125.137.28.587: Flags [P.], seq 1:31, ack 59, win 502, options [nop,nop,TS val 3732187266 ecr 3508646052], length 30
	0x0000:  4500 0052 d4d6 4000 3f06 f237 40e3 601b  E..R..@.?..7@.`.
	0x0010:  4a7d 891c d552 024b 01b4 04a4 94ce dcc7  J}...R.K........
	0x0020:  8018 01f6 74dc 0000 0101 080a de74 a882  ....t........t..
	0x0030:  d121 b0a4 4548 4c4f 2063 6f6e 7665 7273  .!..EHLO.convers
	0x0040:  6174 696f 6e2e 7365 7661 7267 2e6e 6574  ation.sevarg.net
	0x0050:  0d0a                                     ..
22:33:48.408832 IP 74.125.137.28.587 > 64.227.96.27.54610: Flags [.], ack 31, win 256, options [nop,nop,TS val 3508646067 ecr 3732187266], length 0
	0x0000:  4500 0034 5e5d 0000 2b06 bccf 4a7d 891c  E..4^]..+...J}..
	0x0010:  40e3 601b 024b d552 94ce dcc7 01b4 04c2  @.`..K.R........
	0x0020:  8010 0100 a8ae 0000 0101 080a d121 b0b3  .............!..
	0x0030:  de74 a882                                .t..
22:33:48.469560 IP 74.125.137.28.587 > 64.227.96.27.54610: Flags [P.], seq 59:234, ack 31, win 256, options [nop,nop,TS val 3508646128 ecr 3732187266], length 175
	0x0000:  4500 00e3 5e8a 0000 2b06 bbf3 4a7d 891c  E...^...+...J}..
	0x0010:  40e3 601b 024b d552 94ce dcc7 01b4 04c2  @.`..K.R........
	0x0020:  8018 0100 929f 0000 0101 080a d121 b0f0  .............!..
	0x0030:  de74 a882 3235 302d 736d 7470 2d72 656c  .t..250-smtp-rel
	0x0040:  6179 2e67 6d61 696c 2e63 6f6d 2061 7420  ay.gmail.com.at.
	0x0050:  796f 7572 2073 6572 7669 6365 2c20 5b36  your.service,.[6
	0x0060:  342e 3232 372e 3936 2e32 375d 0d0a 3235  4.227.96.27]..25
	0x0070:  302d 5349 5a45 2031 3537 3238 3634 3030  0-SIZE.157286400
	0x0080:  0d0a 3235 302d 3842 4954 4d49 4d45 0d0a  ..250-8BITMIME..
	0x0090:  3235 302d 5354 4152 5454 4c53 0d0a 3235  250-STARTTLS..25
	0x00a0:  302d 454e 4841 4e43 4544 5354 4154 5553  0-ENHANCEDSTATUS
	0x00b0:  434f 4445 530d 0a32 3530 2d50 4950 454c  CODES..250-PIPEL
	0x00c0:  494e 494e 470d 0a32 3530 2d43 4855 4e4b  INING..250-CHUNK
	0x00d0:  494e 470d 0a32 3530 2053 4d54 5055 5446  ING..250.SMTPUTF
	0x00e0:  380d 0a                                  8..

So, at a minimum, sending the “EHLO localhost” or “EHLO localhost.localdomain” is part of the problem.
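
As a quick pre-flight check, something like the following could flag a HELO name the relay is likely to refuse. This is only a sketch: the placeholder list is just the two names that failed above, the "must contain a dot" rule is my assumption rather than anything Google has stated, and `placeholder_helo_name?` is a made-up helper.

```ruby
# The two names observed to be rejected in the tests above.
PLACEHOLDER_HELO_NAMES = %w[localhost localhost.localdomain].freeze

# Hypothetical helper: true if the name looks like a placeholder rather than a real FQDN.
def placeholder_helo_name?(name)
  normalized = name.to_s.strip.downcase.chomp(".")
  return true if normalized.empty?
  return true if PLACEHOLDER_HELO_NAMES.include?(normalized)

  !normalized.include?(".") # assumption: a bare hostname is also suspect
end

puts placeholder_helo_name?("localhost")               # true
puts placeholder_helo_name?("conversation.sevarg.net") # false
```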

Now, how on earth does one go about reporting a P0 issue to actual developers?

I’ve definitely seen these guys on the forums. They monitor them fairly closely from what I can tell. I’d say github but issues seem to be disabled for the repo.

Ok.

This is a test email from

https://conversation.sevarg.net

Email deliverability is complicated. Here are a few important things you should check first:

I’ve just demonstrated a fix, but I do not know how to upstream this.

cd /var/discourse
./launcher enter app
vim ./vendor/bundle/ruby/2.7.0/gems/mail-2.7.1/lib/mail/network/delivery_methods/smtp.rb

You need to find the following section:

    DEFAULTS = {
      :address              => 'localhost',
      :port                 => 25,
      :domain               => 'localhost.localdomain',
      :user_name            => nil,
      :password             => nil,
      :authentication       => nil,
      :enable_starttls      => nil,
      :enable_starttls_auto => true,
      :openssl_verify_mode  => nil,
      :ssl                  => nil,
      :tls                  => nil,
      :open_timeout         => nil,
      :read_timeout         => nil
    }

Change the :address and :domain lines to your own domain:

    DEFAULTS = {
      :address              => 'conversation.sevarg.net',
      :port                 => 25,
      :domain               => 'conversation.sevarg.net',
      :user_name            => nil,
      :password             => nil,
      :authentication       => nil,
      :enable_starttls      => nil,
      :enable_starttls_auto => true,
      :openssl_verify_mode  => nil,
      :ssl                  => nil,
      :tls                  => nil,
      :open_timeout         => nil,
      :read_timeout         => nil
    }

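I don’t know for certain which one matters, but changing both solved the issue. (My guess is :domain, since that is the name the mail gem sends in the EHLO; :address is the SMTP server to connect to.) Obviously use your own domain…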

Exit out of the app environment.

./launcher restart app

It should now be able to send emails.

I expect this will not survive any upgrades.

However, I am now sending and receiving emails as expected.

Devs? Plz2fix?

From the bug I filed, please try the following:

Add

DISCOURSE_SMTP_DOMAIN: [your install domain]

to the env section of your app.yml (most likely /var/discourse/containers/app.yml).

Then rebuild the app (cd /var/discourse; ./launcher rebuild app) and try to send emails.

Just to be clear, is the DISCOURSE_SMTP_DOMAIN the domain of my discourse server or the domain of the email?

For instance, my server sits at the subdomain community.acescentral.com and my emails come from admin@acescentral.com. So is DISCOURSE_SMTP_DOMAIN the top acescentral.com or the community subdomain?

Thank you so much for being a bulldog about hunting this down.