Testing SMTP and watching logs?

I’m having some SMTP server issues where Discourse seems to be the only app that’s struggling using SMTP on a particular server.

So I have two questions:

  1. How can I trigger email test sends within Discourse?
  2. Where can I find the logs to watch so that I can isolate the actual errors that are happening?

Are you following the official recommendations for email?

See /admin/email.

Make sure that your mail server will accept mail sent with your forum’s host name, or change it at the end of the app.yml file. You can see the logs in shared/standalone/log/rails/production.log (relative to /var/discourse on a standard install).

And read the document linked above.

Which doc linked above? I see no link…

This topic is discussed here, but I didn’t find a way to trigger email sending reliably (so I can be sure an email send is being 100% attempted by the system at each trigger) and thus I’m not getting anything in the logs.

Oh. He didn’t link. Search for email troubleshooting.

  ./discourse-doctor

Will send a test email with some diagnostics.

Alright, so that’s quite helpful. It seems that the server’s host is blocking certain ports for SMTP and so I have to be especially particular.

After changing containers/app.yml, does one need to also re-execute ./launcher rebuild app or do changes within this yaml config activate without a rebuild? Also, is there a quicker update function aside from a full rebuild of the app if this is indeed required?

Those settings are picked up whenever you build, they aren’t touched during runtime.

Are you trying to communicate with the target server on :25? The default is there because most hosts are much happier to allow secure SMTP, although some still need to be contacted to have restrictions lifted.

Again it’s one of the reasons for the email guide which is linked from the official install doc, the assumption is that if you’re going to deviate that you understand the implications and can troubleshoot these things.

You don’t have to rebuild after changing only SMTP settings. You can run:

 ./launcher destroy app
 ./launcher start app

Great @Stephen, thanks for clarifying there.

Now that I’m able to change and activate email settings more readily, the real crux of the issue boils down to this:

I too am getting a “503 AUTH command used when not advertised” error from Exim, so I need to hunt down the solution. Unfortunately the topic linked above went unanswered, so I’m going to have to do a bit of digging.
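
One possible cause of that 503, offered here as an assumption rather than a confirmed diagnosis: Exim is often configured to advertise AUTH only once the connection is encrypted, so a client that sends AUTH before TLS is up gets rejected. A minimal Python sketch for checking what the server advertises before and after STARTTLS might look like this. The function names, the port 587 default, and any host name you pass in are illustrative; run it from the Discourse host so it takes the same network path.

```python
import smtplib
import ssl


def auth_mechanisms(features):
    """Return the AUTH mechanisms listed in an smtplib esmtp_features dict."""
    return features.get("auth", "").split()


def probe_auth(host, port=587, timeout=15):
    """Compare advertised AUTH mechanisms before and after STARTTLS.

    Assumption: `host` is a name that matches the server certificate,
    because the default SSL context verifies the hostname.
    """
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        before = auth_mechanisms(smtp.esmtp_features)
        after = None  # stays None when the server does not offer STARTTLS
        if smtp.has_extn("starttls"):
            smtp.starttls(context=ssl.create_default_context())
            smtp.ehlo()  # capabilities must be re-read after the upgrade
            after = auth_mechanisms(smtp.esmtp_features)
    return before, after
```

Calling `probe_auth("mail.domain.tld")` returns the two lists as a pair. An empty first list with a populated second one would be consistent with AUTH being offered over TLS only.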

And wow, @pfaffman, that destroy command is pretty spooky!!! When I executed it and saw its output:

+ /usr/bin/docker stop -t 10 app
app
+ /usr/bin/docker rm app
app

… and specifically saw that rm app, I about lost my lunch… LOL.

I guess there need to be some words of warning, something like a big red flashing “DON’T PANIC, WE DIDN’T JUST WIPE OUT YOUR WORK!”

Is there something that stops you from using one of the free options in the doc? Even if you find some kind of bodge or fix in the short term, what you’re doing falls outside the supported track, and there’s no guarantee future upgrades won’t cause these problems to recur.

In addition to being more supportable, the listed providers are experts in mail deliverability. It really does make things pretty painless.

@Stephen, you’re telling me that using an email SMTP service provider that is not within the recommended providers is considered outside of the supported track???

If so, that’s a pretty incredible thing to suggest about a standard Exim email server that works normally for email clients. Further, some of us like to have control over our mail servers vs trusting others with our mail content. I think that this is one of the attractions of Discourse - own your data - and as such, you’ll find that more and more folks will not want to use such free / bulk options as these.

At any rate, I’ll keep soldiering on here for these very reasons. cPanel SMTP servers should be quite supported.

So it’s quite interesting, I think the key here is that the following works both outside of the Docker container and within the Docker container (/var/discourse/launcher enter app):

openssl s_client -connect xxx.xxx.xxx.xxx:465 -servername mail.domain.tld -showcerts -quiet

# OUTPUT: 
depth=2 C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Certification Authority
verify return:1
depth=1 C = US, ST = TX, L = Houston, O = "cPanel, Inc.", CN = "cPanel, Inc. Certification Authority"
verify return:1
depth=0 CN = domain.tld
verify return:1
220-server.domain.tld ESMTP Exim 4.91 #1 Sun, 25 Nov 2018 21:22:35 -0500
220-We do not authorize the use of this system to transport unsolicited,
220 and/or bulk e-mail.

With this I’m able to access the mail server without issue and issue commands.

However when I use these settings within containers/app.yml and then run ./discourse-doctor I’m hitting a wall with:

Net::ReadTimeout

This clearly seems to be something going on with Discourse itself and yet I’m not seeing anything meaningful in the logs about this, which I find peculiar.

Dramatics aside, yes, the core of community support provided here falls within the bounds of the configuration laid out in those two documents. We do our best to help people get started with Discourse using easily reproducible patterns. In terms of relaying SMTP that means the tried and tested providers who offer free tiers suitable to small and medium communities.

If you want to roll your own email solution on exim or anything else you accept the risks and complications it introduces.

Sure, I can get what you’re saying from a support standpoint, but realistically we’ve gotta understand that there are scenarios such as this that are probably either common or becoming more common. At any rate:

So let’s get into the nitty gritty here: How do I coax email log verbosity out of Discourse so that I can see the full and complete network traffic going on when discourse-doctor triggers the test email?

Is there a log I’m missing within the container filesystem? Do I need to move the env over to dev vs production? I’m not even seeing that this is in “production” from the config folder, so I guess I need some clarity here.

I believe I can track down the problem fairly readily if I have both complete verbosity out of Discourse during the test emails AND highly verbose Exim logs (which I can do).

Right now it seems as though the SMTP server’s not even being hit by Discourse, so it almost seems that there’s a Ruby networking problem in Discourse somewhere, which is hard to believe, but I need to just make sure I do a sanity check by watching completely verbose logging for the email module.
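
One way to run that sanity check without touching Discourse is to replay the same SMTP login from the Discourse host with the client’s own tracing turned on. The Python sketch below is an independent stand-in, not Discourse’s mail code: `build_test_message` and `send_traced` are hypothetical names, and the host, credentials, and addresses are yours to supply. `set_debuglevel(1)` makes `smtplib` print each command it sends and each reply it receives to stderr.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a small plain-text message for the trace run."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SMTP debug test"
    msg.set_content("Test message sent while tracing the SMTP dialogue.")
    return msg


def send_traced(host, port, user, password, msg, implicit_tls=False, timeout=30):
    """Send `msg` while printing the SMTP conversation to stderr.

    implicit_tls=True wraps the socket in TLS before any SMTP traffic;
    otherwise the connection starts in plaintext and upgrades via STARTTLS.
    """
    context = ssl.create_default_context()
    if implicit_tls:
        client = smtplib.SMTP_SSL(host, port, timeout=timeout, context=context)
    else:
        client = smtplib.SMTP(host, port, timeout=timeout)
    with client:
        # Tracing starts here, so the server greeting itself is not shown.
        client.set_debuglevel(1)
        client.ehlo()
        if not implicit_tls:
            client.starttls(context=context)
            client.ehlo()
        client.login(user, password)
        client.send_message(msg)
```

Comparing the timestamps of this trace with the Exim log should show whether the connection reaches the server at all and at which step it stalls.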

The current facts on the ground do not support this statement. Running an email server is a support and logistical nightmare. It is becoming less common over time to run your own rando email server, not more common.

Appreciate your personal observations @codinghorror, but let’s leave this as a moot point (I’m seeing and hearing otherwise from some folks at NOCs, but we’ll leave this topic alone as it’s difficult to fastidiously support either side). Let’s get back to the crux of my question above at getting into the debugging process for deeper exploration of what’s going on with the emailing mechanisms in Discourse.

Check the mail logs on your mail server.

Right, so digging into the mail server’s logs here (while using port 465 with TLS), I’m seeing the following Exim output:

2018-11-26 00:16:42 SMTP connection from [DISCOURSE_SERVER_IP]:52538 (TCP/IP connection count = 1)
2018-11-26 00:17:42 TLS error on connection from [DISCOURSE_SERVER_IP]:52538 (SSL_accept): error:00000000:lib(0):func(0):reason(0) 

I’m sure that Exim’s SMTPS is running on port 465. It seems as though the above error has to do with the ciphers from outdated clients for some reason. I’ll keep digging in this regard.

FYI, settings in the app.yml are:

  DISCOURSE_SMTP_ADDRESS: xxx.xxx.xxx.xxx
  DISCOURSE_SMTP_PORT: 465
  DISCOURSE_SMTP_USER_NAME: administrator@domain.tld
  DISCOURSE_SMTP_PASSWORD: "TOPs3kr3t"
  DISCOURSE_SMTP_ENABLE_START_TLS: true
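
These settings combine port 465 with STARTTLS, which may be worth a second look. By convention, port 465 expects TLS from the first byte (implicit TLS, which is what the `openssl s_client -connect` test above exercises), while STARTTLS begins in plaintext and upgrades only after the server’s greeting. If a client waits for a plaintext greeting on an implicit-TLS port, both sides can stall until a timeout, which would be consistent with the `Net::ReadTimeout` and with the Exim TLS error logged exactly one minute after the connection. This is an inference, not something confirmed in this thread. The Python sketch below tries one mode at a time; `conventional_mode` and `probe` are hypothetical helper names, and port 587 is an assumed alternative.

```python
import smtplib
import ssl


def conventional_mode(port):
    """Map a port to its conventional TLS mode (465 implicit, others STARTTLS)."""
    return "implicit" if port == 465 else "starttls"


def probe(host, port, mode, timeout=10):
    """Attempt an SMTP handshake in one TLS mode and describe the outcome."""
    if mode not in ("implicit", "starttls"):
        raise ValueError(f"unknown mode: {mode!r}")
    context = ssl.create_default_context()
    try:
        if mode == "implicit":
            # TLS handshake happens before any SMTP traffic.
            client = smtplib.SMTP_SSL(host, port, timeout=timeout, context=context)
        else:
            # Plaintext greeting first, then upgrade with STARTTLS.
            client = smtplib.SMTP(host, port, timeout=timeout)
        with client:
            client.ehlo()
            if mode == "starttls":
                client.starttls(context=context)
                client.ehlo()
        return "ok"
    except (OSError, smtplib.SMTPException) as exc:
        # Timeouts, TLS failures and protocol errors all land here.
        return f"failed: {type(exc).__name__}: {exc}"
```

Comparing `probe("mail.domain.tld", 465, "implicit")` with `probe("mail.domain.tld", 465, "starttls")` from the Discourse host would show which mode the server actually accepts on that port. Use a host name that matches the certificate, since the default context verifies it.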

Have you tried asking Exim?

Yes, I’m attacking that angle at the same time here (actually going through cPanel support first as they are usually quite helpful).

So far I’ve gone through various cipher lists that seemed as though they could solve the issue, per Mozilla’s recommendations, but unfortunately even with the oldest list the server will not accept connections from Discourse.

Still hunting… Will figure this out, but it would be nice to see at the same time whether Discourse is spitting out any additional useful info. Can discourse-doctor be prompted to give more details about the emailing process and where it stands in the code? The apparent 1 minute timeout is a little odd and leaves much to the imagination while it’s spinning in the void.