Yes that’s right, it’s the same problem … it started about two weeks ago.
Can you try both
DISCOURSE_SMTP_ENABLE_START_TLS: false
DISCOURSE_SMTP_OPENSSL_VERIFY_MODE: none
Those were the first things I tried, but I still get the same error:
SSL_connect returned=1 errno=0 state=error: certificate verify failed (Hostname mismatch)
Hey, I tried it with both options. It still doesn’t work:
DISCOURSE_SMTP_ADDRESS: REDACTED
DISCOURSE_SMTP_PORT: 25
DISCOURSE_SMTP_USER_NAME: REDACTED
DISCOURSE_SMTP_PASSWORD: REDACTED
DISCOURSE_SMTP_ENABLE_START_TLS: false # (optional, default true)
DISCOURSE_SMTP_OPENSSL_VERIFY_MODE: none
DISCOURSE_SMTP_AUTHENTICATION: "login"
I still get
certificate verify failed (self signed certificate).
For me it has been a blocking bug for a long time …
I recommend creating a new temporary email address that has SMTP TLS support.
Could this be related to this gem
I have the exact same problem. It started yesterday, when I upgraded (via rebuild) to 2.9.0.beta4 (a5779a7d0b). I made NO changes to app.yml, or anything else. Just a rebuild.
I now have over 1,300 failed jobs.
I’m seeing SSL errors in the logs (see below for screenshots), and I’m wondering if the rebuild is suddenly ignoring the DISCOURSE_SMTP_ENABLE_START_TLS flag?
This is what I’ve “always” had in my app.yml file: (again, no changes have been made)
DISCOURSE_SMTP_ADDRESS: 172.17.0.1
DISCOURSE_SMTP_PORT: 25
DISCOURSE_SMTP_AUTHENTICATION: none
DISCOURSE_SMTP_ENABLE_START_TLS: false # (optional, default true)
EDIT: This is what I see in the email logs for the host (the email server). The error messages are new, starting after the rebuild.
The last message regarding Discourse in the email logs before the rebuild:
May 23 17:16:02 localhost postfix/smtpd: connect from discourse-docker[172.17.0.2]
May 23 17:16:02 localhost postfix/smtpd: 0D803B67FB: client=discourse-docker[172.17.0.2]
May 23 17:16:02 localhost postfix/cleanup: 0D803B67FB: message-id=<email@example.com>
May 23 17:16:02 localhost postfix/smtpd: disconnect from discourse-docker[172.17.0.2] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
The first entry in the email logs on the server after the rebuild:
May 23 17:22:48 localhost postfix/smtpd: connect from discourse-docker[172.17.0.2]
May 23 17:22:48 localhost postfix/smtpd: SSL_accept error from discourse-docker[172.17.0.2]: -1
May 23 17:22:48 localhost postfix/smtpd: warning: TLS library problem: error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate:../ssl/record/rec_layer_s3.c:1528:SSL alert number 42:
May 23 17:22:48 localhost postfix/smtpd: lost connection after STARTTLS from discourse-docker[172.17.0.2]
May 23 17:22:48 localhost postfix/smtpd: disconnect from discourse-docker[172.17.0.2] ehlo=1 starttls=0/1 commands=1/2
After that time the entries for Discourse in the email logs all look like that.
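If you want to check your own mail logs for the same pattern, the telling part is the session summary on the `disconnect` line: `starttls=0/1` means one STARTTLS attempt was made and none succeeded. Below is a rough Ruby sketch for spotting that. The helper name `starttls_failed?` is made up for illustration, and the two sample lines are shortened from the logs above.

```ruby
# Hypothetical helper (not part of Postfix or Discourse): returns true when a
# Postfix "disconnect" summary shows a STARTTLS attempt that did not succeed.
# Postfix writes "starttls=OK/TOTAL" only when some attempts failed.
def starttls_failed?(line)
  match = line.match(/\bstarttls=(\d+)\/(\d+)/)
  return false unless match

  match[1].to_i < match[2].to_i
end

# Sample summaries taken from the log excerpts in this post.
before_rebuild = "disconnect from discourse-docker[172.17.0.2] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5"
after_rebuild  = "disconnect from discourse-docker[172.17.0.2] ehlo=1 starttls=0/1 commands=1/2"

puts "before rebuild: STARTTLS failed? #{starttls_failed?(before_rebuild)}"
puts "after rebuild:  STARTTLS failed? #{starttls_failed?(after_rebuild)}"
```

The point is only that the client started asking for STARTTLS after the rebuild, even though nothing changed on the mail server side.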
I tried sending a message from inside the Discourse Docker container using curl. Once I made sure to specify plaintext SMTP and port 25, I can send email via the host just fine:
$ cd /var/discourse/
$ sudo ./launcher enter app
x86_64 arch detected.
root@discourse-app:/var/www/discourse# curl smtp://172.17.0.1 --mail-from firstname.lastname@example.org --mail-rcpt email@example.com --upload-file README.md
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7077    0     0  100  7077      0   575k --:--:-- --:--:-- --:--:--  575k
root@discourse-app:/var/www/discourse#
And this is what that test looked like in the host’s email logs:
May 24 16:53:49 localhost postfix/smtpd: connect from discourse-docker[172.17.0.2]
May 24 16:53:49 localhost postfix/smtpd: EB62CB5FCD: client=discourse-docker[172.17.0.2]
May 24 16:53:49 localhost postfix/cleanup: EB62CB5FCD: message-id=<>
May 24 16:53:49 localhost opendkim: EB62CB5FCD: can't determine message sender; accepting
May 24 16:53:49 localhost postfix/smtpd: disconnect from discourse-docker[172.17.0.2] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
Given that I have specified no TLS and port 25 in my app.yml, and this worked until the rebuild yesterday, it’s looking more and more like the latest Discourse is ignoring my SMTP configuration in app.yml.
@gunnar I moved your post here since this is the email issue you’re describing.
I am not sure if the “post has already been taken” error is also being caused by this, but the details you gave about your email belong to this issue.
It seems absurd to me that after 30 days there is still this problem…
I had to change my email provider to get my forum working again.
That is frustrating, but it looks to me like some gem no longer supports ignoring invalid certs and/or unencrypted transport. It may just be the case that the days of being able to send mail that way are over. But I’m not experiencing the problem myself, so I haven’t looked carefully enough to know if I’m right.
Is there a way to “downgrade” discourse to an older working version (say 2.8.0 stable or 2.9.0 beta3) until this is worked out?
I decided to spend one more half hour to dig into this and I think I found the cause.
This seems to be related to the move to Rails 7, which updated net-smtp from 0.1.0 to 0.3.1, which changed the defaults.
The way the smtp gem calls net-smtp does not disable openssl_verify_mode, it only enables it when enabled.
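To make the difference concrete, here is a minimal Ruby sketch of what "explicitly disabling" could look like at the net-smtp level. It assumes net-smtp 0.3.x attempts STARTTLS on its own unless told not to, which is what the Postfix logs earlier in the thread suggest. `build_smtp` and its keyword arguments are hypothetical names for illustration; this is not the code in Discourse or in any gem, and nothing here opens a connection.

```ruby
require "net/smtp"
require "openssl"

# Hypothetical helper: builds a Net::SMTP object that honours the two
# settings discussed in this thread instead of relying on library defaults.
def build_smtp(address, port, enable_starttls:, verify_mode: nil)
  smtp = Net::SMTP.new(address, port)

  unless enable_starttls
    # Turn STARTTLS off explicitly rather than just not turning it on.
    smtp.disable_starttls
    return smtp
  end

  # Map the "none" setting to VERIFY_NONE; anything else verifies the peer.
  mode = verify_mode == "none" ? OpenSSL::SSL::VERIFY_NONE : OpenSSL::SSL::VERIFY_PEER
  context = OpenSSL::SSL::SSLContext.new
  context.set_params(verify_mode: mode)
  smtp.enable_starttls_auto(context)
  smtp
end

# Plaintext relay on the Docker host, as in the app.yml quoted above.
plain = build_smtp("172.17.0.1", 25, enable_starttls: false)

# STARTTLS allowed, but certificate verification switched off.
relaxed = build_smtp("172.17.0.1", 25, enable_starttls: true, verify_mode: "none")
```

The idea is simply that "false" and "none" have to be passed down as explicit instructions, because leaving them unset now means "use the new default".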
Nice work, Richard! That would have taken me two hours if not twice that. For me it’s easier to succumb to dealing with the new defaults.
Aha. So I was sort of right, it’s just that it might not be too hard for a PR to fix it.
Nice job @RGJ!
While we anticipate a fix, on a side note, it would be good if this problem didn’t cause the cascade of issues that I experienced, which nearly brought my forum down completely. Specifically:
- The email failures seem to be retried extremely quickly, which causes the sidekiq queue to explode in size and ~100% CPU usage caused by these tasks
- In addition, something (either crashes or restarts) was causing Redis to write enormous tmp files, I assume containing the state of the sidekiq queue. While these were safe to remove, they quickly filled the disk, which caused more crashes, and so on. I had some other disk space that I was able to free so that I could restart the forum and figure out what was going on, but this may not be true for everyone. (It’s also somewhat hard to confirm that, in this case, the Redis tmp files are in fact safe to delete.)
My guess is that the simplest solution here is to slow down the retry on failed email jobs, or at least on ones that don’t have timeliness constraints like password resets. That seems appropriate given that email problems are unlikely to resolve quickly, and most or all mailers will do their own retries once they receive a message.
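As a rough idea of what "slowing down the retry" could mean, here is a small Ruby sketch of a capped exponential backoff. This is not how Discourse or Sidekiq currently schedule retries; `retry_delay`, the 30 second base, and the one hour cap are all assumptions picked for illustration.

```ruby
# Hypothetical backoff: wait base * 2^attempt seconds, but never more than cap.
# attempt is 0 for the first retry, 1 for the second, and so on.
def retry_delay(attempt, base: 30, cap: 3600)
  [base * (2**attempt), cap].min
end

# Delays in seconds for the first nine retries of a failed email job.
delays = (0..8).map { |attempt| retry_delay(attempt) }
puts delays.inspect
```

With numbers like these, a broken SMTP setup would produce a handful of retries per job per hour instead of a tight loop, which should keep the queue and CPU usage manageable while the underlying problem is being fixed.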
In my case when I encountered the failure after the upgrade it was using TLS with a third party server and the name on the certificate matched the smtp server name. I just had one failure however. I haven’t rebooted or upgraded since to avoid further issues. I’ll try an update once the patch has been released and see how it goes.
+1 really frustrating bug
Can’t the gem be rolled back? I would be surprised if this didn’t get attention, since the ability to send emails is “core” functionality, and for some it’s also causing an outage due to temporary files and CPU usage overrunning the server. The core stability of the forum is being disrupted here.