I’ve been self-hosting Discourse for many years, and have had several instances happily configured and running on a fairly beefy machine.
Today I noticed that one of my forums had gone down. The initial culprit looked to be lack of disk space, which I fixed. I then restarted the Discourse instance.
However, it has continued to go down regularly since then. Each time I boot it, I immediately see sidekiq go crazy with a huge number of failed email jobs. These are also causing redis to store a massive amount of state, which I think was the actual cause of the disk space problem. (I plan to flush redis the next time I can bring the machine up; if I don’t, I’ll quickly run out of space on this machine and won’t even be able to start Discourse to flush it. But the flush doesn’t seem to reduce redis disk usage much.)
The error message points to a certificate hostname mismatch, which I find a bit surprising since the mail server I’m using is internal and doesn’t require TLS or authentication. I was able to verify on one of my other instances (using the same email configuration) that email had stopped working there too. All I can see in the main production logs is a 422 error, but when I trigger an email such as a password reset, I see the same kind of error in the sidekiq logs:
Jobs::HandledExceptionWrapper: Wrapped OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=error: certificate verify failed (Hostname mismatch)
I have been able to verify that I can send email via the command line, so this does not seem to be a problem with the email server itself, just something broken with the Discourse configuration.
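One thing that might help narrow this down is checking whether the relay advertises STARTTLS at all, and whether its certificate matches the hostname I connect with. Below is a minimal Python sketch of such a check. It rests on an unconfirmed guess: that Discourse's mail stack attempts STARTTLS whenever the server offers it. The function name `probe_starttls` and the `smtp_factory` parameter are hypothetical, added only for illustration and for testing without a network.

```python
import smtplib
import ssl


def probe_starttls(host, port=25, timeout=10.0, smtp_factory=smtplib.SMTP):
    """Report whether `host` advertises STARTTLS and whether its
    certificate passes default verification (including hostname check).

    `smtp_factory` is a hypothetical hook so the function can be
    exercised with a stand-in object instead of a real server.
    """
    result = {"advertises_starttls": False, "tls_ok": None, "error": None}
    with smtp_factory(host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        result["advertises_starttls"] = bool(smtp.has_extn("starttls"))
        if result["advertises_starttls"]:
            try:
                # The default context verifies the chain and the hostname.
                smtp.starttls(context=ssl.create_default_context())
                result["tls_ok"] = True
            except ssl.SSLError as exc:
                # A hostname mismatch surfaces here as an SSLError subclass.
                result["tls_ok"] = False
                result["error"] = str(exc)
    return result
```

Calling `probe_starttls("outbound-relays.techservices.illinois.edu")` from the Discourse host would show whether STARTTLS is on offer and whether verification fails against that name. If it does fail, that would at least be consistent with the sidekiq error, though it would not prove where in Discourse the TLS attempt originates.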
Here’s the original mail configuration that was working until recently:
DISCOURSE_SMTP_ADDRESS: outbound-relays.techservices.illinois.edu
DISCOURSE_SMTP_PORT: 25
DISCOURSE_SMTP_ENABLE_START_TLS: false
Again, this mail server is internal and doesn’t require a username or password, and these settings were working until recently. I’ve been experimenting with DISCOURSE_SMTP_OPENSSL_VERIFY_MODE, but I can’t tell if it is actually still supported. Regardless, it doesn’t seem to help. I noticed a few new email settings that were added since I set up these forums, but they don’t seem to be needed given this mail server’s configuration.
Any help would be appreciated! At this point I’m honestly having a hard time even being sure what is wrong, or iterating on a fix: rebuilding the container takes a while, the production logs only show the 422 error, and I can’t figure out where to look for the actual root cause. (It must be somewhere, right? I’m sure I’m just missing it.)
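Since the sidekiq log is the only place I have seen the real exception so far, a small script to tally the wrapped errors there might make iterating less painful. This is a rough Python sketch, assuming the log lines follow the same "Wrapped <class>: <message>" shape as the one quoted above. The function name `summarize_wrapped_errors` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like the sidekiq error quoted earlier, e.g.
#   Jobs::HandledExceptionWrapper: Wrapped OpenSSL::SSL::SSLError: SSL_connect ...
WRAPPED = re.compile(r"Wrapped (?P<cls>[\w:]+): (?P<msg>.*)")


def summarize_wrapped_errors(lines):
    """Count distinct (exception class, message) pairs among log lines.

    Returns a list of ((class, message), count) sorted by count, descending.
    Lines that do not contain a wrapped exception are ignored.
    """
    counts = Counter()
    for line in lines:
        match = WRAPPED.search(line)
        if match:
            key = (match.group("cls"), match.group("msg").strip())
            counts[key] += 1
    return counts.most_common()
```

To use it, open whichever file the install writes sidekiq output to and pass the file object in, for example `summarize_wrapped_errors(open(path))`. The path is deliberately left unspecified here because I am not certain where it lives in every setup.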