Sudden email delivery failure

(Nicholas Tolstoshev) #1

Last night around 8pm, emails stopped sending from our community. There were no errors on the admin page or in Sidekiq. Sending a test email from the email settings page worked, but no other action that generated an email actually sent one. After a reboot of the community, emails sent OK and the backlog of failed emails was processed. I'm assuming whatever job or process sends the emails stopped working for some reason. My questions are:

a. Where in the logs could I find information about why the email service stopped working?
b. Is there a way to restart that email process without a full reboot of the community?

We use Mandrill for our email sends, in case that is relevant.

(Nick Morin) #2

Bump on those questions! Can anyone offer feedback? Thank you.

(Joshua Rosenfeld) #3

You're going to want to look at Sidekiq; that's the process that handles background tasks (like email).

Probably. If Sidekiq was the issue, I'd assume restarting Sidekiq would resolve it. However, I'm not sure how to restart Sidekiq.

(Nicholas Tolstoshev) #4

Thanks. The only thing showing in there is 36 failures today under Jobs::InviteEmail.

(Joshua Rosenfeld) #5

Unfortunately, this is where my email knowledge ends; someone else may be able to shed more light on how email works…

(Nicholas Tolstoshev) #6

Well, the email issue is still a bit of a mystery, but in the meantime we also had the site intermittently returning 500 errors. I reached out to @pfaffman for consulting help and he got us stable again. If anyone is stuck on a tricky Discourse issue and needs help, I highly recommend hiring him.

(Jay Pfaffman) #7

Indeed. I had no clue on this, or I'd have replied earlier. There's a chance that it was a Sidekiq issue and that the memory tweaks I made will solve the problem.

(Nicholas Tolstoshev) #8

You're probably right that it was a memory issue, and fingers crossed that the memory config tweak you made fixes things. Thanks again for all your help!

(Steve) #9

Yikes, this just happened to me. I'm investigating now.

Discourse v1.8.0.beta2 +8

(Steve) #10

Restarting the container and updating to the latest system fixed it.

What happened to me was identical to the original report.

(Nicholas Tolstoshev) #11

Unfortunately, we never found the root cause of the email issue. Let me know if you find out anything. It hasn't happened again since, but I now keep a much closer eye on Mandrill to make sure emails keep going out.

(Brandon Martus) #12

Sorry to resurrect an old topic, but this just happened to our install and I wanted to share what I found in case it happens to anybody else.

  • No email was sent for 12 hours.
  • It looks like a backup was stuck or long-running.
  • Sidekiq was somehow paused.

Once I entered the container and unpaused Sidekiq, a few hundred emails (and other Sidekiq tasks) all flushed out.

Currently on v2.2.0.beta8 +113

(Joshua Rosenfeld) #13

Backups pause Sidekiq, so if a backup gets stuck, Sidekiq is likely to stay paused and emails (among other things) will stop being processed. I think we saw this ourselves recently. @sam, is there anything we can do to add some protection here, so we can tell "stuck" backups apart from backups that are simply long because the site is big?
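A minimal sketch of one way to tell the two cases apart, assuming the backup records a timestamp each time it makes progress: judge it by how long it has been silent, not by how long it has been running. This is not Discourse's backup code; `BackupStatus`, `classify_backup` and the 600-second `stall_timeout` are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BackupStatus:
    """Hypothetical record of a running backup (not a Discourse class)."""
    started_at: float        # seconds since epoch when the backup began
    last_progress_at: float  # seconds since epoch of the last progress report


def classify_backup(status: BackupStatus, now: float, stall_timeout: float = 600.0) -> str:
    """Return "stuck" if no progress was reported within stall_timeout seconds,
    otherwise "running", no matter how long the backup has been going overall."""
    if now - status.last_progress_at > stall_timeout:
        return "stuck"
    return "running"
```

With this rule, a five-hour backup on a big site that reported progress 30 seconds ago counts as "running", while a 20-minute backup that has been silent for 15 minutes counts as "stuck".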

(Sam Saffron) #14

We are already meant to have a process that does this: we "keep alive" the readonly state. We need to fix it so it uses expiring Redis keys for this readonly mode (or possibly wind back some timeouts).
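The expiring-key idea can be illustrated with a minimal sketch, assuming the pause flag only exists while something keeps refreshing it. If the backup hangs and stops calling the keep-alive, the flag lapses on its own and jobs resume. This is not Discourse's implementation: `ExpiringFlagStore` is an in-memory stand-in for Redis keys set with an expiry (similar in spirit to `SET key value EX ttl`), and `PAUSE_KEY`, `PAUSE_TTL`, `keep_alive` and `jobs_paused` are hypothetical names and values.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringFlagStore:
    """In-memory stand-in for Redis keys with a TTL (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def set_with_ttl(self, key: str, value: str, ttl: float) -> None:
        # Setting again overwrites the old expiry, which is what "keep alive" relies on.
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: behave as if the key was never set.
            del self._data[key]
            return None
        return value


PAUSE_KEY = "backup_pause"  # hypothetical key name
PAUSE_TTL = 60.0            # seconds; assumed value


def keep_alive(store: ExpiringFlagStore) -> None:
    """Called periodically by a healthy backup to extend the pause."""
    store.set_with_ttl(PAUSE_KEY, "1", PAUSE_TTL)


def jobs_paused(store: ExpiringFlagStore) -> bool:
    """Checked by workers; turns False once the backup stops refreshing the flag."""
    return store.get(PAUSE_KEY) is not None
```

For example, with an injected clock: the backup calls `keep_alive` at t=0 and again at t=50, so the pause now lasts until t=110. If the backup then hangs, `jobs_paused` is still true at t=100 but false at t=200, without anyone having to enter the container and unpause by hand. The design choice is that a stuck process fails safe: doing nothing clears the pause, rather than leaving it set forever.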