Sudden email delivery failure

Last night around 8pm, emails stopped sending from our community. There weren’t any errors on the admin page or in Sidekiq. Sending a test email from the email settings page worked, but no other action that generates an email actually sent one. After a reboot of the community, emails sent normally and the backlog of previously failed emails was processed. I’m assuming that whatever job or process sends the emails stopped working for some reason. My questions are:

a. Where in the logs would I be able to find any info about why the email service stopped working?
b. Is there a way to restart that email process without having to do a full reboot of the community?

We use Mandrill for our email sends, in case that is relevant.

Bump on those questions! Can anyone offer feedback? Thank you.

You’ll want to look at Sidekiq at yoursite.com/sidekiq; that’s the process that handles background tasks such as sending email.

Probably. If Sidekiq was the issue, I’d expect restarting Sidekiq to resolve it. However, I’m not sure how to restart Sidekiq on its own.

Thanks. The only thing showing in there is 36 failures today under Jobs::InviteEmail.

Unfortunately, this is where my email knowledge ends; someone else may be able to shed more light on how email works.

Well, the email issue is still a bit of a mystery, but in the meantime we also had an issue with the site intermittently returning 500 errors. I reached out to @pfaffman for consulting help and he was able to get us stable again. If anyone is stuck on a tricky Discourse issue they need help with, I highly recommend hiring him.

Indeed. I had no clue about this, or I’d have replied earlier. There’s a chance it was a Sidekiq issue and that the memory tweaks I made will solve the problem.

You’re probably right that it was a memory issue, and fingers crossed that the memory config tweak you did fixes things. Thanks again for all your help!

Yikes, this just happened to me. I’m investigating now.

Discourse v1.8.0.beta2 +8

Restarting the container and updating to the latest version fixed it.

This was identical to what happened for me.

Unfortunately we never got to the root cause of the emails issue. Let me know if you find out anything. It hasn’t happened again since, but I now keep a much closer eye on Mandrill to make sure emails are continuing to go out.

Sorry to resurrect an old topic, but this just happened on our install and I wanted to share what I found in case it happens to anybody else.

Symptoms:

  • No email sent for 12 hours.
  • It looks like a backup was stuck or long-running.
  • Sidekiq was somehow paused.

Once I entered the container and unpaused Sidekiq, a few hundred emails (and other Sidekiq tasks) all flushed out.

Currently on v2.2.0.beta8 +113

Backups pause Sidekiq, so if one gets stuck, Sidekiq is likely to remain paused, and emails (among other things) will stop being processed. I think we saw this ourselves recently. @sam, is there anything we can do to add some protection here and distinguish “stuck” backups from backups that are simply long because the site is big?

We are already meant to have a process that does this: we “keep alive” the read-only state. We need to fix it so that it uses expiring Redis keys for read-only mode (or possibly wind back some timeouts).
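
The expiring-key idea can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Discourse’s actual code (Discourse itself is written in Ruby): a hypothetical `ExpiringStore` stands in for Redis, and `PAUSE_KEY`, `PAUSE_TTL`, `keep_alive_pause` and `workers_paused` are made-up names. The point is that the backup never sets a permanent pause flag; it keeps refreshing a flag that expires on its own. A long backup on a big site keeps refreshing it and stays paused, while a stuck or crashed backup stops refreshing, so the flag lapses and jobs such as email resume without anyone having to unpause Sidekiq by hand.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringStore:
    """Hypothetical in-memory stand-in for a Redis-like store with per-key TTLs."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def set_with_ttl(self, key: str, value: str, ttl_seconds: float) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Drop the key once it has expired, as Redis would.
            del self._data[key]
            return None
        return value


PAUSE_KEY = "hypothetical:jobs_paused"  # hypothetical key name
PAUSE_TTL = 60.0  # seconds; hypothetical value


def keep_alive_pause(store: ExpiringStore) -> None:
    """Called periodically by a backup that is still making progress."""
    store.set_with_ttl(PAUSE_KEY, "backup", PAUSE_TTL)


def workers_paused(store: ExpiringStore) -> bool:
    """Checked by job workers (e.g. email) before picking up jobs."""
    return store.get(PAUSE_KEY) is not None


class FakeClock:
    """Manually advanced clock so the demo runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    store = ExpiringStore(clock)
    # A long but healthy backup refreshes the pause every 30 seconds.
    for _ in range(10):
        keep_alive_pause(store)
        clock.now += 30
        assert workers_paused(store)
    # The backup hangs: no more refreshes, so the pause lapses by itself.
    clock.now += PAUSE_TTL
    print(workers_paused(store))  # False
```

With real Redis, the refresh would correspond to re-issuing `SET key value EX seconds`. The TTL is the trade-off: it has to comfortably exceed the refresh interval so a healthy backup never lets it lapse, but the shorter it is, the less time a stuck backup can block email.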
