Temporary name resolution problem prevents emails from sending and they aren't queued?

I posted a topic update, and shortly afterward our Discourse instance had a temporary network connection problem.

This morning we had reports that users didn’t get their email notifications. After checking their settings and the email logs in admin, I saw the following in /logs:

Message (25 copies reported)
Job exception: getaddrinfo: Temporary failure in name resolution
Backtrace

/usr/local/lib/ruby/2.4.0/net/smtp.rb:539:in `initialize'
/usr/local/lib/ruby/2.4.0/net/smtp.rb:539:in `open'
/usr/local/lib/ruby/2.4.0/net/smtp.rb:539:in `tcp_socket'
/usr/local/lib/ruby/2.4.0/net/smtp.rb:549:in `block in do_start'
/usr/local/lib/ruby/2.4.0/timeout.rb:93:in `block in timeout'
/usr/local/lib/ruby/2.4.0/timeout.rb:103:in `timeout'
/usr/local/lib/ruby/2.4.0/net/smtp.rb:548:in `do_start'
/usr/local/lib/ruby/2.4.0/net/smtp.rb:518:in `start'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/mail-2.6.6/lib/mail/network/delivery_methods/smtp.rb:111:in `deliver!'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/mail-2.6.6/lib/mail/message.rb:2149:in `do_delivery'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/mail-2.6.6/lib/mail/message.rb:237:in `block in deliver'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/actionmailer-4.2.8/lib/action_mailer/base.rb:543:in `block in deliver_mail'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/activesupport-4.2.8/lib/active_support/notifications.rb:164:in `block in instrument'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/activesupport-4.2.8/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/activesupport-4.2.8/lib/active_support/notifications.rb:164:in `instrument'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/actionmailer-4.2.8/lib/action_mailer/base.rb:541:in `deliver_mail'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/mail-2.6.6/lib/mail/message.rb:237:in `deliver'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/actionmailer-4.2.8/lib/action_mailer/message_delivery.rb:85:in `deliver_now'
/var/www/discourse/lib/email/sender.rb:184:in `send'
/var/www/discourse/app/jobs/regular/notify_mailing_list_subscribers.rb:55:in `block (2 levels) in execute'
/var/www/discourse/app/models/email_log.rb:37:in `block in unique_email_per_post'
/var/www/discourse/lib/distributed_mutex.rb:21:in `synchronize'
/var/www/discourse/lib/distributed_mutex.rb:5:in `synchronize'
/var/www/discourse/app/models/email_log.rb:33:in `unique_email_per_post'
/var/www/discourse/app/jobs/regular/notify_mailing_list_subscribers.rb:54:in `block in execute'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/activerecord-4.2.8/lib/active_record/relation/batches.rb:51:in `block (2 levels) in find_each'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/activerecord-4.2.8/lib/active_record/relation/batches.rb:51:in `each'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/activerecord-4.2.8/lib/active_record/relation/batches.rb:51:in `block in find_each'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/activerecord-4.2.8/lib/active_record/relation/batches.rb:124:in `find_in_batches'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/activerecord-4.2.8/lib/active_record/relation/batches.rb:50:in `find_each'
/var/www/discourse/app/jobs/regular/notify_mailing_list_subscribers.rb:35:in `execute'
/var/www/discourse/app/jobs/base.rb:154:in `block (2 levels) in perform'
Env

hostname	proddiscourse-app
process_id	112
application_version	c5401a0927c75d89a1f224b6616a514566aa5f74
current_db	default
current_hostname	discourse.clcohio.org
job	Jobs::NotifyMailingListSubscribers
message	Sending post to mailing list subscribers
user_id	[13, 17, 21, 24, 26, 30, 31, 34, 37, 42, 43, 56, 65, 79, 90, 102, 112, 114, 115, 124, 125, 127, 129, 138, 141]
user_email	[nnn@nnn.org]
opts	
post_id	10927
current_site_id	default

So if a network error occurs when email notifications are supposed to go out, those notifications aren’t queued to be sent once the network connection returns?


Should I consider moving this to the bug or feature request category @team?

We should handle these errors and let the jobs be re-queued rather than being dropped.


@nbianca If there is an easy way to handle these temporary failures, we should not drop the email but instead have the job re-queued.
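
To illustrate the idea, here is a minimal sketch of retrying on a transient failure instead of dropping the email. It is not Discourse's actual code: `with_transient_retry`, `TRANSIENT_ERRORS`, and the `sleeper` parameter are hypothetical names made up for this example. The only thing taken from the log above is that a failed `getaddrinfo` lookup surfaces in Ruby as a `SocketError`.

```ruby
require "socket"

# Hypothetical list of errors treated as temporary (an assumption for this
# sketch). A failed DNS lookup raises SocketError, as in the backtrace above.
TRANSIENT_ERRORS = [SocketError, Errno::ECONNREFUSED, Errno::ETIMEDOUT].freeze

# Hypothetical helper: runs the block, retrying with exponential backoff when
# a transient error is raised. After max_attempts the error is re-raised so
# the caller (for example a job runner) can re-queue the work.
# `sleeper` is injectable so the waiting can be stubbed out in tests.
def with_transient_retry(max_attempts: 3, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *TRANSIENT_ERRORS
    # Out of attempts: propagate the original exception unchanged.
    raise if attempts >= max_attempts
    # Wait base_delay, then 2x, 4x, ... before trying again.
    sleeper.call(base_delay * (2**(attempts - 1)))
    retry
  end
end

# Usage (assuming `message` is a mail object that responds to deliver_now):
#   with_transient_retry { message.deliver_now }
```

The point of re-raising after the last attempt is that the failure stays visible: a short outage is absorbed by the in-process retries, while a longer one still reaches the job system, which can schedule the job again later rather than losing the notification.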


We’ll now silently retry sending emails on temporary issues.

https://github.com/discourse/discourse/pull/6375
