Retry error in sidekiq

This is the first time I’m seeing this error after the recent upgrade to 2.9.0beta4

Jobs::UserEmail

{“type”=>“user_watching_first_post”, “user_id”=>1735, “notification_id”=>33246, “notification_data_hash”=>{“topic_title”=>“Some new notes”, “original_post_id”=>11592, “original_post_type”=>1, “original_username”=>“xfactor”, “revision_number”=>nil, “display_username”=>“xfactor”}, “notification_type”=>“watching_first_post”, “post_id”=>11592, “current_site_id”=>“default”}

Jobs::HandledExceptionWrapper: Wrapped ActiveRecord::RecordInvalid: Validation failed: Post has already been taken

If I try to retry it it fails again. This appears to be in reference to a new topic that was created by a user.

What does this mean and how do I fix it?

2 Likes

I’m seeing that the server sent out a bunch of emails successfully for this new topic to everyone who was watching that category, I’m not seeing any errors in the email logs either or on the email server.

So what is this referring to?

Same fist time error of this type here, retrying doesn’t help, all other emails delivered:

Jobs::UserEmail

{"type"=>"user_private_message", "user_id"=>1513, "notification_id"=>871360, "notification_data_hash"=>{"topic_title"=>"Topic title", "original_post_id"=>220174, "original_post_type"=>1, "original_username"=>"username", "revision_number"=>nil, "display_username"=>"user", "group_name"=>nil}, "notification_type"=>"private_message", "post_id"=>220174, "current_site_id"=>"default"}

Jobs::HandledExceptionWrapper: Wrapped ActiveRecord::RecordInvalid: Validation failed: Post has already been taken

Backtrace

/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/validations.rb:80:in `raise_validation_error'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/validations.rb:53:in `save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:302:in `block in save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:354:in `block in with_transaction_returning_status'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/connection_adapters/abstract/database_statements.rb:318:in `transaction'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:350:in `with_transaction_returning_status'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:302:in `save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/suppressor.rb:48:in `save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/persistence.rb:55:in `create!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:799:in `_create!'

I just noticed the discourse logs too:

Message (21 copies reported)

Job exception: Validation failed: Post has already been taken


Backtrace

/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/validations.rb:80:in `raise_validation_error'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/validations.rb:53:in `save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:302:in `block in save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:354:in `block in with_transaction_returning_status'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/connection_adapters/abstract/database_statements.rb:318:in `transaction'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:350:in `with_transaction_returning_status'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:302:in `save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/suppressor.rb:48:in `save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/persistence.rb:55:in `create!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:799:in `_create!'

Moving this to bugs instead of support

1 Like

Hi @RBoy :slight_smile:

Are you still seeing this error?

For that entry yes it was still stuck in a retry loop until yesterday when after hundreds of retries I finally got fed up and deleted it. I hope I didn’t break something.

I haven’t seen it again but then there hasn’t been an event which triggered hundreds of emails on a single post. It would be nice though to understand where this came from and what it means.

A few days ago, I saw this error on three messages, and they weren’t critical, so I deleted the jobs after a retry was not successful.

Now I have 2001 of them, and my test account didn’t receive a weekly digest that it ought to have received.

Running 8695449cfc right now.

2 Likes

I upgraded to bf987af3ca and I retried everything and still have at least 38 of Jobs::HandledExceptionWrapper: Wrapped ActiveRecord::RecordInvalid: Validation failed: Post has already been taken showing in my sidekiq console.

2 Likes

Without further updates, I’m now now down to 30. I’d guess that they are timing out, except that my test account got the (delayed) weekly digest, and I assume this is related. Not sure where to look in logs to know whether any are actually giving up.

They seem to mostly fail but occasionally succeed, which sure smells like a race condition somewhere.

My backtraces look the same as what @RBoy and @md-misko saw, but here is the full backtrace, not just the truncated one from the “Copy” button:

activerecord-7.0.3/lib/active_record/validations.rb:80:in `raise_validation_error'
activerecord-7.0.3/lib/active_record/validations.rb:53:in `save!'
activerecord-7.0.3/lib/active_record/transactions.rb:302:in `block in save!'
activerecord-7.0.3/lib/active_record/transactions.rb:354:in `block in with_transaction_returning_status'
activerecord-7.0.3/lib/active_record/connection_adapters/abstract/database_statements.rb:314:in `transaction'
activerecord-7.0.3/lib/active_record/transactions.rb:350:in `with_transaction_returning_status'
activerecord-7.0.3/lib/active_record/transactions.rb:302:in `save!'
activerecord-7.0.3/lib/active_record/suppressor.rb:54:in `save!'
activerecord-7.0.3/lib/active_record/persistence.rb:55:in `create!'
activerecord-7.0.3/lib/active_record/relation.rb:869:in `_create!'
activerecord-7.0.3/lib/active_record/relation.rb:115:in `block in create!'
activerecord-7.0.3/lib/active_record/relation.rb:880:in `_scoping'
activerecord-7.0.3/lib/active_record/relation.rb:428:in `scoping'
activerecord-7.0.3/lib/active_record/relation.rb:115:in `create!'
activerecord-7.0.3/lib/active_record/relation.rb:219:in `block in create_or_find_by!'
activerecord-7.0.3/lib/active_record/connection_adapters/abstract/transaction.rb:319:in `block in within_new_transaction'
activesupport-7.0.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
activesupport-7.0.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
activesupport-7.0.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
activesupport-7.0.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
activerecord-7.0.3/lib/active_record/connection_adapters/abstract/transaction.rb:317:in `within_new_transaction'
activerecord-7.0.3/lib/active_record/connection_adapters/abstract/database_statements.rb:316:in `transaction'
activerecord-7.0.3/lib/active_record/transactions.rb:209:in `transaction'
activerecord-7.0.3/lib/active_record/relation/delegation.rb:67:in `block in transaction'
activerecord-7.0.3/lib/active_record/relation.rb:880:in `_scoping'
activerecord-7.0.3/lib/active_record/relation.rb:428:in `scoping'
activerecord-7.0.3/lib/active_record/relation/delegation.rb:67:in `transaction'
activerecord-7.0.3/lib/active_record/relation.rb:219:in `create_or_find_by!'
activerecord-7.0.3/lib/active_record/querying.rb:22:in `create_or_find_by!'
/var/www/discourse/lib/email/sender.rb:498:in `get_reply_key'
/var/www/discourse/lib/email/sender.rb:105:in `send'
/var/www/discourse/app/jobs/regular/user_email.rb:83:in `send_user_email'
/var/www/discourse/app/jobs/regular/user_email.rb:38:in `execute'
/var/www/discourse/app/jobs/base.rb:232:in `block (2 levels) in perform'
rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:80:in `with_connection'
/var/www/discourse/app/jobs/base.rb:221:in `block in perform'
/var/www/discourse/app/jobs/base.rb:217:in `each'
/var/www/discourse/app/jobs/base.rb:217:in `perform'
sidekiq-6.4.2/lib/sidekiq/processor.rb:196:in `execute_job'
sidekiq-6.4.2/lib/sidekiq/processor.rb:164:in `block (2 levels) in process'
sidekiq-6.4.2/lib/sidekiq/middleware/chain.rb:138:in `block in invoke'
/var/www/discourse/lib/sidekiq/pausable.rb:138:in `call'
sidekiq-6.4.2/lib/sidekiq/middleware/chain.rb:140:in `block in invoke'
sidekiq-6.4.2/lib/sidekiq/middleware/chain.rb:143:in `invoke'
sidekiq-6.4.2/lib/sidekiq/processor.rb:163:in `block in process'
sidekiq-6.4.2/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch'
sidekiq-6.4.2/lib/sidekiq/job_retry.rb:114:in `local'
sidekiq-6.4.2/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch'
sidekiq-6.4.2/lib/sidekiq.rb:40:in `block in <module:Sidekiq>'
sidekiq-6.4.2/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch'
sidekiq-6.4.2/lib/sidekiq/processor.rb:257:in `stats'
sidekiq-6.4.2/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch'
sidekiq-6.4.2/lib/sidekiq/job_logger.rb:13:in `call'
sidekiq-6.4.2/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch'
sidekiq-6.4.2/lib/sidekiq/job_retry.rb:81:in `global'
sidekiq-6.4.2/lib/sidekiq/processor.rb:124:in `block in dispatch'
sidekiq-6.4.2/lib/sidekiq/job_logger.rb:39:in `prepare'
sidekiq-6.4.2/lib/sidekiq/processor.rb:123:in `dispatch'
sidekiq-6.4.2/lib/sidekiq/processor.rb:162:in `process'
sidekiq-6.4.2/lib/sidekiq/processor.rb:78:in `process_one'
sidekiq-6.4.2/lib/sidekiq/processor.rb:68:in `run'
sidekiq-6.4.2/lib/sidekiq/util.rb:56:in `watchdog'
sidekiq-6.4.2/lib/sidekiq/util.rb:65:in `block in safe_thread'

Is there any more information I could provide to help debug this?

4 Likes

3 posts were merged into an existing topic: Email Hostname Certificate Mismatch Causing sidekiq Queue Overload, Severe Site Instability

Experiencing this issue too!

I discovered that my mail server was using a certificate for another server in its round-robin, and hoped that the hostname mismatch was the problem. I updated to b850c12793 in the process of changing to a server that did not have a certificate mismatch, but it did not resolve the problem. I retried some of the jobs, but none of them completed successfully. Therefore, this bug is not a symptom of hidden certificate mismatches.

This was built with discourse_docker 2a9faf7e5680b9.

Updating discourse_docker to 241a42ce718, and with it discourse to 95e7e10417, also did not resolve the problem. I still have 30 of these failures being retried.

1 Like

From what you’re describing and looking at this post there may be multiple issues here:

The server may not be throttling it’s retries for emails causing it to time out or be rejected by the mail server. But there’s someother underlying issue if your certificates and configuration are valid and it’s still not sending the emails. For some it also appears to be eating up disk space. I checked mine but I didn’t notice that here.

I didn’t run out of space, and this happens even when I select exactly one job to re-run so it doesn’t look like a race condition. There’s clearly more than one issue here, and what I’m seeing here is not related to that linked topic.

(It turns out that I didn’t have a certificate problem after all; the server names were in the alternate server name. But I moved to using the hostname that matches the SN anyway, and it made no difference.)

I’m successfully sending a tremendous number of mails, just these few jobs are stuck. I don’t know, for example, what log entries to go looking for to help diagnose.

2 Likes

I have created a draft pr for now that adds a failing test that replicates this issue:

Now that we know what line of code is causing the issue hopefully we can come up with a good solution soon.

5 Likes

I reviewed my 29 failed emails to make sure that there was nothing critical to send, and as far as I could tell, there wasn’t, so I deleted all the jobs in sidekiq, in case this was due to a transient problem due to email jobs spanning upgrades. However, without applying further updates, I now have another single case of the same failure.

Just sharing this as information that it’s an ongoing problem and not a weird transient.

2 Likes

The code fix above has been merged in, can you git pull the latest changes and rebuild your container when you get a chance?

4 Likes

Upgrading and retrying the job succeeded in sending the mail; I checked email logs and it was reported sent.

Thank you!

4 Likes