RBoy
(RBoy)
1
This is the first time I have seen this error since recently upgrading to 2.9.0beta4:
Jobs::UserEmail
{"type"=>"user_watching_first_post", "user_id"=>1735, "notification_id"=>33246, "notification_data_hash"=>{"topic_title"=>"Some new notes", "original_post_id"=>11592, "original_post_type"=>1, "original_username"=>"xfactor", "revision_number"=>nil, "display_username"=>"xfactor"}, "notification_type"=>"watching_first_post", "post_id"=>11592, "current_site_id"=>"default"}
Jobs::HandledExceptionWrapper: Wrapped ActiveRecord::RecordInvalid: Validation failed: Post has already been taken
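Retrying the job does not help; it fails again. It appears to relate to a new topic created by a user.
What does this mean, and how do I fix it?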
RBoy
(RBoy)
2
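I can see that the server successfully sent a large number of emails for this new topic to all users watching the category, and I see no errors in the email logs or on the mail server.
So what does this error refer to?
Same type of error here, also for the first time; retrying does not help, and all other emails were delivered: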
Jobs::UserEmail
{"type"=>"user_private_message", "user_id"=>1513, "notification_id"=>871360, "notification_data_hash"=>{"topic_title"=>"Topic title", "original_post_id"=>220174, "original_post_type"=>1, "original_username"=>"username", "revision_number"=>nil, "display_username"=>"user", "group_name"=>nil}, "notification_type"=>"private_message", "post_id"=>220174, "current_site_id"=>"default"}
Jobs::HandledExceptionWrapper: Wrapped ActiveRecord::RecordInvalid: Validation failed: Post has already been taken
Backtrace
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/validations.rb:80:in `raise_validation_error'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/validations.rb:53:in `save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:302:in `block in save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:354:in `block in with_transaction_returning_status'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/connection_adapters/abstract/database_statements.rb:318:in `transaction'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:350:in `with_transaction_returning_status'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:302:in `save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/suppressor.rb:48:in `save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/persistence.rb:55:in `create!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:799:in `_create!'
RBoy
(RBoy)
4
I also noticed this in the forum logs:
Message (21 copies reported)
Job exception: Validation failed: Post has already been taken
Backtrace
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/validations.rb:80:in `raise_validation_error'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/validations.rb:53:in `save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:302:in `block in save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:354:in `block in with_transaction_returning_status'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/connection_adapters/abstract/database_statements.rb:318:in `transaction'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:350:in `with_transaction_returning_status'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/transactions.rb:302:in `save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/suppressor.rb:48:in `save!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/persistence.rb:55:in `create!'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:799:in `_create!'
RBoy
(RBoy)
7
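As for that entry: yes, it stayed stuck in the retry loop until yesterday, when, after hundreds of retries, I finally had enough and deleted it. I hope I didn't break anything.
I haven't seen it again, but there also hasn't been another event that triggered hundreds of emails at once. Still, it would be good to understand where it comes from and what it means.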
mcdanlj
(Michael K Johnson)
8
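A few days ago I saw this error on three messages. They weren't critical, so after retrying without success I deleted those jobs.
Now there are 2001 of them, and my test account did not receive the weekly digest it should have.
Currently running 8695449cfc.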
mcdanlj
(Michael K Johnson)
9
I upgraded to bf987af3ca and retried everything, but my sidekiq console still shows at least 38 instances of Jobs::HandledExceptionWrapper: Wrapped ActiveRecord::RecordInvalid: Validation failed: Post has already been taken.
mcdanlj
(Michael K Johnson)
13
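With no further updates, I'm now down to 30. I'm guessing the others timed out, but my test account did receive its (delayed) weekly digest, which I'm guessing is related. I don't know where to look in the logs to find out whether any accounts were actually given up on.
They seem to fail most of the time but occasionally succeed, which surely suggests a race condition somewhere.
My stack trace looks the same as what @RBoy and @md-misko saw, but this is the full stack trace rather than the truncated version from the "copy" button: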
activerecord-7.0.3/lib/active_record/validations.rb:80:in `raise_validation_error'
activerecord-7.0.3/lib/active_record/validations.rb:53:in `save!'
activerecord-7.0.3/lib/active_record/transactions.rb:302:in `block in save!'
activerecord-7.0.3/lib/active_record/transactions.rb:354:in `block in with_transaction_returning_status'
activerecord-7.0.3/lib/active_record/connection_adapters/abstract/database_statements.rb:314:in `transaction'
activerecord-7.0.3/lib/active_record/transactions.rb:350:in `with_transaction_returning_status'
activerecord-7.0.3/lib/active_record/transactions.rb:302:in `save!'
activerecord-7.0.3/lib/active_record/suppressor.rb:54:in `save!'
activerecord-7.0.3/lib/active_record/persistence.rb:55:in `create!'
activerecord-7.0.3/lib/active_record/relation.rb:869:in `_create!'
activerecord-7.0.3/lib/active_record/relation.rb:115:in `block in create!'
activerecord-7.0.3/lib/active_record/relation.rb:880:in `_scoping'
activerecord-7.0.3/lib/active_record/relation.rb:428:in `scoping'
activerecord-7.0.3/lib/active_record/relation.rb:115:in `create!'
activerecord-7.0.3/lib/active_record/relation.rb:219:in `block in create_or_find_by!'
activerecord-7.0.3/lib/active_record/connection_adapters/abstract/transaction.rb:319:in `block in within_new_transaction'
activesupport-7.0.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
activesupport-7.0.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
activesupport-7.0.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
activesupport-7.0.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
activerecord-7.0.3/lib/active_record/connection_adapters/abstract/transaction.rb:317:in `within_new_transaction'
activerecord-7.0.3/lib/active_record/connection_adapters/abstract/database_statements.rb:316:in `transaction'
activerecord-7.0.3/lib/active_record/transactions.rb:209:in `transaction'
activerecord-7.0.3/lib/active_record/relation/delegation.rb:67:in `block in transaction'
activerecord-7.0.3/lib/active_record/relation.rb:880:in `_scoping'
activerecord-7.0.3/lib/active_record/relation.rb:428:in `scoping'
activerecord-7.0.3/lib/active_record/relation/delegation.rb:67:in `transaction'
activerecord-7.0.3/lib/active_record/relation.rb:219:in `create_or_find_by!'
activerecord-7.0.3/lib/active_record/querying.rb:22:in `create_or_find_by!'
/var/www/discourse/lib/email/sender.rb:498:in `get_reply_key'
/var/www/discourse/lib/email/sender.rb:105:in `send'
/var/www/discourse/app/jobs/regular/user_email.rb:83:in `send_user_email'
/var/www/discourse/app/jobs/regular/user_email.rb:38:in `execute'
/var/www/discourse/app/jobs/base.rb:232:in `block (2 levels) in perform'
/var/www/discourse/lib/rails_multisite/connection_management.rb:80:in `with_connection'
/var/www/discourse/app/jobs/base.rb:221:in `block in perform'
/var/www/discourse/app/jobs/base.rb:217:in `each'
/var/www/discourse/app/jobs/base.rb:217:in `perform'
sidekiq-6.4.2/lib/sidekiq/processor.rb:196:in `execute_job'
sidekiq-6.4.2/lib/sidekiq/processor.rb:164:in `block (2 levels) in process'
sidekiq-6.4.2/lib/sidekiq/middleware/chain.rb:138:in `block in invoke'
/var/www/discourse/lib/sidekiq/pausable.rb:138:in `call'
sidekiq-6.4.2/lib/sidekiq/middleware/chain.rb:140:in `block in invoke'
sidekiq-6.4.2/lib/sidekiq/middleware/chain.rb:143:in `invoke'
sidekiq-6.4.2/lib/sidekiq/processor.rb:163:in `block in process'
sidekiq-6.4.2/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch'
sidekiq-6.4.2/lib/sidekiq/job_retry.rb:114:in `local'
sidekiq-6.4.2/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch'
sidekiq-6.4.2/lib/sidekiq.rb:40:in `block in <module:Sidekiq>'
sidekiq-6.4.2/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch'
sidekiq-6.4.2/lib/sidekiq/processor.rb:257:in `stats'
sidekiq-6.4.2/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch'
sidekiq-6.4.2/lib/sidekiq/job_logger.rb:13:in `call'
sidekiq-6.4.2/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch'
sidekiq-6.4.2/lib/sidekiq/job_retry.rb:81:in `global'
sidekiq-6.4.2/lib/sidekiq/processor.rb:124:in `block in dispatch'
sidekiq-6.4.2/lib/sidekiq/job_logger.rb:39:in `prepare'
sidekiq-6.4.2/lib/sidekiq/processor.rb:123:in `dispatch'
sidekiq-6.4.2/lib/sidekiq/processor.rb:162:in `process'
sidekiq-6.4.2/lib/sidekiq/processor.rb:78:in `process_one'
sidekiq-6.4.2/lib/sidekiq/processor.rb:68:in `run'
sidekiq-6.4.2/lib/sidekiq/util.rb:56:in `watchdog'
sidekiq-6.4.2/lib/sidekiq/util.rb:65:in `block in safe_thread'
What other information can I provide to help debug this?
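A note on reading the trace: the exception comes out of `create!` inside `create_or_find_by!` (called from `get_reply_key` in lib/email/sender.rb). As far as I understand Rails, `create_or_find_by!` only falls back to a lookup when the database raises `ActiveRecord::RecordNotUnique`; a model-level uniqueness validation raises `ActiveRecord::RecordInvalid` first, and that is not rescued. The sketch below is a plain-Ruby simulation of that behavior, not Discourse or Rails code. `ReplyKeyTable`, its `validate_uniqueness` flag, and the row layout are hypothetical stand-ins, and whether this is exactly what needs fixing in Discourse is an assumption.

```ruby
# Stand-ins for the two ActiveRecord exception classes involved.
class RecordInvalid < StandardError; end    # a model validation failed
class RecordNotUnique < StandardError; end  # a unique index was violated

# Hypothetical in-memory "table" with a unique key on post_id.
class ReplyKeyTable
  def initialize(validate_uniqueness:)
    @rows = {}
    @validate_uniqueness = validate_uniqueness
  end

  def create!(post_id:)
    if @rows.key?(post_id)
      # A model-level uniqueness validation runs before the insert is attempted.
      raise RecordInvalid, "Validation failed: Post has already been taken" if @validate_uniqueness
      # Without the validation, the unique index rejects the insert instead.
      raise RecordNotUnique, "duplicate key on post_id"
    end
    @rows[post_id] = { post_id: post_id, reply_key: "key-#{post_id}" }
  end

  def find_by!(post_id:)
    @rows.fetch(post_id)
  end

  # Same shape as the method in the trace: try the insert,
  # and fall back to a lookup only on RecordNotUnique.
  def create_or_find_by!(post_id:)
    create!(post_id: post_id)
  rescue RecordNotUnique
    find_by!(post_id: post_id)
  end
end

table = ReplyKeyTable.new(validate_uniqueness: true)
table.create_or_find_by!(post_id: 11592)   # first call inserts the row
begin
  table.create_or_find_by!(post_id: 11592) # second call never reaches the fallback
rescue RecordInvalid => e
  puts e.message
end
```

If this reading is right, it would also explain why a retry can keep failing: once the row exists, every later attempt hits the same validation.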
RGJ
(Richard - Communiteq)
Split this topic
16
mcdanlj
(Michael K Johnson)
18
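I discovered that my mail server was in a round-robin with a server using another server's certificate, and hoped that the hostname mismatch was the problem. In the process of switching to a server without the certificate mismatch, I updated to b850c12793, but that did not fix the problem. I retried some of the jobs, and none completed successfully. So this bug is not a symptom of a hidden certificate mismatch.
This was built with discourse_docker 2a9faf7e5680b9.
Updating discourse_docker to 241a42ce718, and with it discourse to 95e7e10417, did not fix the problem either. I still have 30 of these failures retrying.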
RBoy
(RBoy)
19
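Based on your description and on this post, there may be more than one problem here:
The server may not be limiting its email retries, which leads to timeouts or rejections by the mail server. But if your certificate and configuration are valid and emails still cannot be sent, there is some other underlying problem. For some people it also seems to be eating disk space. I checked mine and haven't noticed that here.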
mcdanlj
(Michael K Johnson)
20
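I'm not running out of space, and this happens even when I pick a single job to re-run, so it doesn't look like a race condition. There is clearly more than one problem here, and what I'm seeing has nothing to do with that linked topic.
(It turns out I never had a certificate problem at all; the server name was among the certificate's alternative names. I switched to a hostname matching the SN anyway, and it made no difference.)
I'm successfully sending lots of mail; only these few jobs are stuck. I don't know, for example, which log entries I should look for to help diagnose this.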
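One way to see which jobs are stuck, rather than scrolling the sidekiq retries page, is to group the failed job payloads by job class and error. This is only a sketch: `summarize_failures` is a hypothetical helper, and the sample hashes are trimmed copies of the payloads quoted earlier in this topic, with the error split into "error_class" and "error_message" the way I'd expect Sidekiq to store it. Feeding it real data from a Rails console, for example via `Sidekiq::RetrySet.new.map(&:item)`, is an assumption I have not verified against this Sidekiq version.

```ruby
# Group failed job payloads (plain hashes) by job class and error,
# and list which email types are affected in each group.
def summarize_failures(jobs)
  groups = jobs.group_by do |job|
    [job["class"], job["error_class"], job["error_message"]]
  end

  rows = groups.map do |(klass, error_class, message), group|
    types = group.map do |job|
      first_arg = Array(job["args"]).first
      first_arg.is_a?(Hash) ? first_arg["type"] : nil
    end
    { job: klass, error: error_class, message: message,
      count: group.size, types: types.compact.uniq.sort }
  end

  rows.sort_by { |row| -row[:count] } # most frequent failure first
end

# Sample payloads, abbreviated from the ones posted above.
sample = [
  { "class" => "Jobs::UserEmail",
    "args" => [{ "type" => "user_watching_first_post", "post_id" => 11592 }],
    "error_class" => "Jobs::HandledExceptionWrapper",
    "error_message" => "Wrapped ActiveRecord::RecordInvalid: Validation failed: Post has already been taken" },
  { "class" => "Jobs::UserEmail",
    "args" => [{ "type" => "user_private_message", "post_id" => 220174 }],
    "error_class" => "Jobs::HandledExceptionWrapper",
    "error_message" => "Wrapped ActiveRecord::RecordInvalid: Validation failed: Post has already been taken" }
]

summarize_failures(sample).each do |row|
  puts "#{row[:count]} x #{row[:job]}: #{row[:message]} (#{row[:types].join(', ')})"
end
```

The "post_id" in each payload identifies the post whose notification email is stuck, which may help with checking whether anything critical went unsent.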
blake
(Blake Erickson)
24
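I've created a draft PR for this; for now it adds a failing test that reproduces the issue.
Now that we know the line of code causing the problem, hopefully we can find a good fix soon.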
mcdanlj
(Michael K Johnson)
26
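I went through my 29 failed emails to make sure nothing critical needed to go out, and as far as I can tell nothing did. So I deleted all the jobs in sidekiq, in case this was a transient problem caused by mail jobs spanning an upgrade. However, without any further updates applied, I hit another instance of the same failure.
I'm sharing this only to show that it is a persistent problem, not a weird transient one.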
blake
(Blake Erickson)
27
The code fix above has been merged. When you get a chance, could you git pull the latest changes and rebuild your container?
mcdanlj
(Michael K Johnson)
28
Upgrading and retrying the job sent the email successfully; I checked the mail logs, which report it as delivered.
Thanks!