bizotto
(Denis Bizotto Trinconi)
June 9, 2023, 09:54
1
Hi everyone,
I have an installation with 3 OpenShift deployments: one for Redis (7.0.10), one for Postgres (13.10), and one for Discourse (stable 3.0.3). Everything works fine right after deployment, but after a few hours or days the sidekiq processes (UNICORN_SIDEKIQS=3) stop. One thing I noticed is that no sidekiq.log is generated under /shared/log/rails, which I suspect is why sidekiq cannot restart automatically:
root@discourse-b9f766dcf-52zjq:/var/www/discourse# ls -laF /shared/log/rails/
total 32
drwxr-xr-x. 2 nobody www-data 4096 Jun 9 08:57 ./
drwxr-xr-x. 3 root root 4096 May 30 06:16 ../
-rw-r--r--. 1 nobody www-data 16082 Jun 9 09:28 production.log
-rw-r--r--. 1 nobody www-data 1345 Jun 9 09:02 unicorn.stderr.log
-rw-r--r--. 1 nobody www-data 204 Jun 9 09:02 unicorn.stdout.log
When sidekiq stops, I see the following message under host/logs:
Info:
Sidekiq is consuming too much memory (using: 530.35M) for 'discourse.internal.odencluster.com', restarting
backtrace:
config/unicorn.conf.rb:163:in `check_sidekiq_heartbeat'
config/unicorn.conf.rb:243:in `master_sleep'
unicorn-6.1.0/lib/unicorn/http_server.rb:295:in `join'
unicorn-6.1.0/bin/unicorn:128:in `<top (required)>'
/var/www/discourse/vendor/bundle/ruby/3.2.0/bin/unicorn:25:in `load'
/var/www/discourse/vendor/bundle/ruby/3.2.0/bin/unicorn:25:in `<main>'
Then I see this message in the discourse pod logs:
(48) Reopening logs
(48) Reopening logs
(48) Reopening logs
However, since there is no sidekiq.log under /shared/log/rails/, it does not restart.
My knowledge of Rails is close to zero, which makes this hard to troubleshoot, but I can see that sidekiq is not paused:
[1] pry(main)> Sidekiq.paused?
=> false
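(A couple of related console checks that show whether any Sidekiq process is still registered in Redis; a sketch along the same lines as the check above:)
[2] pry(main)> require "sidekiq/api"
[3] pry(main)> Sidekiq::ProcessSet.new.size # number of live Sidekiq processes
[4] pry(main)> Sidekiq::Queue.new.size # jobs waiting in the default queue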
When I start it manually, it works:
2023-06-09T09:47:15.556Z pid=195386 tid=449q INFO: Booting Sidekiq 6.5.8 with Sidekiq::RedisConnection::RedisAdapter options {:host=>"redis", :port=>6379, :namespace=>"sidekiq"}
2023-06-09T09:47:20.528Z pid=195386 tid=449q INFO: Booted Rails 7.0.4.3 application in production environment
2023-06-09T09:47:20.528Z pid=195386 tid=449q INFO: Running in ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
2023-06-09T09:47:20.528Z pid=195386 tid=449q INFO: See LICENSE and the LGPL-3.0 for licensing details.
2023-06-09T09:47:20.528Z pid=195386 tid=449q INFO: Upgrade to Sidekiq Pro for more features and support: https://sidekiq.org
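(For reference, a manual start like the one above can be run from inside the discourse pod roughly as follows; this is a sketch, and any extra queue/concurrency flags are omitted:)
cd /var/www/discourse
# run Sidekiq in the foreground against the production app
RAILS_ENV=production bundle exec sidekiq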
Two things that I think would help me solve this:
How do I get it to create /shared/log/rails/sidekiq.log?
How do I let sidekiq use more than 530M of memory?
If anyone has suggestions, please let me know. Thanks in advance for taking the time to help!
Have a great day!
1 Like
trobiyo
(Ismael Posada Trobo)
June 9, 2023, 15:52
2
Hi Denis,
Here is some information on how to increase Sidekiq's RSS.
Take a look at the UNICORN_SIDEKIQ_MAX_RSS environment variable (see discourse/config/unicorn.conf.rb at 89d7b1861d1625352e82e82c19f93e7272c965ef · discourse/discourse · GitHub); it lets you allocate more memory. In any case, I would suggest lowering UNICORN_SIDEKIQS to 1 or 2, unless you have a large backlog of jobs.
I don't know what is causing your sidekiq to restart; normally it simply restarts in the background after an OOM (see discourse/config/unicorn.conf.rb at 89d7b1861d1625352e82e82c19f93e7272c965ef · discourse/discourse · GitHub). Check your-forum.com/logs for more information. I hope this helps.
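In this OpenShift setup, both values can be applied as environment variables on the Deployment. A minimal sketch, assuming your Deployment is named discourse and you want a 1000 MB cap:
# lower the number of Sidekiq processes and raise the per-process RSS limit (in MB);
# changing the env triggers a rollout of the pods
oc set env deployment/discourse UNICORN_SIDEKIQS=1 UNICORN_SIDEKIQ_MAX_RSS=1000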
Best regards,
Ismael
4 Likes
bizotto
(Denis Bizotto Trinconi)
June 12, 2023, 07:28
3
Hi @trobiyo, thank you very much for the quick and to-the-point support!
Yes, my sidekiq was restarting because of OOM (out of memory). I have now followed your advice: I reduced it to UNICORN_SIDEKIQS=1 and allocated more memory for the RSS with the UNICORN_SIDEKIQ_MAX_RSS environment variable.
I hope this helps and stops the sidekiq restarts.
Do you know why sidekiq does not write any logs to /shared/log/rails/sidekiq.log?
Thanks again, and all the best!
Regards,
Denis
trobiyo
(Ismael Posada Trobo)
June 12, 2023, 16:34
4
Regarding the missing log: Sidekiq only writes /shared/log/rails/sidekiq.log when the DISCOURSE_LOG_SIDEKIQ environment variable is enabled, so try setting DISCOURSE_LOG_SIDEKIQ=1.
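Applied the same way as the other variables (same assumed Deployment name as above):
# enable Sidekiq job logging to /shared/log/rails/sidekiq.log
oc set env deployment/discourse DISCOURSE_LOG_SIDEKIQ=1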
2 Likes
bizotto
(Denis Bizotto Trinconi)
June 13, 2023, 05:31
5
Hello,
Yes, DISCOURSE_LOG_SIDEKIQ=1 helped; I can now see /shared/log/rails/sidekiq.log. That is great!
I also noticed that sidekiq has been running for a while now: since I raised the memory limit and reduced it to a single process, it has not restarted because of OOM.
This looks like the solution to my sidekiq problem. I will keep monitoring it, and if I still see sidekiq-related issues I will post an update here.
In the meantime, I really appreciate your help @trobiyo; your support has been great!
All the best!
2 Likes
bizotto
(Denis Bizotto Trinconi)
June 14, 2023, 08:36
7
Hello again @trobiyo,
Unfortunately, my sidekiq still stops; it looks like those changes were not enough. =/
I see the following error in the logs:
info:
Job exception: FinalDestination: all resolved IPs were disallowed
backtrace:
/var/www/discourse/lib/final_destination/ssrf_detector.rb:104:in `lookup_and_filter_ips'
/var/www/discourse/lib/final_destination/http.rb:13:in `connect'
/usr/local/lib/ruby/3.2.0/net/http.rb:1248:in `do_start'
/usr/local/lib/ruby/3.2.0/net/http.rb:1237:in `start'
/usr/local/lib/ruby/3.2.0/net/http.rb:687:in `start'
/var/www/discourse/lib/final_destination.rb:511:in `safe_session'
/var/www/discourse/lib/final_destination.rb:450:in `safe_get'
/var/www/discourse/lib/final_destination.rb:161:in `get'
/var/www/discourse/lib/retrieve_title.rb:81:in `fetch_title'
/var/www/discourse/lib/retrieve_title.rb:7:in `crawl'
/var/www/discourse/lib/inline_oneboxer.rb:76:in `lookup'
/var/www/discourse/lib/cooked_processor_mixin.rb:310:in `process_inline_onebox'
/var/www/discourse/lib/cooked_processor_mixin.rb:39:in `block in post_process_oneboxes'
/var/www/discourse/lib/oneboxer.rb:213:in `block in apply'
/var/www/discourse/lib/oneboxer.rb:161:in `block in each_onebox_link'
nokogiri-1.14.2-x86_64-linux/lib/nokogiri/xml/node_set.rb:235:in `block in each'
nokogiri-1.14.2-x86_64-linux/lib/nokogiri/xml/node_set.rb:234:in `upto'
nokogiri-1.14.2-x86_64-linux/lib/nokogiri/xml/node_set.rb:234:in `each'
/var/www/discourse/lib/oneboxer.rb:161:in `each_onebox_link'
/var/www/discourse/lib/oneboxer.rb:212:in `apply'
/var/www/discourse/lib/cooked_processor_mixin.rb:9:in `post_process_oneboxes'
/var/www/discourse/lib/cooked_post_processor.rb:41:in `block in post_process'
/var/www/discourse/lib/distributed_mutex.rb:53:in `block in synchronize'
/var/www/discourse/lib/distributed_mutex.rb:49:in `synchronize'
/var/www/discourse/lib/distributed_mutex.rb:49:in `synchronize'
/var/www/discourse/lib/distributed_mutex.rb:34:in `synchronize'
/var/www/discourse/lib/cooked_post_processor.rb:38:in `post_process'
/var/www/discourse/app/jobs/regular/process_post.rb:28:in `block in execute'
/var/www/discourse/lib/distributed_mutex.rb:53:in `block in synchronize'
/var/www/discourse/lib/distributed_mutex.rb:49:in `synchronize'
/var/www/discourse/lib/distributed_mutex.rb:49:in `synchronize'
/var/www/discourse/lib/distributed_mutex.rb:34:in `synchronize'
/var/www/discourse/app/jobs/regular/process_post.rb:8:in `execute'
/var/www/discourse/app/jobs/base.rb:249:in `block (2 levels) in perform'
rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:80:in `with_connection'
/var/www/discourse/app/jobs/base.rb:236:in `block in perform'
/var/www/discourse/app/jobs/base.rb:232:in `each'
/var/www/discourse/app/jobs/base.rb:232:in `perform'
sidekiq-6.5.8/lib/sidekiq/processor.rb:202:in `execute_job'
sidekiq-6.5.8/lib/sidekiq/processor.rb:170:in `block (2 levels) in process'
sidekiq-6.5.8/lib/sidekiq/middleware/chain.rb:177:in `block in invoke'
/var/www/discourse/lib/sidekiq/pausable.rb:134:in `call'
sidekiq-6.5.8/lib/sidekiq/middleware/chain.rb:179:in `block in invoke'
sidekiq-6.5.8/lib/sidekiq/middleware/chain.rb:182:in `invoke'
sidekiq-6.5.8/lib/sidekiq/processor.rb:169:in `block in process'
sidekiq-6.5.8/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch'
sidekiq-6.5.8/lib/sidekiq/job_retry.rb:113:in `local'
sidekiq-6.5.8/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch'
sidekiq-6.5.8/lib/sidekiq.rb:44:in `block in <module:Sidekiq>'
sidekiq-6.5.8/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch'
sidekiq-6.5.8/lib/sidekiq/processor.rb:263:in `stats'
sidekiq-6.5.8/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch'
sidekiq-6.5.8/lib/sidekiq/job_logger.rb:13:in `call'
sidekiq-6.5.8/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch'
sidekiq-6.5.8/lib/sidekiq/job_retry.rb:80:in `global'
sidekiq-6.5.8/lib/sidekiq/processor.rb:124:in `block in dispatch'
sidekiq-6.5.8/lib/sidekiq/job_logger.rb:39:in `prepare'
sidekiq-6.5.8/lib/sidekiq/processor.rb:123:in `dispatch'
sidekiq-6.5.8/lib/sidekiq/processor.rb:168:in `process'
sidekiq-6.5.8/lib/sidekiq/processor.rb:78:in `process_one'
sidekiq-6.5.8/lib/sidekiq/processor.rb:68:in `run'
sidekiq-6.5.8/lib/sidekiq/component.rb:8:in `watchdog'
sidekiq-6.5.8/lib/sidekiq/component.rb:17:in `block in safe_thread'
Based on this error, do you have any idea what is going wrong?
Thanks again for your support.
trobiyo
(Ismael Posada Trobo)
June 14, 2023, 13:29
8
Hi Denis,
This looks to me like it could be a socket/DNS timeout issue. Have you configured anything under the "allowed internal hosts" setting?
From the stack trace, the list of resolved IPs appears to end up empty during the lookup (see discourse/lib/final_destination/ssrf_detector.rb at main · discourse/discourse · GitHub), triggered at discourse/lib/final_destination/http.rb at tests-passed · discourse/discourse · GitHub, so I am inclined to think this may be related to your installation (the sidekiq pod cannot reach the IPs?).
Alternatively, check whether you are using any NetworkPolicy in the cluster; that could be another cause.
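One way to see what the SSRF filter is doing is from the rails console inside the discourse pod (meta.discourse.org below is only a stand-in; substitute a host that fails to onebox for you):
[1] pry(main)> Resolv.getaddresses("meta.discourse.org") # raw DNS answer from the pod
[2] pry(main)> FinalDestination::SSRFDetector.lookup_and_filter_ips("meta.discourse.org") # what Discourse keeps after filtering
If the first call returns addresses but the second raises a DisallowedIpError, compare those addresses against your blocked_ip_blocks and allowed_internal_hosts site settings. For the NetworkPolicy angle, oc get networkpolicy -n <your-namespace> lists any policies that could be restricting the pod's traffic.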
Best
system
(system)
Closed
July 14, 2023, 13:30
9
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.