Sidekiq heartbeat test failed, restarting

arrowcircle · January 1, 2020, 11:20am

Hi! I found strange error in my admin page, that sidekiq is not running.
I opened logs and found hundreds errors like:

/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/logster-2.5.1/lib/logster/logger.rb:112:in `report_to_store'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/logster-2.5.1/lib/logster/logger.rb:103:in `add_with_opts'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/logster-2.5.1/lib/logster/logger.rb:54:in `add'
/usr/local/lib/ruby/2.6.0/logger.rb:534:in `warn'
config/unicorn.conf.rb:147:in `check_sidekiq_heartbeat'
config/unicorn.conf.rb:164:in `master_sleep'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/unicorn-5.5.2/lib/unicorn/http_server.rb:296:in `join'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/unicorn-5.5.2/bin/unicorn:128:in `<top (required)>'
/var/www/discourse/vendor/bundle/ruby/2.6.0/bin/unicorn:23:in `load'
/var/www/discourse/vendor/bundle/ruby/2.6.0/bin/unicorn:23:in `<main>'

Here is errors from the app

run-parts: executing /etc/runit/1.d/00-ensure-links
run-parts: executing /etc/runit/1.d/00-fix-var-logs
run-parts: executing /etc/runit/1.d/anacron
run-parts: executing /etc/runit/1.d/cleanup-pids
Cleaning stale PID files
run-parts: executing /etc/runit/1.d/copy-env
run-parts: executing /etc/runit/1.d/letsencrypt
[Wed 01 Jan 2020 10:40:41 AM UTC] Domains not changed.
[Wed 01 Jan 2020 10:40:41 AM UTC] Skip, Next renewal time is: Tue Feb 25 00:52:11 UTC 2020
[Wed 01 Jan 2020 10:40:41 AM UTC] Add '--force' to force to renew.
[Wed 01 Jan 2020 10:40:41 AM UTC] Installing key to:/shared/ssl/discourse.example.com.key
[Wed 01 Jan 2020 10:40:41 AM UTC] Installing full chain to:/shared/ssl/discourse.example.com.cer
[Wed 01 Jan 2020 10:40:41 AM UTC] Run reload cmd: sv reload nginx
warning: nginx: unable to open supervise/ok: file does not exist
[Wed 01 Jan 2020 10:40:41 AM UTC] Reload error for :
[Wed 01 Jan 2020 10:40:42 AM UTC] Domains not changed.
[Wed 01 Jan 2020 10:40:42 AM UTC] Skip, Next renewal time is: Wed Jan  8 00:39:10 UTC 2020
[Wed 01 Jan 2020 10:40:42 AM UTC] Add '--force' to force to renew.
[Wed 01 Jan 2020 10:40:42 AM UTC] Installing key to:/shared/ssl/discourse.example.com_ecc.key
[Wed 01 Jan 2020 10:40:42 AM UTC] Installing full chain to:/shared/ssl/discourse.example.com_ecc.cer
[Wed 01 Jan 2020 10:40:42 AM UTC] Run reload cmd: sv reload nginx
warning: nginx: unable to open supervise/ok: file does not exist
[Wed 01 Jan 2020 10:40:42 AM UTC] Reload error for :
Started runsvdir, PID is 628
chgrp: invalid group: ‘syslog’
rsyslogd: imklog: cannot open kernel log (/proc/kmsg): Operation not permitted.
rsyslogd: activation of module imklog failed [v8.1901.0 try https://www.rsyslog.com/e/2145 ]
supervisor pid: 633 unicorn pid: 647

I tried to flush redis DB, but id does not help.
I have separate containers for app, postgres and redis.
Any ideas how to fix this?

sam · January 1, 2020, 9:58pm

Is sidekiq paused, are the number of jobs queue building up?

arrowcircle · January 2, 2020, 9:24am

It shows 2 jobs in default queue, both are Jobs::VersionCheck.
After flushing redis DB it shows 0 jobs complete, 0 failed, 2 queued.

Scheduler page shows some jobs are queued or running, but this jobs does not count in the stats of sidekiq page. All jobs on scheduler page have empty or -1ms duration.

So numbers do not build up, literally nothing changes.
How to check if sidekiq is paused?

OvermindDL1 · January 6, 2020, 4:20am

Same issue here with latest update, no changes other than just updating via a rebuild, admin dashboard says sidekiq is not running. I have postgres/redis in a container and the app in another, have restarted them all again as well. Queues have a couple hundred, but nothing is processing.

EDIT1: Clearing all queues did not fix or help anything, they are refilling and still not processing.

EDIT2: And rebuilt the forum with all the downtime that entails and still have this message:

And the queues are not processing in /sidekiq. This was all working without issue before the update to 2.4.0.beta9 from beta7.

EDIT3: Over 50gigs of free space. Manually running a backup (just shy of 300 megs) works successfully, and it says it pauses and unpauses sidekiq and no error reported in the log, but sidekiq still doesn’t seem to be running?

EDIT4: Only log of note in /logs is the Sidekiq heartbeat test failed, restarting that keeps repeating.

EDIT5: Redis seems to be up and working fine, at least it’s log file is busy saying it doesn’t have much to do… And for clarity:

[3] pry(main)> Sidekiq.paused?
=> false

EDIT6: I have cleared the queue’s a bit ago, back up to 10 queued but not processing tasks.

EDIT7: Found that bundle exec sidekiq is the usual way to start sidekiq in a normal project, so let’s try running it to see what it does:

root@vps198273-forum:/var/www/discourse# bundle exec sidekiq
2020-01-06T05:10:18.814Z pid=31242 tid=gn383wxbu INFO: Booting Sidekiq 6.0.4 with redis options {:host=>"forum-data", :port=>6379, :namespace=>"sidekiq", :id=>"Sidekiq-server-PID-31242", :url=>nil}
You are connecting to Redis v3.0.6, Sidekiq requires Redis v4.0.0 or greater
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/sidekiq-6.0.4/lib/sidekiq/cli.rb:62:in `run'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/sidekiq-6.0.4/bin/sidekiq:12:in `<top (required)>'
/var/www/discourse/vendor/bundle/ruby/2.6.0/bin/sidekiq:23:in `load'
/var/www/discourse/vendor/bundle/ruby/2.6.0/bin/sidekiq:23:in `<top (required)>'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/lib/bundler/cli/exec.rb:63:in `load'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/lib/bundler/cli/exec.rb:63:in `kernel_load'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/lib/bundler/cli/exec.rb:28:in `run'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/lib/bundler/cli.rb:476:in `exec'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/lib/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/lib/bundler/cli.rb:30:in `dispatch'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/lib/bundler/vendor/thor/lib/thor/base.rb:476:in `start'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/lib/bundler/cli.rb:24:in `start'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/exe/bundle:46:in `block in <top (required)>'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/lib/bundler/friendly_errors.rb:123:in `with_friendly_errors'
/usr/local/lib/ruby/gems/2.6.0/gems/bundler-2.1.1/exe/bundle:34:in `<top (required)>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'

You are connecting to Redis v3.0.6, Sidekiq requires Redis v4.0.0 or greater

Well, this looks interesting? Let’s try rebuilding the data container and praying from the terror of touching the data, lol…

EDIT8: Well it seems to be running Redis 5.0.5 (why aren’t postgresql pubsub used for this instead?), which is definitely 4.0.0 or newer… and it’s done rebuilding, testing the forums, their data seems to still be there, and we have a spike!

Looks fixed! Maybe this post will come in useful for someone. I wish the forums gave me the error that sidekiq was giving about an out of date redis, but I’m guessing those logs are just going into the abyss somewhere as I didn’t see them anywhere. ^.^

sam · January 6, 2020, 5:27am

This is a very good piece of detective work, I hope it helps others.

codinghorror · January 6, 2020, 7:35am

How did you get an ancient redis? Are you using a non standard install?

arrowcircle · January 6, 2020, 8:08am

I use multicontainer setup and never rebuilt redis.
Will try to rebuild redis.

UPD.
@OvermindDL1, thanks for the solutions. I rebuilt redis container and now everything works.

Sidekiq is background task library and it depends on redis. It’s super optimized and mature, but supports only redis backend.
I also think that message_bus (realtime features of discourse) uses redis under the hood.

OvermindDL1 · January 12, 2020, 8:45pm

Standard docker split container install as per the instructions (so I can build a new discourse version then quick-swap across them for no downtime), but I don’t touch the data container, which is apparently where redis is running (I figured that would be in the main container, I was surprised when I couldn’t find it running in the main container).

Yep, exact same thing as @arrowcircle here.

sam · January 12, 2020, 10:34pm

In a 2 container setup you still need to rebuild the data container, I recommend you schedule it at least yearly

OvermindDL1 · January 12, 2020, 11:00pm

Any way to do this without downtime then? That was the point of separating them initially.

sam · January 12, 2020, 11:07pm

With zero downtime you would need to run replica redis and replica database. You would still have a short readonly phase as you failover.

Yes this stuff is all possible but you would need to hire an infrastructure team.

OvermindDL1 · January 12, 2020, 11:09pm

All open source stuff here, no money. ^.^;

system · February 11, 2020, 11:09pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
"Ensure sidekiq is running." when it is definitely running Installation	19	7681	October 24, 2015
Sidekiq not running. Sidekiq heartbeat test failed, restarting Installation unsupported-install	15	2733	June 10, 2020
Ensure sidekiq is running? Installation	8	2985	June 8, 2024
Sidekiq is not running, probably after upgrade Support	10	644	October 23, 2020
"Sidekiq is not running" Installation	9	2982	May 4, 2024

Sidekiq heartbeat test failed, restarting

Related topics