Could sidekiq queue be reason for 500 errors?

bartv · June 13, 2018, 10:05am

Click on the ‘Busy’ tab at the top of the /sidekiq screen and you’ll see your queues and the jobs inside them. Each job also shows how long it’s been active, which is a great indicator of problems.

I assume your Critical queue jobs are getting handled first, but let’s confirm that these are indeed the jobs that are causing the slowness.

danmaby · June 13, 2018, 10:11am

That was one of those “it can’t be that easy” moments.

OK, so it looks like things are processing, nothing seems to be obvious in terms of hold ups:

danmaby · June 13, 2018, 10:13am

And now the site’s gone down again.

# free -m
              total        used        free      shared  buff/cache   available
Mem:           7983        4829         128        2116        3025         755
Swap:          2047          72        1975

bartv · June 13, 2018, 10:31am

Looks like you don’t have any seriously slow tasks there… What’s your CPU load like during the processing? If it’s low, you can try increasing UNICORN_SIDEKIQS. It’s currently set to 1 for you; each increment adds 5 more job processors.

In contrast, the UNICORN_WORKERS setting affects the number of concurrent web requests that can be handled - this is not related to Sidekiq and increasing the value won’t help solve this issue.
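One rough way to answer the CPU load question is to compare the 1-minute load average with the number of cores. The sketch below is Linux-only and not part of the original advice; the helper name `load_per_core` is made up for illustration, and treating a value well below 1.00 per core as "low" is only a common rule of thumb.

```shell
#!/bin/sh
# Sketch: compare the 1-minute load average with the CPU core count.
# Assumes Linux (/proc/loadavg) and nproc from coreutils.

# load_per_core LOAD CORES -> prints LOAD divided by CORES with two decimals
# (hypothetical helper name, used only in this example)
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

if [ -r /proc/loadavg ]; then
    # First field of /proc/loadavg is the 1-minute load average
    load=$(cut -d ' ' -f 1 /proc/loadavg)
    # Fall back to 1 core if nproc is unavailable (assumption for this sketch)
    cores=$(nproc 2>/dev/null || echo 1)
    echo "1-minute load: $load on $cores cores ($(load_per_core "$load" "$cores") per core)"
fi
```

Run it on the host while Sidekiq is busy; a single reading can mislead, so it is worth repeating a few times.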

Do you see anything useful in the logs? They’re located in /var/discourse/shared/standalone/log

danmaby · June 13, 2018, 10:49am

Thanks again @bartv. I’ve added UNICORN_SIDEKIQS=5 to the app.yml and run ./launcher restart app. Looking back at the Sidekiq dashboard, it’s still only processing around 10 per second.

Have I got UNICORN_SIDEKIQS=5 correct, or should it be UNICORN_SIDEKIQS: 5?

The logs are showing thousands of entries for:

Started GET "/sidekiq/stats" for 86.1.10.29 at 2018-06-13 10:34:22 +0000

bartv · June 13, 2018, 10:51am

It should be UNICORN_SIDEKIQS: 5 - the same formatting as any other setting in app.yml. You can verify this by going to the busy tab in Sidekiq again - the number of processes should match the value you entered here.
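Before restarting, a quick check of the file can confirm the setting is written in the `NAME: value` form. This is only a sketch: the path `/var/discourse/containers/app.yml` is an assumption based on a standard install (the thread does not state it), and `show_setting` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: confirm a setting appears in app.yml as an uncommented "NAME: value" line.

# show_setting FILE NAME -> prints matching uncommented lines; exits non-zero if none
# (hypothetical helper name, used only in this example)
show_setting() {
    grep -E "^[[:space:]]*$2:[[:space:]]" "$1"
}

# Assumed location of the container config; adjust to your install
APP_YML=/var/discourse/containers/app.yml
if [ -r "$APP_YML" ]; then
    show_setting "$APP_YML" UNICORN_SIDEKIQS \
        || echo "UNICORN_SIDEKIQS is not set, is commented out, or uses the wrong format"
fi
```

A line written as `UNICORN_SIDEKIQS=5` does not match the pattern, so the check reports it as missing.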

And a tip: to quickly update these settings you don’t need to do a full rebuild; just do this:

./launcher destroy app
./launcher start app

danmaby · June 13, 2018, 11:04am

OK, so I set UNICORN_SIDEKIQS to 5, and this temporarily doubled the speed to around 10 per second, until the server fell over again.

# free -m
              total        used        free      shared  buff/cache   available
Mem:           7983        6086         125         971        1771         629
Swap:          2047          46        2001

I’ll try adjusting the number to see if I can get a stable increase without the server bugging out.

bartv · June 13, 2018, 11:05am

I really urge you to inspect your log files after your server crashes; they might provide actionable information.

danmaby · June 13, 2018, 11:17am

I see this error tens of thousands of times in /var/discourse/shared/standalone/log/rails/production.log

As well as a very similar message, again thousands of times over, in /var/discourse/shared/standalone/log/rails/unicorn.stderr.log
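When a message repeats thousands of times, counting identical lines shows quickly which ones dominate. The sketch below uses only standard tools and the two log paths mentioned above; `top_messages` is a hypothetical helper name, and the 5000-line window is an arbitrary choice. Lines that embed timestamps or request IDs count as distinct, so they may need stripping first.

```shell
#!/bin/sh
# Sketch: list the most frequent lines near the end of a log file.

# top_messages FILE [N] -> the N most frequent lines (default 5) among the
# last 5000 lines of FILE, each prefixed with its count
# (hypothetical helper name, used only in this example)
top_messages() {
    tail -n 5000 "$1" | sort | uniq -c | sort -rn | head -n "${2:-5}"
}

LOG_DIR=/var/discourse/shared/standalone/log/rails
for f in "$LOG_DIR/production.log" "$LOG_DIR/unicorn.stderr.log"; do
    if [ -r "$f" ]; then
        echo "== $f"
        top_messages "$f"
    fi
done
```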

bartv · June 13, 2018, 11:20am

What does the Redis log say? I had a similar issue with Redis running out of memory; the rebuild log provided the solution to this:

186:M 01 Jun 11:02:31.042 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
186:M 01 Jun 11:02:31.042 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

Perform these commands then restart your Discourse:

sysctl vm.overcommit_memory=1
echo never > /sys/kernel/mm/transparent_hugepage/enabled

Note that you’ll still need to make these persistent! (See the quoted text above to learn how.)
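Following the Redis warnings quoted above, persistence could be sketched as below. `ensure_line` is a hypothetical helper that appends a line only if it is not already present; the two calls that touch system files are left commented out because they need root. One caveat to check yourself: on some distributions /etc/rc.local ends with `exit 0` or does not exist at all, in which case a line appended at the end would not take effect.

```shell
#!/bin/sh
# Sketch: make the two Redis-related kernel settings survive a reboot.

# ensure_line FILE LINE -> append LINE to FILE unless an identical line exists
# (hypothetical helper name, used only in this example)
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Run as root. Commented out so the sketch has no side effects when pasted as-is.
# ensure_line /etc/sysctl.conf 'vm.overcommit_memory = 1'
# ensure_line /etc/rc.local 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'

# Read-only check of the current values (Linux)
if [ -r /proc/sys/vm/overcommit_memory ]; then
    echo "vm.overcommit_memory: $(cat /proc/sys/vm/overcommit_memory)"
fi
if [ -r /sys/kernel/mm/transparent_hugepage/enabled ]; then
    # The active choice is shown in square brackets, e.g. [never]
    echo "transparent_hugepage: $(cat /sys/kernel/mm/transparent_hugepage/enabled)"
fi
```

Because the helper is idempotent, running it again does not duplicate the lines.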

danmaby · June 13, 2018, 11:37am

This seems to have helped! We’re five minutes in now and it’s holding steady at around 20 queued items per second with 5 Unicorn Sidekiqs.

I really appreciate your time on this!

system · July 13, 2018, 11:37am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
