Could the Sidekiq queue be the reason for 500 errors?

(Bart) #21

Click on the ‘Busy’ tab at the top of the /sidekiq screen and you’ll see your queues and the jobs inside them. Each job also shows how long it’s been active, which is a great indicator of problems.

I assume your Critical queue jobs are getting handled first, but let’s confirm that these are indeed the jobs that are causing the slowness.

(Dan Maby) #22

That was one of those “it can’t be that easy” moments :rofl:

OK, so it looks like things are processing, and nothing obvious stands out in terms of hold-ups.

(Dan Maby) #23

And now the site’s gone down again :confused:

# free -m
              total        used        free      shared  buff/cache   available
Mem:           7983        4829         128        2116        3025         755
Swap:          2047          72        1975

(Bart) #24

Looks like you don’t have any seriously slow tasks there… What’s your CPU load like during the processing? If it’s low, you can try increasing UNICORN_SIDEKIQS. It’s currently set to 1 for you; each additional Sidekiq process adds 5 job processors.

In contrast, the UNICORN_WORKERS setting affects the number of concurrent web requests that can be handled - this is not related to Sidekiq and increasing the value won’t help solve this issue.

Do you see anything useful in the logs? They’re located in /var/discourse/shared/standalone/log

(Dan Maby) #25

Thanks again @bartv. I’ve added UNICORN_SIDEKIQS=5 to the app.yml and run ./launcher restart app. Looking back at the Sidekiq dashboard now, it’s still only processing around 10 per second.

Have I got UNICORN_SIDEKIQS=5 correct, or should it be UNICORN_SIDEKIQS: 5?

The logs are showing thousands of entries for:

Started GET "/sidekiq/stats" for at 2018-06-13 10:34:22 +0000

(Bart) #26

It should be UNICORN_SIDEKIQS: 5 - the same formatting as any other setting in app.yml. You can verify this by going to the busy tab in Sidekiq again - the number of processes should match the value you entered here.
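As a quick sanity check before restarting, something like the following could confirm the setting is written in YAML form rather than with `=`. This is only a sketch: the function name `check_sidekiqs_setting` is made up for illustration, and the path in the usage note assumes the standard install location, which isn't stated in this thread.

```shell
# Report whether UNICORN_SIDEKIQS is set in YAML form ("KEY: value") in the given file.
# Hypothetical helper for illustration; not part of Discourse or its launcher.
check_sidekiqs_setting() {
    file="$1"
    if grep -Eq '^[[:space:]]*UNICORN_SIDEKIQS:[[:space:]]*[0-9]+' "$file"; then
        echo "ok"
    elif grep -q 'UNICORN_SIDEKIQS=' "$file"; then
        echo "wrong: use 'UNICORN_SIDEKIQS: <n>' (YAML), not '='"
    else
        # Covers both a missing setting and one that is commented out with '#'.
        echo "not set"
    fi
}
```

Usage, assuming app.yml lives at /var/discourse/containers/app.yml: `check_sidekiqs_setting /var/discourse/containers/app.yml`.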

And a tip: to quickly update these settings you don’t need to do a full rebuild; just do this:

./launcher destroy app
./launcher start app

(Dan Maby) #27

OK, so I updated UNICORN_SIDEKIQS to 5, and this temporarily doubled the speed to around 10 per second, until the server fell over again.

# free -m
              total        used        free      shared  buff/cache   available
Mem:           7983        6086         125         971        1771         629
Swap:          2047          46        2001

I’ll try adjusting the number to see if I can get a stable increase without the server bugging out.

(Bart) #28

I really urge you to inspect your log files after your server crashes; they might provide actionable information.

(Dan Maby) #29

I see this error tens of thousands of times in /var/discourse/shared/standalone/log/rails/production.log

As well as a very similar message, again thousands of times over, in /var/discourse/shared/standalone/log/rails/unicorn.stderr.log
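When a log is flooded like this, it can help to see which lines dominate it. A minimal sketch, assuming the repeated messages are identical line for line (entries that carry a per-line timestamp won't group this way); `top_lines` is a made-up helper name:

```shell
# Print the N most frequent lines in a file, most common first.
# Hypothetical helper for illustration; N defaults to 5.
top_lines() {
    file="$1"
    n="${2:-5}"
    sort "$file" | uniq -c | sort -rn | head -n "$n"
}
```

For example: `top_lines /var/discourse/shared/standalone/log/rails/production.log 10`.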

(Bart) #30

What does the Redis log say? I had a similar issue with Redis running out of memory; the rebuild log provided the solution to this:

186:M 01 Jun 11:02:31.042 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
186:M 01 Jun 11:02:31.042 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

Perform these commands then restart your Discourse:

sysctl vm.overcommit_memory=1
echo never > /sys/kernel/mm/transparent_hugepage/enabled

Note that you’ll still need to make these persistent! (See the quoted log messages above to learn how.)
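One way to make both settings persistent, following what the Redis warnings suggest, is sketched below. The helper name `persist_line` is made up for illustration. It appends a line to a file only if that exact line isn't already there, so it can be rerun safely. One caveat: on systems where /etc/rc.local ends with `exit 0`, an appended line lands after it and never runs, so there it would have to be placed above that line by hand.

```shell
# Append a line to a file only if that exact line is not already present.
# Hypothetical helper for illustration.
persist_line() {
    file="$1"
    line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Intended use, run as root (left commented out so nothing is changed by accident):
#   persist_line /etc/sysctl.conf 'vm.overcommit_memory = 1'
#   persist_line /etc/rc.local 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
```

After a reboot, `sysctl vm.overcommit_memory` and `cat /sys/kernel/mm/transparent_hugepage/enabled` should show whether the values stuck.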

(Dan Maby) #31

This seems to have helped! We’re five minutes in now and it’s holding steady at around 20 queued items per second with 5 Unicorn Sidekiqs.

I really appreciate your time on this!