Higher idle process activity after upgrade

Hi,
After upgrading Discourse to v2.5.0.beta4-399-gbf8085e436 (from a 2.4.x version, using a command-line bootstrap from discourse_docker git, just before the Postgres 12 upgrade commits), there is noticeably more CPU churn on the server, even when no one is connected.

“top” shows that several “ruby” processes become active quite often; as I understand it, these are the Unicorn Sidekiq workers. Running redis-cli monitor, I can see them making blocking BRPOP requests with a 2 s timeout to check whether there is work to be done, which adds up to about a dozen Redis commands per second. On top of that, statistics are updated every 5 seconds. redis-cli info stats reports an average of 15-20 commands processed per second.
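
To illustrate what I mean by polling: each idle worker seems to sit in a loop like the sketch below (a simplification in plain Ruby with the redis gem, not Sidekiq’s actual code; the queue name is made up). With a 2 s timeout, seven idle workers alone generate a few Redis commands per second.

    require "redis"

    redis = Redis.new

    # Simplified idle loop of one worker: block on the queue for up to 2 s,
    # then loop around and block again. With nothing enqueued this is pure polling.
    loop do
      _queue, payload = redis.brpop("queue:default", timeout: 2)
      next if payload.nil? # timed out, nothing to do
      # ... process the job here ...
    end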

I don’t know whether this activity could account for the 2-4% CPU utilization we see now; with the old version it was below 0.5%. I didn’t look at the same redis-cli stats back then, though.

I would be curious to try out a longer timeout to see if that helps, but I haven’t figured out how to do it.

Hmm, would this be related to your work, @eviltrout?

Which work were you thinking of? I don’t think anything I worked on recently would be likely to cause this.

My bad, I was thinking of ember-cli.

Check forum.example.com/sidekiq - there might be some background reprocessing jobs going on after the upgrade.

Highly unlikely it touched redis. I’ll never say never because I’ve been burned before but I suspect this is something else :slight_smile:

Cool hint. I didn’t see anything alarming there: no jobs currently running, and none with long execution times (only a few over 100 ms). The dashboard says ~1000 jobs processed per day, the same as before the upgrade.

Probably off-topic, but the Processed graph over the last 6 months is also interesting: it seems to toggle between two plateaus (one twice as high as the other), staying on each for weeks at a time. This does not correlate with reboots (and we didn’t update anything in that period either).

Is there an easy way to change that 2 s BRPOP timeout? I didn’t find it in the code. Trying a longer value would at least indicate whether the work-polling loops are the problem.
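
The closest lead I have is the TIMEOUT constant that Sidekiq’s BasicFetch appears to pass to BRPOP. Assuming that is where the 2 s comes from, a quick experiment might be an initializer like the sketch below (untested; the file name is just an example):

    # config/initializers/sidekiq_poll_timeout.rb (example path)
    # Assumes Sidekiq::BasicFetch still defines a TIMEOUT constant used for BRPOP.
    require "sidekiq/fetch"

    Sidekiq::BasicFetch.send(:remove_const, :TIMEOUT)
    Sidekiq::BasicFetch.const_set(:TIMEOUT, 10) # block for 10 s instead of 2 s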

Here is what I sometimes see in top:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                     
   1024 discour+  20   0  333584 202888  21660 S   1.3   2.5   3:35.81 ruby                                                                        
   1156 discour+  20   0  715904 249416  24532 S   1.3   3.1   3:26.50 ruby                                                                        
   1178 discour+  20   0  726664 251468  24564 S   1.3   3.1   3:26.11 ruby                                                                        
   1189 discour+  20   0  714368 247912  24444 S   1.3   3.1   3:24.56 ruby                                                                        
   1200 discour+  20   0  709760 249708  24632 S   1.3   3.1   3:22.96 ruby                                                                        
   1234 discour+  20   0  713344 250288  24636 S   1.3   3.1   3:30.24 ruby                                                                        
   1167 discour+  20   0  712832 247928  24436 S   1.0   3.1   3:24.75 ruby                                                                        
 188658 me        20   0   10424   4240   3576 R   0.7   0.1   0:00.36 top                                                                         
    448 root      20   0 1748444  84900  45884 S   0.3   1.1   6:09.98 dockerd                                                                     

with no one connected…

The roughly 3.5 minutes of accumulated CPU time per process is after 35 hours of uptime, with very little user activity.
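
(For scale: 3 min 35 s over 35 h of uptime is about 215 s / 126,000 s ≈ 0.17% average CPU per worker, or roughly 1.2% summed over the seven workers.)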

I also tried to use rbtrace, but it complains:

    pid is not listening for messages, did you require “rbtrace”

Do I need to rebuild the image to fix this? Or can I just adjust or rebuild something inside the container?
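
From what I can tell, rbtrace can only attach if the target process itself has loaded the gem at boot, which is what that error is hinting at. A minimal sketch of what I imagine is needed, assuming the gem is available in the container’s bundle (the file name and env guard are just illustrative):

    # config/initializers/000-rbtrace.rb (illustrative path)
    # rbtrace only answers the CLI if the running process has loaded the gem,
    # which is what the "not listening for messages" error refers to.
    require "rbtrace" if ENV["ENABLE_RBTRACE"] == "1"

With that in place and the workers restarted, attaching should look something like rbtrace -p <pid> --firehose from inside the container (again, untested on my side).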