Increased CPU Usage since 3.4.0.beta4-dev ( 58f75ed205 ) upgrade

I have seen a substantial increase in CPU usage since upgrading this weekend. Ruby CPU usage appears to be the primary driver. This was also reported by another Discourse user in this topic.

As you can see from the graphs below, CPU usage and load pre-upgrade were much lower than post-upgrade. The upgrade occurred on the evening of 1/31.

Here are two images of top taken 33 hours apart:

Over those 33 hours, ruby consumed a significant amount of CPU. Based on the top data, ruby CPU usage in the last 33 hours is roughly 2x what it was over the previous 22 days. In 33 hours, I have seen about 11 hours of CPU time (648 minutes across 5 PIDs).
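To put the 648 minutes in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes the 2 vCPUs listed under Additional Data and treats the top figures as exact, and the function name is just illustrative.

```python
# Figures quoted in this post; the helper name is illustrative only.
cpu_minutes = 648   # total ruby CPU time across the 5 PIDs
wall_hours = 33     # time between the two top snapshots
vcpus = 2           # server size listed under Additional Data


def average_utilization(cpu_minutes, wall_hours, vcpus):
    """Return (cores kept busy, fraction of the whole machine), averaged over the interval."""
    cores_busy = cpu_minutes / (wall_hours * 60)
    return cores_busy, cores_busy / vcpus


cores, fraction = average_utilization(cpu_minutes, wall_hours, vcpus)
print(f"{cpu_minutes / 60:.1f} CPU-hours, {cores:.2f} cores busy, {fraction:.0%} of the machine")
# prints: 10.8 CPU-hours, 0.33 cores busy, 16% of the machine
```

This only counts the ruby processes, so the overall CPU figure on the graphs would be expected to sit somewhat higher.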

Additional Data:

  • Traffic has been down over the last two days by about 10%. (analytics and dashboard)
  • Standard single-container Discourse install (no chat)
  • Sidekiq queues are minimal (1K to 2K per day)
  • Nothing seems unusual in the Discourse logs
  • I run on a DO server with 8GB RAM and 2 AMD vCPUs.

This isn’t a case where the server is critically down, but servers that run at 5% to 7% are much happier than those that run at 25%.

What info can I provide to assist in troubleshooting this issue?

tia

3 Likes

Let's leave this in support for a bit until we determine whether there is a bug.

Can you enter the container and run htop from the inside (you will have to install it)? That way you will be able to tell which specific process is consuming high amounts of CPU.

You can get a bit more visibility using a technique like this: Debugging 100% CPU usage in production Ruby on Rails systems

Most likely, though, sidekiq (/sidekiq) is somehow overloaded on your instance. (I would look at the scheduler in particular.)

htop views.

Here is a 30 second video:

Random Screen shots:

Tree View:

sidekiq:


Let me know if there is something specific you need to see.

2 Likes

Yeah something is off:

top -H -p PID_OF_UNICORN

I suspect you will see V8 DefaultWorker there. I think this is a regression in mini_racer… I will revert it to see if that resolves this.
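If top is not handy, roughly the same per-thread view can be pulled from /proc. The following is a minimal sketch, not something from this thread: it assumes a Linux host, the function name is made up for illustration, and it reports cumulative CPU seconds per thread rather than the live percentage that top -H shows.

```python
import os
import sys
from pathlib import Path


def thread_cpu_times(pid):
    """Return [(cpu_seconds, tid, name)] for each thread of pid, busiest first."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    rows = []
    for task in Path(f"/proc/{pid}/task").iterdir():
        try:
            stat = (task / "stat").read_text()
        except OSError:
            continue  # thread exited while we were reading it
        # The thread name sits in parentheses and may contain spaces,
        # so split on the last ')' before reading the numeric fields.
        name = stat[stat.index("(") + 1 : stat.rindex(")")]
        fields = stat[stat.rindex(")") + 2 :].split()
        # After the name, index 0 is the state (field 3 of the stat line),
        # so utime (field 14) and stime (field 15) land at indexes 11 and 12.
        utime, stime = int(fields[11]), int(fields[12])
        rows.append(((utime + stime) / ticks_per_second, int(task.name), name))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    # Pass the unicorn worker PID as the first argument; defaults to this script's own PID.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for seconds, tid, name in thread_cpu_times(pid)[:10]:
        print(f"{seconds:10.2f}s  {tid:>8}  {name}")
```

Note that the kernel truncates thread names to 15 characters, so the V8 worker threads would be expected to appear as "V8 DefaultWorke".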

1 Like

OK, this is reverted now.

Let me know if the commit restores performance.

6 Likes

Yes, it resolved the high CPU issue. My 1-minute and 5-minute load is about 1/3 of the previous values. That is with htop and netdata now running on the system.

htop video:

DO graph:

I would expect the CPU usage and load to slowly decrease as more database queries are cached in the system.

Load Table:

Load average   Pre-fix        Post-fix
1 min          0.4 to 0.6     0.06 to 0.1
5 min          0.39 to 0.5    0.15 to 0.18

The 15-minute metric is impacted by a rebuild. I will post some more stats later this morning.

Thank you for the late-night fix.

3 Likes

Sorry about this, the mini_racer upgrade has been a big adventure. Hopefully we get through it soon.

3 Likes

Thank you for the fast response to investigate.

I am sure you had other things planned for the day versus a rollback.

As a new Discourse user, 2 weeks since migration, the product has been great to work with.

2 Likes

Similar story here as well.

[Edit: seems to be fixed now after updating to latest branch]

Here is a performance review 18 hours after the rebuild. The load table says it all.

Load Table:

Load average   Pre-fix        Post-fix
1 min          0.4 to 0.6     0.03 to 0.05
5 min          0.39 to 0.5    0.09
15 min         0.68           0.12
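For a quick sense of scale, the drop can be worked out from the table above. This is a rough sketch only: taking the midpoint of each range is my assumption, not something measured, and the helper names are illustrative.

```python
# Load averages copied from the table above; each entry is a (low, high) range.
load = {
    "1 min": {"pre": (0.40, 0.60), "post": (0.03, 0.05)},
    "5 min": {"pre": (0.39, 0.50), "post": (0.09, 0.09)},
    "15 min": {"pre": (0.68, 0.68), "post": (0.12, 0.12)},
}


def midpoint(bounds):
    """Middle of a (low, high) range; an assumed stand-in for the typical value."""
    low, high = bounds
    return (low + high) / 2


def reduction(row):
    """Fractional drop in load from pre-fix to post-fix, using range midpoints."""
    return 1 - midpoint(row["post"]) / midpoint(row["pre"])


for window, row in load.items():
    print(f"{window:>6}: {reduction(row):.0%} lower")
```

With these numbers, the load in every window comes out at least 80% lower than before the fix.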

Legend:

  • Red arrow - rebuilt the app
  • Purple arrow - turned off netdata

Note, closing the loop, the bug causing it was this:

I updated the gem. One immediate advantage is that this version of V8 appears to use slightly less memory, which is nice.

1 Like