High CPU usage (Ruby)

I’m pretty frequently seeing high CPU usage, usually around the 85% mark:

It was previously showing up as unicorn.conf.r:

Could this indicate UNICORN_WORKERS being set too high/low?

The server has 64GB RAM (usually around 40GB free) and 6 cores. There are 4 Discourse instances on the server, each set to UNICORN_WORKERS: 8.
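For reference, here is a quick host-side check (a hypothetical command, assuming the default unicorn process names) to see how many unicorn processes are actually running across all the containers:

  # count unicorn master/worker processes across every Discourse container on the host
  ps -ef | grep '[u]nicorn' | wc -l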

Any ideas or tips on what’s causing it or what to try? (One of the forums is in read-only mode and doesn’t get much traffic; should it be set to have fewer workers?)

2 Likes

I don’t know for sure, but my bet is you are using way more workers than your cores can handle.

1 Like

Yep. I also suggest decreasing the number of unicorn workers.

2 Likes

You could try reducing unicorn workers.

2 Likes

Thanks for the replies everyone - not sure where I read it now, but I always thought we were supposed to set 2 workers per core. I’ve now dropped the workers down per forum, allocating more to the busiest forums and fewer to the quieter ones. I’ll monitor things over the next week and report back if it hasn’t helped.

Edit: I think I read it here.

1 Like

In your case you aren’t allocating two workers per core though. You have six cores which would mean twelve workers, but you have four instances each using eight workers, so 32 total.
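As a rough sketch of that budget (using only the numbers from this thread, not a recommendation), splitting two workers per core across the four instances would look something like:

  # hypothetical split: cap total workers at 2x cores, divided across the instances
  cores=6
  instances=4
  per_instance=$(( (2 * cores) / instances ))
  echo "$per_instance"   # 3 workers per instance, 12 in total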

4 Likes

Yep… I’ve adjusted things so the total number of workers is no greater than twice the number of cores, though I still wonder: what’s the correct/standard advice, what you said or what was in Nate’s post, where he quotes Jeff saying 1 worker per core?

From my own experiments, 1 worker per core results in timeouts (but lowers server load), while more workers result in better performance but higher load (which on my server is still within an acceptable range).

1 Like

Take a look at discourse-setup, which handles the scaling for new installs today:

  # UNICORN_WORKERS: 2 * GB for 2GB or less, or 2 * CPU, max 8
  if [ "$avail_gb" -le "2" ]
  then
    unicorn_workers=$(( 2 * $avail_gb ))
  else
    unicorn_workers=$(( 2 * $avail_cores ))
  fi
  unicorn_workers=$(( unicorn_workers < 8 ? unicorn_workers : 8 ))

That second branch, using double the number of available cores, is the default on systems with more than 2GB RAM. It looks as though your issue is more down to a tug-of-war between your instances over host resources than a Discourse problem.
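As a worked example, plugging this thread’s 64GB / 6-core host into that same logic (assuming a single install, which is what discourse-setup targets):

  avail_gb=64
  avail_cores=6
  # more than 2GB RAM, so the core-based branch applies
  unicorn_workers=$(( 2 * avail_cores ))                            # 12
  unicorn_workers=$(( unicorn_workers < 8 ? unicorn_workers : 8 ))  # capped at 8
  echo "$unicorn_workers"   # 8 for a single install; four instances each need their own budget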

2 Likes

I’m seeing the same thing after my last upgrade, which was one day after the OP, so I don’t think this has anything to do with the number of unicorn workers. The unicorn.conf.r* process is suspicious, because the original post of this topic is the only hit for that term on the entire web. I believe unicorn.conf.rb would be more normal.
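One way to check whether that name is just truncation (a hypothetical check using standard ps options, since top cuts long command lines short):

  # show the full command lines of the top CPU consumers
  ps -eo pcpu,args --sort=-pcpu | head -n 10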

The increase happened at exactly my last upgrade, 4 days ago. Note the OP posted 5 days ago. Something changed in Discourse.

I’ve used the same number of unicorn workers on the same instance for several years, and didn’t change anything; I just rebuilt to 3.4.0.beta4-dev.

1 Like

FWIW, there are no long-running or failed jobs in Sidekiq.

1 Like

I rebuilt with no plugins (except docker manager) and the problem persists, so it’s not a plugin’s fault.

Any clues here?

I’ve just upgraded to the latest Discourse and haven’t seen any more unicorn.conf.r* (now anything around the 80% CPU mark is just ruby, though it seems less frequent). Loads are around the same (though lower than they were after I made those worker adjustments).

Have you upgraded to the latest version? What kind of hardware are you on, and how busy is your forum?

Yes, I’m at 3.4.0.beta4-dev. That’s what started the high CPU usage. Nothing else changed.

8 GB RAM, 2 vCPUs, 160 GB SSD with plenty of space.

I posted the CPU usage above for my production site, which has around 30 users online at a time. But I have a test site with the same issue and there is absolutely no traffic and no plugins there. CPU usage before and after updating (spikes are daily backups):

1 Like

I’m not sure whether our situations are related, Mark. I think in my case what Stephen said played a large part:

I recently moved two other instances onto the same server and had actually forgotten that the unicorn workers were set to 8, because previously we were on a server with more cores (but it had its own problems, hence we moved back to a Xeon which had fewer cores but performed better overall).

So what I found was that reducing the unicorn workers on this server reduced load but started giving us timeouts, while increasing them eradicated the timeouts but resulted in a higher load, though still within an acceptable range. I think I could increase workers further and we could still handle the extra load, but what we have now is good enough for now.

Having said that, I had moved the instances onto the same server and it was running within what I would have expected (so load increased, but not by a huge amount), and it did feel like an update resulted in higher loads. However, I cannot be sure of that, and we have to keep in mind that as Discourse gains more features it may require more powerful hardware or sometimes feel ‘slower’ (I had some Discourse instances on old versions and they felt noticeably snappier, though of course they didn’t have all the features of the newer versions).

Having said that as well, I think loads have actually decreased a little since the latest Discourse update (with PG 15).

I’m not sure what to suggest for you, Mark. Maybe play around with workers and some of the other settings too, such as db_shared_buffers and db_work_mem? Perhaps start a dedicated thread along the lines of "High CPU usage after update - does my instance need perf tweaks?" or something like that :slight_smile:
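If it helps, here is a hypothetical sketch of where those values live on a standard /var/discourse install and how a change gets applied:

  # see the current worker and postgres tuning values for each container
  grep -E 'UNICORN_WORKERS|db_shared_buffers|db_work_mem' /var/discourse/containers/*.yml

  # after editing a container definition, rebuild it so the change takes effect
  # (replace 'app' with the container's actual name on a multi-instance host)
  cd /var/discourse && ./launcher rebuild app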

1 Like

I upgraded tonight and immediately saw a difference in CPU usage on my site. Here is a graph of before, during, and after the upgrade. This represents a one-hour duration.

Standard Discourse single-container install running on a DO droplet: 8 GB RAM, 2 vCPUs, and 100 GB SSD with plenty of space.

We will see what it looks like after 12 hours.

4 Likes

Here are the results 15 hours after the upgrade. CPU usage has drastically increased, by roughly 3x, and the load average has increased by about 4x.

Load average    Pre-upgrade    Post-upgrade
5 min           0.11           0.40
15 min          0.10           0.45

24 hour view:


Java is the main CPU user. Something has drastically changed in the latest upgrade.

What info does the Discourse team need to troubleshoot?
Should this topic be moved over to a Bug?

2 Likes

So it looks like my issue wasn’t the unicorn workers after all: after @sam’s update following @LotusJeff’s thread, the server loads have gone back to what they were (less than half of what they had gone up to)…

4 Likes

This fixed my problem too.

1 Like

I probably wouldn’t have noticed if I hadn’t been keeping an eye on the server after recently moving the other two forums onto it. I wonder how many people it affected without them even realising?

Does the Discourse team have measures in place to alert them to issues like this? Perhaps a volunteer program that admins can set up for specific topics, e.g. "Send server loads to Discourse within XX hours/days/weeks before/after an upgrade". Or, better still, track these locally and then alert admins when server load increases are noticed after upgrades, which we can then post here if need be…
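Even something as simple as a cron job logging load averages would give a before/after picture (a hypothetical sketch; the log path is arbitrary):

  # append a timestamped load-average sample; run this from cron every few minutes
  echo "$(date -Is) $(cat /proc/loadavg)" >> /var/log/discourse-load.log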

1 Like

I probably would not have noticed the impact, but I am monitoring the server closely because we migrated to Discourse about 2 weeks ago. I am in the weeds doing various post-migration validations (backup run, etc.). After a couple of months, I would never have noticed the impact.

I would hope that Discourse has a daily load test running. In my past life, I had a server that would rebuild daily with committed code. It had simulated users using the server all day. We measured key performance metrics from a user perspective and a server perspective. It allowed us to proactively catch memory leaks, inefficient code, and unexpected changes to UX.

I still have to give kudos to Sam and the team. Coming from the land of phpBB, where something like this would take decades to solve and remedy, I found the fast response terrific. (Even if it meant staying up until 2am CT, given the Sydney time difference.)

2 Likes