Strange CPU usage since latest upgrade

schungx · April 8, 2019, 2:19pm

Ever since the latest update to v2.3.0.beta6+148, CPU usage has gone up by around 30%. No idea what’s causing it.

Network traffic:

Disk activity:

Any ideas?

Quick puzzle: When did I upgrade?

schungx · April 8, 2019, 2:33pm

More detailed CPU data:

Seems to happen exactly 15min after the hour every hour, for almost exactly 20min.

EDIT: The peak CPU usage periods coincide with high disk READ activity (gigabytes of disk read).

So:

(1) Something is reading gigabytes off the disk every hour

(2) Something is doing a lot of processing every hour

(3) For around 15min.
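The pattern above (a spike starting at a fixed minute past the hour and lasting a fixed time) can be confirmed from raw monitoring samples rather than by eye. A minimal sketch, assuming CPU samples exported as timestamp/percentage pairs; the `Sample` struct, the `spike_windows` helper, the 80% threshold and the synthetic data are all hypothetical and not taken from any monitoring tool:

```ruby
# Hypothetical sample shape: one reading of CPU percentage at a point in time.
Sample = Struct.new(:time, :cpu)

# Groups consecutive samples at or above `threshold` into spike windows and
# reports when each window started, ended, and how many minutes it spanned.
def spike_windows(samples, threshold: 80.0)
  samples
    .chunk_while { |a, b| (a.cpu >= threshold) == (b.cpu >= threshold) }
    .select { |run| run.first.cpu >= threshold }
    .map do |run|
      {
        start: run.first.time,
        finish: run.last.time,
        minutes: ((run.last.time - run.first.time) / 60).round,
      }
    end
end

# Synthetic data (an assumption): one sample every 5 minutes for two hours,
# busy from :15 to :35 past each hour.
base = Time.utc(2019, 4, 8, 13, 0, 0)
samples = (0...24).map do |i|
  t = base + i * 300
  Sample.new(t, (15..35).cover?(t.min) ? 95.0 : 10.0)
end

spike_windows(samples).each do |w|
  puts "spike starting at minute #{w[:start].min}, lasting about #{w[:minutes]} min"
end
```

If every window starts at the same minute and has the same length, a scheduled job is a more likely cause than user traffic.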

schungx · April 8, 2019, 2:49pm

When I ran glances during high CPU activity, the process showed 98% CPU, listed as main: discourse discourse.

However, when I ran top, it showed postmaster with user lxd.

pfaffman · April 8, 2019, 2:49pm

There was an upgrade a while back that caused all images to be re-processed. That’s the likely cause.

Falco · April 8, 2019, 2:50pm

You can visit /sidekiq/scheduler/history and see which jobs cover the same area of the CPU spikes.

schungx · April 8, 2019, 2:51pm

Could be!

I guess I'll wait for a few more days to see…

schungx · April 8, 2019, 2:53pm

The only thing that is hundreds of ms is Jobs::PeriodicalUpdates, and that’s only 400-500ms. Everything else is <100ms.

schungx · April 8, 2019, 3:04pm

Well, I get it. During the exact time of the CPU spike, only one Sidekiq task is active, Jobs::CleanUpUploads.

And the next task does not appear until exactly the end of the CPU spike.

And the duration field in the history is blank. Did it overflow the number field?
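Matching the job history against the spike window can also be done mechanically. A minimal sketch, assuming the history rows have been copied out by hand into plain records; `JobRun`, `jobs_overlapping` and the timestamps are hypothetical, and treating a blank duration as "still running" is an assumption, not documented Sidekiq behaviour:

```ruby
# Hypothetical record for one row of the scheduler history.
# duration_ms is nil when the duration field is blank.
JobRun = Struct.new(:name, :started_at, :duration_ms)

# Returns the names of jobs whose run overlaps the window [from, to].
# A run without a recorded duration is treated as open-ended (assumption).
def jobs_overlapping(runs, from, to)
  runs.select do |run|
    finished_at = run.duration_ms ? run.started_at + run.duration_ms / 1000.0 : nil
    run.started_at <= to && (finished_at.nil? || finished_at >= from)
  end.map(&:name).uniq
end

# Illustrative rows only; the times are made up to mirror the thread.
runs = [
  JobRun.new("Jobs::PeriodicalUpdates", Time.utc(2019, 4, 8, 14, 0, 0), 450),
  JobRun.new("Jobs::CleanUpUploads", Time.utc(2019, 4, 8, 14, 15, 0), nil),
  JobRun.new("Jobs::PeriodicalUpdates", Time.utc(2019, 4, 8, 14, 35, 30), 480),
]

spike_from = Time.utc(2019, 4, 8, 14, 15, 0)
spike_to = Time.utc(2019, 4, 8, 14, 35, 0)
puts jobs_overlapping(runs, spike_from, spike_to).inspect
```

Note that under the open-ended assumption a blank-duration run from an earlier hour would also match, so the start time still needs a sanity check.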

schungx · April 8, 2019, 3:18pm

Can this be running in an infinite loop?

(1) Seems like Jobs::CleanUpUploads is the culprit

(2) it runs for 20min. non-stop, holding off all other Sidekiq tasks

(3) It reads 1-2GB worth of data from the disk

(4) It doesn’t write much data to the disk

(5) It does NOT incur any network traffic (all my uploads are stored in an Azure Blob storage)

(6) It keeps running for 20min. EVERY SINGLE HOUR. I don’t have that many uploads.

It almost feels like the task is reading the list of uploads from the database, decides that all of them require processing, then tries to process each upload one by one, only to fail every time because it can’t find the file(s) on the disk. One hour later, repeat.

Falco · April 8, 2019, 3:23pm

Oh something in the plugin can trigger an odd code path here.

github.com

discourse/discourse/blob/main/app/jobs/scheduled/clean_up_uploads.rb

# frozen_string_literal: true

module Jobs
  class CleanUpUploads < ::Jobs::Scheduled
    every 1.hour

    def execute(args)
      grace_period = [SiteSetting.clean_orphan_uploads_grace_period_hours, 1].max

      # Always remove invalid upload records regardless of clean_up_uploads setting.
      Upload
        .by_users
        .where(
          "retain_hours IS NULL OR created_at < current_timestamp - interval '1 hour' * retain_hours",
        )
        .where("created_at < ?", grace_period.hour.ago)
        .where(url: "")
        .find_each(&:destroy!)

      return unless SiteSetting.clean_up_uploads?

This file has been truncated.

As we are not running this plugin in production, it’s not as rock solid as our S3 code.

I recommend paying attention to the list of server processes when this is happening. It should be either a PostgreSQL query or it trying to find the non-local uploads on the disk.

codinghorror · April 9, 2019, 2:14am

This is something @tgxworld should maybe have a peek at if it’s an Azure edge case?

tgxworld · April 9, 2019, 2:26am

Hmm the clean up uploads job doesn’t attempt to process any uploads since it only deletes orphaned uploads. Can you help me to run the following manually to see if it triggers the same spike?

cd /var/discourse
./launcher enter app
rails c
Jobs::CleanUpUploads.new.execute({})
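When running the job by hand it may help to record how long it takes and how much of that time is CPU used by the Ruby process itself. A minimal sketch using only the Ruby standard library; the `measure` helper is hypothetical, and the workload in the block is a stand-in so the snippet runs outside Discourse:

```ruby
require "benchmark"

# Measures wall-clock seconds and CPU seconds consumed by this Ruby process
# while the given block runs.
def measure
  cpu_before = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
  wall = Benchmark.realtime { yield }
  cpu = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_before
  { wall_seconds: wall, cpu_seconds: cpu }
end

# Stand-in workload (assumption). In the Rails console the block would
# instead be: measure { Jobs::CleanUpUploads.new.execute({}) }
result = measure { 200_000.times.sum { |i| i * i } }
puts format("wall %.3fs, cpu %.3fs", result[:wall_seconds], result[:cpu_seconds])
```

The CPU figure covers only the Ruby process. If wall time is far larger than CPU time, the job is probably waiting on something else, such as a PostgreSQL query or disk reads, which would fit top showing postmaster as the busy process.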

schungx · April 10, 2019, 10:59am

Yes, it runs for a long time occupying CPU:

sam · April 11, 2019, 5:05am

Strangely it looks like it magically sorted itself out… @schungx should we close this?

schungx · April 11, 2019, 5:13am

OK!

Yeah, just like MAGIC!
