"convert" process eating CPU

Hi.

I’ve upgraded to the latest version of Discourse.
I’ve changed the CDN URL and rebaked the posts.
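
For reference, a full rebake can be done with rake posts:rebake from inside the app container, or from the Rails console; a rough sketch of the console version (not necessarily the exact commands I ran):

# Rails console sketch (./launcher enter app, then rails c): re-cook every post
# so its cooked HTML picks up the new CDN URL. Post#rebake! rebakes one post;
# find_each walks the posts table in batches.
Post.find_each do |post|
  post.rebake!
end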

After that, I have lots of processes running one after the other with this command:

convert jpeg:/var/www/discourse/tmp/download_cache/002ca318720dd3e60e31eddddf2c12fca64df1d3.jpg[0] -auto-orient -gravity center -background transparent -thumbnail 1035x502^ -extent 1035x502 -interpolate catrom -unsharp 2x0.5+0.7+0 -interlace none -quality 98 -profile /var/www/discourse/vendor/data/RT_sRGB.icm jpeg:/tmp/discourse-thumbnail20181105-5725-s2szhw.jpeg

Any thoughts?

It should stop in a few hours; we had to re-download gravatars.

3 Likes

Hi! I have the same problem. I searched and found this post a few days ago, so based on your feedback, @sam, we decided to wait before taking any action on the forum, but after 4 days the convert processes are killing our machine and taking the forum down (it starts getting slower and then starts throwing 502s for every request).

This is the top output at this moment:

%CPU %MEM     TIME+ COMMAND
51.5  3.0   0:07.73 ruby
27.9  7.2   3:42.61 postmaster
14.3  7.4   0:09.51 convert
13.6  7.3   1:52.68 postmaster
13.0  3.6   0:02.20 convert
11.3  3.8   0:02.41 convert
10.3  3.9   0:02.20 convert
 8.3  6.3   0:17.23 convert
 8.3  9.5   0:09.11 convert
 8.3  1.0   0:10.39 convert
 8.3  3.7   0:01.96 convert
 7.3  0.7   0:09.91 convert
 7.0  7.1   0:01.69 convert
 4.0  3.3   0:05.59 convert
 1.3  0.0   0:53.90 kswapd0
 0.7  0.0   0:00.40 kworker/u4:1

Our setup is a one-click DigitalOcean image, and the problem started last Friday, 2019-02-22 at 10:00, after upgrading the forum from 2.2.0.beta1 +20 to the latest version, 2.3.0.beta2.

Actions taken:

  • Waiting for the process to end
  • Clearing old containers + images and rebuilding
  • Restarting the machine 3-4 times
  • Crying

Any idea?

Thank you very much and sorry for bothering you.

Does your forum have a lot of images?

1 Like

Thanks for answering. “A lot” is relative, but I think so, yes: the community has some image-heavy threads.

I assumed that was the reason, but it looks strange to me after 4 days at 100% CPU, and I assumed this kind of background job normally runs through a queue or something similar to keep it under control.

I’m just thinking out loud and speculating; I don’t know (yet) how Discourse’s internals work.

Maybe try halving “rebake old posts count” to 40 and see how you are doing? Have a look at /sidekiq: are you backed up with tons of jobs?
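
If it helps, both things can be done from the Rails console inside the container as well; a rough sketch (the setting’s internal name is rebake_old_posts_count, matching the admin UI label):

# Rails console sketch: lower the rebake batch size and check the Sidekiq backlog.
require "sidekiq/api"

SiteSetting.rebake_old_posts_count = 40   # same effect as changing it in the admin UI

stats = Sidekiq::Stats.new
puts "enqueued: #{stats.enqueued}  scheduled: #{stats.scheduled_size}  retries: #{stats.retry_size}"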

2 Likes

Thanks for answering, @sam. Changing the rebake setting to 40 helps a little bit; it now throws fewer 502s, but it still fails 20-30 minutes after the last server restart.

Regarding the jobs: yes, there are a lot of them, and they are decreasing very slowly or just stuck at the same number of pending tasks.


(It’s in Spanish; if someone needs a translation, just ping me and I’ll update this message with the translation.)

This is the top output at this moment, after the last change and restarting the server 20 minutes ago.

%CPU %MEM     TIME+ COMMAND
43.9 16.5   0:11.38 convert
20.3  9.0   0:11.19 convert
19.9  4.1   0:12.63 convert
13.3  7.5   0:11.52 ruby
 8.0  7.6   0:11.69 ruby
 7.0  5.6   0:01.75 postmaster
 6.3  7.6   0:10.88 ruby
 2.7  3.4   0:14.12 redis-server
 2.7  5.0   0:01.11 postmaster
 1.0  3.8   0:00.80 postmaster
 0.7  4.8   0:13.04 ruby
 0.7  8.1   0:06.26 ruby
 0.3  0.0   0:01.88 rcu_sched
 0.3  0.3   0:02.34 dockerd
 0.3  0.1   0:05.23 nginx
 0.3  0.2   0:00.77 postmaster
 0.3  0.2   0:00.01 jpegoptim
 0.0  0.0   0:01.19 init
 0.0  0.0   0:00.00 kthreadd
 0.0  0.0   0:01.08 ksoftirq

2 Likes

Change “rebake old posts count” to 10 or lower to let sidekiq clear the queue and then slowly increase it.
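
To confirm the queue is actually draining once you lower it, something like this from the Rails console gives a quick read (just a sketch):

# Print the Sidekiq backlog once a minute; the enqueued number should trend down.
require "sidekiq/api"

5.times do
  puts "#{Time.now} enqueued: #{Sidekiq::Stats.new.enqueued}"
  sleep 60
end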

2 Likes

This is odd, as the convert process runs at a very low priority. I had this setting cranked way up and had our 12-core server maxed out on CPU for close to two weeks without any noticeable slowdown. Is your server short on RAM, or do you have a very slow hard disk?

2 Likes

It’s a DO droplet with 4 GB Memory / 60 GB Disk / Ubuntu Discourse on 14.04

This is the performance over the last 24 hours.


(those dips are me restarting the server, or the site being down when I was not able to restart it)

More performance screenshots with details 📈



I would like to mention that we’ve never had performance issues (memory or disk) until now with the convert process.

I followed your suggestion and performance is better (as expected, there are fewer things to do), but the queue is still growing instead of shrinking.

Thanks everyone for your time.

What’s in the queue? Have you tried setting it to 0 to let it cool down?


(translation: the value should be between 1 and 2000000000)

It’s 1 now.

And at 1 everything is fine?

2 Likes

Sorry for the delay; I was waiting to see if the last changes had an effect, and they did.

Update:

  • Upgraded the DO droplet to 16 GB Memory / 60 GB Disk / LON1 - Ubuntu Discourse on 14.04
  • Changed rebake to 1
  • Increased workers from 5 to 8, following your suggestion in another post

10 hours later :blush:

Now I’ve updated the rebake setting to 10, and let’s see. I want to rebake everything as it was before the crisis and then go back to my previous droplet.

I think it’s under control now, so thank you so much, everyone, for your time and suggestions; they put me on the right track, and I learned a little more about how to manage these cases with Discourse.

2 Likes

Is it necessary to restart Unicorn, Sidekiq, and/or Redis for the changed value to take effect? I turned rebake_old_posts_count down to 1 to try to clear Sidekiq, but the enqueued count is going up, not down, so it’s not clear that the setting is being honored. Or is there some other reason for over 14K Jobs::CrawlTopicLink jobs enqueued, and growing? I don’t know whether that’s the right job for that setting. :grimacing:
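
For anyone else digging into this, a rough way to see which job classes are actually filling the queue, from the Rails console (a sketch, nothing Discourse-specific beyond the job names it prints):

# Count enqueued Sidekiq jobs by class across all queues and print the top ten.
require "sidekiq/api"

counts = Hash.new(0)
Sidekiq::Queue.all.each do |queue|
  queue.each { |job| counts[job.klass] += 1 }
end
counts.sort_by { |_, n| -n }.first(10).each { |klass, n| puts "#{n}\t#{klass}" }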

I made this change because we’re seeing something that looks superficially like this on forum.makerforums.info (hosted on DO) after importing about 37.5K topics with about 260K total posts, many of which are image-heavy, with about 33 GB of images in total. We had the CDN configured and functional before the import; it looks like the posts baked at import time didn’t use the CDN configuration, and maybe Discourse is slowly rebaking them to point at the CDN? The reprocessing is definitely taking much longer than the initial import, which really surprised me.
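
If waiting turns out not to be an option, the stuck posts could in principle be rebaked selectively; a sketch only, where the LIKE pattern is my assumption about what the non-CDN cooked HTML contains, so adjust it for your site:

# Rails console sketch (hypothetical pattern): rebake only posts whose cooked
# HTML still references uploads on the origin host instead of the CDN.
Post.where("cooked LIKE ?", "%//forum.makerforums.info/uploads/%").find_each(&:rebake!)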

Update: It took over 12 hours to recover, but the sidekiq queue has cleared.

1 Like