Discourse stops working - CPU/RAM load?

Hi all.

Hoping that someone will be able to help me resolve an issue that we are currently having on our forum.


This is a long story… but I want to give all info that may help resolve the issue, so please bear with me. Over the course of this you will notice that I do not have much experience with Ubuntu :frowning:

Everything has been working swimmingly for quite some time now :smiley: right up until yesterday evening.
We are having a little rebrand, updated logos and that sort of thing, so yesterday I was in the admin panel uploading the new logos. I also noticed that we were a few versions behind, so I ran the updates. I had to do this manually, following the instructions here:-


e.g.
cd /var/discourse
git pull
./launcher rebuild app

Everything seemed to go fine: updates completed, logos uploaded and displaying. I tried a few themes, went back to the original, then popped out to a social engagement related to the forum… yeah, I went to the pub for a meal with my mates.

About 30 mins later, while out, we noticed the forum was offline… small bit of panic, then noticed that Digital Ocean was having issues, so relaxed.

Came back about 2 hours later, DO issues resolved, but forum still offline… no worries, I restarted the droplet, all came back on fine… 30 mins or so later it was offline again…

Next I thought it would be best to clear any other outstanding updates, so I tried to update Docker with
wget -qO- https://get.docker.com/ | sh
This didn’t seem to do much.
Rebuilt the app with ./launcher rebuild app
I don’t believe Docker has updated, as when rebuilding the app it says
docker version 17.05.0-ce deprecated, and running docker version reports 17.05.0-ce

Then I noticed that just before the forum went offline we were getting messages along the lines of:-
Out of memory: kill process (convert) or sacrifice child
Out of memory: kill process (ruby) or sacrifice child

Ran htop

Lots of instances of sidekiq. Found a post about reducing the number of threads rebaked at a time - reduced from 80 to 2 - issues continued.

Instances of convert running against JPEGs in /var/www/discourse/public/uploads/default/original/ (I don’t know how to display the rest of the string to see which images these are running against)

CPU usage 100% - ruby /var/www/discourse/vendor/bundle/ruby/2.6.0/bin/unicorn -E

Updated OS - now running Ubuntu 18.04
Docker still on 17.05.0-ce.

Droplet resized from the 2GB 1vCPU 50GB ($10) to the 3GB 1vCPU 50GB ($15)
Issues continuing.

Restarting the droplet or rebuilding discourse gets it back up and running for a short period (10 - 30 mins) before it goes offline again.

Any help with this would be very much appreciated. Many thanks in advance,
Matt

A while back there were some changes in how images are compressed. You’re probably getting slammed by images being processed. It should subside soon, but if you’ve got lots of images it could be a while. You can do a

  ps -ax

or

  top

to see what’s running. Press q to quit top.
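
If the command lines get cut off and you can’t see which image a convert process is working on, you can ask ps for unlimited width. A minimal sketch, assuming the procps version of ps that ships with Ubuntu; `busy_jobs` is just a hypothetical helper name, and the three patterns are the process names mentioned in this thread:

```shell
# busy_jobs (hypothetical helper): keep only the lines of a process listing
# that mention convert, sidekiq or unicorn. It reads the listing on stdin.
busy_jobs() {
  # The [x] brackets stop grep from matching its own entry in the listing.
  grep -E '[c]onvert|[s]idekiq|[u]nicorn'
}
```

Run it as `ps -eww -o pid,pmem,pcpu,args | busy_jobs`. Here `-e` lists every process, `-ww` removes the width limit so the full upload path is printed, and `-o` picks the columns (PID, % memory, % CPU, full command).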

Thanks @pfaffman, I’ll let it run a little longer and see if it gets through it and levels itself out a little.

You went through most of the proper troubleshooting we would recommend, good work. Are there an awful lot of images on your forum?

I do advise you to eventually update Docker when you get a chance.

And when you do, remember to reboot after you upgrade Docker. There’s no warning, and Docker didn’t work for me when I did an upgrade yesterday until I thought to reboot.

Thanks very much guys.

The issue is still occurring though :frowning: ruby /var/www/discourse/vendor/bundle/ruby/2.6.0/bin/unicorn -E is using 98% CPU at the moment, and the forum is unavailable.

Number of images: not sure what you would term an awful lot - there are quite a few, but we are not a huge forum. df shows 80% of the 50GB used; the admin panel in Discourse shows 5.7GB of uploads.

The Docker update command I tried didn’t appear to do anything. I have found instructions on how to install on Ubuntu 18, but not update… should I be following the install directions?

Prior to rebooting the droplet this morning there were no convert processes showing. After the reboot there are now 5, utilising approx 50% of CPU and RAM; the rest appears to be in use by Ruby. Sidekiq processes are intermittently using resources, and processor and RAM are still getting maxed out all the time.
Convert processes are still getting killed due to being out of memory.

Do we think there is still a queue of work being done? Would a further droplet resize help get through that queue? The forum is still only staying online up to 30 mins at a time, often shorter.

Not managed to resolve this. Increasing droplet size to see if it can complete the picture resizing, if that’s what it is doing.

After you resize, be sure to run discourse-setup to adjust memory parameters and rebuild (or destroy and start).
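
For a standard install in /var/discourse (the path used earlier in this thread) that means running `./discourse-setup` from that directory, followed by `./launcher rebuild app`. As far as I know, the memory parameters it adjusts are `db_shared_buffers` and `UNICORN_WORKERS` in containers/app.yml. A small sketch to print what is currently set; `show_memory_settings` is a hypothetical helper name, and the default path is an assumption about your install:

```shell
# show_memory_settings (hypothetical helper): print the memory-related lines
# of a Discourse container definition, whether or not they are commented out.
# Argument 1 is the file to read; the default assumes a standard install.
show_memory_settings() {
  grep -E '^[[:space:]]*#*[[:space:]]*(db_shared_buffers|UNICORN_WORKERS):' \
    "${1:-/var/discourse/containers/app.yml}"
}
```

Calling `show_memory_settings` before and after `./discourse-setup` should show whether the values were changed for the new droplet size.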
