How to troubleshoot a slow Discourse?


(Allen - Watchman Monitoring) #1

The primary Discourse instance I manage runs on a 2 GB droplet at DigitalOcean, and we're seeing sporadic performance issues. We have about 500 users, mostly interacting by email, so I don't think it's a load issue.

The admin panel doesn't yet have the controls we need to move to Discourse hosting (which is a goal), so in the meantime I'd love to know: what can I do to test whether the host is the problem?


(Robin Ward) #2

When you say slow, how is it manifesting? Is posting taking a long time? How long are we talking?


(Allen - Watchman Monitoring) #3

Things like this:

Random 502 errors.

This is my app.yml (I'm on Ubuntu 14 / Docker 1.6, stock Discourse setup), and I see that it needs more workers, since I have 2 GB of RAM.

app.yml · GitHub

Other times, the slowness shows up as a spinning loader circle.


#4

You're in “the cloud”. 502 errors are usually the result of overloaded equipment. If the configuration of your own instance is sound, this could be a case of “noisy neighbors”: other VMs on the same physical host competing for its resources.


(Robin Ward) #5

Do the 502 errors happen right away? Or does it spin for 30 seconds and then happen?


#6

From experience with my own instance, there are numerous possible causes of slowness, none of which require “noisy neighbors”.


A few questions for @watchmanmonitor (these are based off experience and issues we have encountered):

  1. About how big is your Discourse instance?
    a. How many topics?
    b. How many posts are in your largest topic? How about your largest active topic (if that’s different)?
  2. Do you have any users who frequently change their avatars?

#7

This is certainly true; however, in the context of the OP it doesn't seem to fit. The question was how to test whether the host machine is the cause of the issue, and the answer would be to look at CPU run time.


(Kane York) #8

Another suggestion is to turn on PostgreSQL slow query logging.
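A minimal sketch of what to do with that log once it is on. In PostgreSQL, slow query logging is controlled by the `log_min_duration_statement` setting, in milliseconds (for example, `log_min_duration_statement = 1000` in postgresql.conf logs every statement that takes a second or more). Where that log ends up inside the Discourse container is not assumed here. The sketch assumes lines containing `duration: 1523.402 ms  statement: ...` (or `execute <name>: ...` for prepared statements); everything before `duration:` depends on your `log_line_prefix`, so it is ignored. The helper name `slowest_queries` is hypothetical.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Matches the part of a PostgreSQL log line written when log_min_duration_statement
# is enabled, e.g. "duration: 1523.402 ms  statement: SELECT ..." or
# "duration: 812.000 ms  execute a12: SELECT ..." for prepared statements.
# The log_line_prefix before "duration:" varies by config, so it is not matched.
DURATION_RE = re.compile(
    r"duration: (\d+(?:\.\d+)?) ms\s+(?:statement|execute[^:]*|parse[^:]*|bind[^:]*):\s*(.*)"
)


def slowest_queries(lines: Iterable[str], threshold_ms: float = 1000.0,
                    top: int = 10) -> List[Tuple[float, str]]:
    """Return up to `top` (duration_ms, sql) pairs at or above threshold_ms, slowest first."""
    hits = []
    for line in lines:
        match = DURATION_RE.search(line)
        if match is None:
            continue  # not a duration line (errors, checkpoints, etc.)
        duration = float(match.group(1))
        if duration >= threshold_ms:
            hits.append((duration, match.group(2).strip()))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slow_queries.py /path/to/postgresql.log
    with open(sys.argv[1], errors="replace") as log:
        for ms, sql in slowest_queries(log):
            print(f"{ms:10.1f} ms  {sql[:120]}")
```

Running it right after reproducing a slow page (such as a slow admin search) would show whether the time is being spent in the database or somewhere else.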


#9

But I thought IBM fixed all those squirrel video cloud overflow exploits. Didn’t you see the commercial?


#10

shrugs Guess this is what I get for being in a room with a bunch of developers. I understand the underlying hardware and how virtualization interacts with it, but I'm no programmer.

The reason “cpu runtime” is important is that it starts to tell you the true state of the host, independent of what the statistics show for your own CPU usage. There are a lot of misconceptions about “the cloud”. One interesting thing I learned was that “vcore” counts are pretty much meaningless on an oversold host. Assigning different vcore counts to different VMs can actually increase host overhead, and the fewer vcores a VM has, the higher the priority it can get for CPU cycles.

Not sure about the zany commercials; it's been over a decade since I had the tube plugged in.
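One way to put a number on noisy-neighbor contention is CPU steal time. This is offered as a swapped-in technique, not necessarily what “cpu runtime” means above. Steal time is the time a Linux guest wanted to run but the hypervisor was running something else. It is the eighth counter on the aggregate `cpu` line of `/proc/stat`, and htop shows it when “Detailed CPU time (System/IO-Wait/Hard-IRQ/Soft-IRQ/Steal/Guest)” is enabled in the F2 Display options. Below is a minimal sketch that samples `/proc/stat` twice and reports the stolen share. The function names are illustrative.

```python
import os
import time
from typing import List


def read_cpu_fields(stat_text: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into integer jiffy counters."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":  # skip per-core lines like "cpu0"
            return [int(x) for x in parts[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before: List[int], after: List[int]) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples.

    Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    guest and guest_nice are already included in user and nice, so only the
    first eight fields go into the total (missing fields count as zero).
    """
    def first_eight(fields: List[int]) -> List[int]:
        return (fields + [0] * 8)[:8]

    deltas = [a - b for b, a in zip(first_eight(before), first_eight(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def sample_steal(interval: float = 5.0, path: str = "/proc/stat") -> float:
    """Read /proc/stat, wait `interval` seconds, read again, return steal %."""
    with open(path) as f:
        before = read_cpu_fields(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_cpu_fields(f.read())
    return steal_percent(before, after)


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    print(f"steal over 1 s: {sample_steal(1.0):.1f}%")
```

If steal stays noticeably above zero during the slow periods, that would point at the host. If it stays near zero while the site is slow, the cause is more likely inside the VM.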


(Jeff Atwood) #11

Is there anything weird at /logs when you visit it in the web browser? Any errors, etc.?


(Allen - Watchman Monitoring) #12

Thanks all, here are some answers:

They come and go, mostly on the admin side, when loading the users list.

In testing today, I got a spinning wheel next to the sign-in after I tried to log into our closed forum using an unapproved account. It was still spinning after I typed this sentence.

Sometimes the page just stalls, and I have to click refresh to get the page to load.

1,200 topics, probably 30 posts in the largest, and no, I haven't noticed people changing their avatars. (This is very much a news distribution platform for us.)

Is there a way to turn on slow query logging in the container config and then rebuild? If so, I don't know it. Otherwise, I suppose I can do it manually.

Edit: nothing at the moment; I'll check again next time I see the slow behavior.


(Jeff Atwood) #13

It is a good question. Some basic stats:

  • output of the free -m memory report

  • output of a similar disk space report (e.g. df -h)

  • run htop and post a screenshot of the results here?

Beyond that, @sam would have to advise. Noisy neighbors are definitely a possibility on DigitalOcean; it does happen.
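The same quick health check can be scripted with Python's standard library. That makes it easy to log every few minutes, so there is data from the moment the slowness actually happens. Below is a minimal sketch, assuming a Unix host (`os.getloadavg` is not available on Windows). `quick_snapshot` is a hypothetical name.

```python
import os
import shutil
from typing import Dict


def quick_snapshot(path: str = "/") -> Dict[str, float]:
    """Load average per CPU and disk usage for `path` (Unix only)."""
    load1, load5, load15 = os.getloadavg()  # raises OSError if unobtainable
    cpus = os.cpu_count() or 1
    disk = shutil.disk_usage(path)
    return {
        "cpus": float(cpus),
        "load1_per_cpu": load1 / cpus,
        "load5_per_cpu": load5 / cpus,
        "load15_per_cpu": load15 / cpus,
        # used / total; df's Use% is used / (used + avail), so df can read higher
        "disk_used_pct": 100.0 * disk.used / disk.total,
    }


if __name__ == "__main__":
    for key, value in quick_snapshot().items():
        print(f"{key:>15}: {value:.2f}")
```

A per-CPU load average that stays above 1 means work is queuing. On Linux this can be processes waiting for a CPU or blocked in uninterruptible I/O. Combined with the steal figure from htop, that helps separate “this VM is busy” from “this VM is being starved”.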


(Allen - Watchman Monitoring) #14

Perhaps this is a specific reproducible step…

I went to /admin/users/list/active, pasted in an email address, and got a very long (1+ minute) spinner.
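A reproducible step like this is easier to reason about as numbers that can be compared before and after a change, or against a different host. Timing it several times works better than a single attempt. Below is a minimal, generic sketch. `time_calls` is a hypothetical helper. The commented-out usage uses a placeholder URL and makes no attempt to handle the authentication the admin pages require.

```python
import statistics
import time
from typing import Callable, Dict


def time_calls(fn: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Call fn() `repeats` times; return min/median/max wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Hypothetical usage (placeholder URL; no authentication handled):
# import urllib.request
# print(time_calls(lambda: urllib.request.urlopen("https://forum.example.com/latest").read()))
```

A large gap between min and max points to intermittent trouble, like the “come and go” symptoms above. A consistently high min points to something that is always slow, such as a specific query.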

Here's the output you asked for:

:~# free -m
             total       used       free     shared    buffers     cached
Mem:          2001       1599        402         77        335        319
-/+ buffers/cache:        944       1057
Swap:         2047          0       2047

:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda         40G   20G   19G  52% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            991M   12K  991M   1% /dev
tmpfs           201M  332K  200M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none           1001M  1.5M 1000M   1% /run/shm
none            100M     0  100M   0% /run/user
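This output uses the older `free -m` layout, with separate `buffers` and `cached` columns, as on Ubuntu 14.04. In it, the `-/+ buffers/cache` row is just the `Mem:` row with buffers and page cache moved from “used” to “free”, because the kernel can reclaim them. Below is a small sketch that redoes that arithmetic from the pasted numbers. The parser only handles this older layout; newer versions of `free` print `buff/cache` and `available` columns instead. The function names are illustrative.

```python
from typing import Dict


def parse_free_m(text: str) -> Dict[str, int]:
    """Parse the 'Mem:' row of the older `free -m` layout into MB values."""
    for line in text.splitlines():
        if line.startswith("Mem:"):
            values = [int(v) for v in line.split()[1:7]]
            keys = ("total", "used", "free", "shared", "buffers", "cached")
            return dict(zip(keys, values))
    raise ValueError("no 'Mem:' row found")


def without_cache(mem: Dict[str, int]) -> Dict[str, int]:
    """Recompute the '-/+ buffers/cache' row: buffers and page cache count as free."""
    reclaimable = mem["buffers"] + mem["cached"]
    return {
        "used_by_processes": mem["used"] - reclaimable,
        "free_or_reclaimable": mem["free"] + reclaimable,
    }


FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:          2001       1599        402         77        335        319
-/+ buffers/cache:        944       1057
Swap:         2047          0       2047
"""

print(without_cache(parse_free_m(FREE_OUTPUT)))
# → {'used_by_processes': 945, 'free_or_reclaimable': 1056}
```

The result is about 945 MB used by processes and about 1056 MB free or reclaimable. `free` itself shows 944 and 1057, presumably because it works from kilobyte values and rounds each column separately. No swap is in use, so at the moment of this snapshot the box was not short on memory.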


:/var/discourse# ./launcher enter app
(waited a while, gave up)
^C

Video of 10 seconds of htop:
CloudApp

Screenshot of the same:

This is a 2 GB Ubuntu 14 box, fully up to date, running Docker 1.6.


(Jens Maier) #15

Does that list include userland threads, or are they hidden? (See the F2 Display options menu.)


(Allen - Watchman Monitoring) #16
[ ] Tree view
[ ] Shadow other users' processes
[x] Hide kernel threads
[ ] Hide userland threads
[ ] Display threads in a different color
[ ] Show custom thread names
[ ] Highlight program "basename"
[x] Highlight large numbers in memory counters
[x] Leave a margin around header
[ ] Detailed CPU time (System/IO-Wait/Hard-IRQ/Soft-IRQ/Steal/Guest)
[ ] Count CPUs from 0 instead of 1
[ ] Update process names on every refresh

So, not hidden.


(Jens Maier) #17

Well, at least that explains why it lists more than one unicorn worker[0] process, although I'd have expected to see more than three with userland threads shown…


(Allen - Watchman Monitoring) #18

For what it's worth, this is on the host box. I had tried shortly after running the email search and gave up before ./launcher enter app could finish, but when I tried to enter the app just now, there was no delay.

Of course, now I see there's no htop in the Docker container anyway.

I wonder, though, whether searching by email is simply slow, in which case this thread could be split after my comment #13.