Discourse server slower after update


(Pure Sin) #1


At my work we have a fork of the discourse-docker repo and host it on an EC2 instance. This Wednesday I pulled in changes from the original repo (starting from commit 4e0c969 in discourse/discourse_docker, "update docker gc script") and updated the version we're running.

However, after the update we noticed that the server is much slower and that user requests time out during peak hours.

Since the upgrade, server CPU usage has gone up and hits 100% during peak hours:

Also, in Mixpanel we see that the response time for SSO is much longer this week than last week:
http://i.imgur.com/i6wNBNn.png

Looking at the code changes, it's not clear what the cause is. Can someone point me toward where to start investigating?

(Matt Palmer) #2

Well, it could very easily be a coincidence. The problem with EC2 instances is that you're sharing the CPU with other people, and AWS heavily over-contends the resources on their physical machines. If someone else on the same machine started using more CPU at around the time you upgraded, that could easily leave fewer CPU cycles available to you, and hence produce the appearance of using "100% CPU".
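One way to check this on the instance itself is to look at CPU steal time, which the Linux kernel reports in /proc/stat as time the hypervisor spent running other guests while this one wanted the CPU (it is also the `st` column in `top` and `vmstat`). Below is a minimal Python sketch, assuming a Linux guest whose aggregate `cpu` line in /proc/stat includes the `steal` field. The function names are hypothetical helpers, not part of Discourse or any AWS tool.

```python
# Hypothetical helpers for estimating CPU steal time on a Linux guest.
import time


def parse_cpu_line(line):
    """Return the numeric fields of the aggregate 'cpu' line from /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(x) for x in parts[1:]]


def steal_percent(before, after):
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest/guest_nice are already included in user/nice, so only the first 8 count.
    total = sum(a[:8]) - sum(b[:8])
    steal = a[7] - b[7]
    if total <= 0:
        return 0.0
    return 100.0 * steal / total


def sample_steal(interval=5.0, path="/proc/stat"):
    """Take two readings `interval` seconds apart and return the steal percentage."""
    with open(path) as f:
        first = f.readline()  # the first line of /proc/stat is the aggregate 'cpu' line
    time.sleep(interval)
    with open(path) as f:
        second = f.readline()
    return steal_percent(first, second)
```

Running `print(f"{sample_steal():.1f}% steal")` during peak hours gives a rough number to compare with off-peak. A steal percentage that climbs when the site slows down would be consistent with a noisy neighbour rather than a regression in the update, while near-zero steal with 100% CPU would point back at the server's own workload.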

(Jeff Atwood) #3

I don't think our internal perf graphs show this trend, do they, @sam?

(Pure Sin) #4

That's possible, but unlikely given that the change occurred immediately after the update. I'll try downgrading and see whether performance returns to its previous trend.

(Sam Saffron) #5

This is not related to your Docker change; we have been running the same version of Ruby for a while now.

It is possible that a Discourse update changed performance, but it is far more likely that you got the rough end of the stick and Amazon provisioned you on a slower server. We watch performance for all our hosted sites and have not noticed anything like this.

Downgrades of Discourse are not supported, so you cannot easily revert unless you have an old database backup.

(Matt Palmer) #6

(Jeff Atwood) #7

We looked at our internal stats (warning: these are heavily affected by internal changes to our infrastructure, which account for most of what you will see here), and I don't see any signs of a recent major performance regression in the codebase…