Are you running on AWS with that setup? With their machines, increasing the number of cores also increases the amount of memory (we already have 7.5 GB and don't use it). So upgrading our instance could end up being overpriced.
PS: It's in our plans to try to contribute more to Discourse.
Looking through AWS instance types, they aren't that great. Maybe we should debug your problems better before spending money.
Since you've come from a big import (> 2M posts), the database may be struggling. Do you see the timings in the upper left when browsing the forums? Check there to find out whether SQL performance is the problem.
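If the timings do point at SQL, you can also ask Postgres directly which statements are eating the time. A minimal sketch, assuming the pg_stat_statements extension is enabled (on RDS you can turn it on via the parameter group; the column is named total_exec_time on Postgres 13+):

```sql
-- Requires the pg_stat_statements extension (enable it in the RDS parameter group if missing).
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by total time, to see whether a handful of queries dominate.
SELECT calls,
       round(total_time::numeric, 1)           AS total_ms,
       round((total_time / calls)::numeric, 1) AS avg_ms,
       left(query, 80)                         AS query
FROM   pg_stat_statements
ORDER  BY total_time DESC
LIMIT  10;
```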
When opening home (/), the query is what takes longest; just now, for instance, it ran in 300ms out of a total of 485ms.
But if it always took about this long, I can't see how the request would time out because of the query.
The funny thing is, even though RDS sometimes seems to reach a peak of 80-90% CPU usage, plenty of times we have experienced these instabilities with our database using almost none of its CPU and memory. I don't know if this could be an indication that it's not the direct cause, even though it could contribute to the problem.
After refreshing a bunch of times, I got the query to take 10710.6ms once.
I'm not sure what would make it take this long (my RDS isn't even at 7% of its CPU utilization). Could my database be missing important indexes or something similar?
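One rough way to check the missing-index theory from the Postgres side, using only the standard statistics views (nothing Discourse-specific; the tables it lists are whatever your import produced):

```sql
-- Tables that are read mostly by sequential scans; on big tables this can hint at a missing index.
SELECT relname, seq_scan, idx_scan, n_live_tup
FROM   pg_stat_user_tables
WHERE  seq_scan > idx_scan
ORDER  BY seq_scan DESC
LIMIT  20;

-- For an individual slow query copied out of the mini-profiler, EXPLAIN shows the chosen plan:
-- EXPLAIN (ANALYZE, BUFFERS) <paste the slow query here>;
```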
A GET /categories.json takes 262ms with only 24.2ms on 18 queries (9%) for me.
Your database seems to be very bad. People talk shit about RDS, but I didn't know it was that bad.
You can try to VACUUM VERBOSE ANALYZE the entire database and see if it helps. If you expand the timings and look at the slow queries, you can target the slow tables first.
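Something along these lines, run from a plain psql session against the Discourse database (VACUUM can't run inside a transaction block; the specific table names below are just examples of likely hot tables):

```sql
-- Whole database:
VACUUM (VERBOSE, ANALYZE);

-- Or start with the tables the expanded timings point at, for example:
VACUUM (VERBOSE, ANALYZE) posts;
VACUUM (VERBOSE, ANALYZE) topics;
```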
We have seen very slow database performance cripple a site before. It's been reported at least twice here on meta, with the reproduction steps being "have a terribly slow database."
Yeah, it’s that bad. Even more fun is that it’s a lot better than it used to be. RDS also uses tiers of EBS performance that you can’t get on a regular instance, so I have no idea how they can manage to make it perform worse than running the DB on a regular instance. My mind, she is boggled.
@Falco, I would be really glad if it turned out that RDS is the one to blame. But we see this behavior even at 3am with no one using it…
If you browse the CircleCI Discourse forum at https://discuss.circleci.com/, you will also get the same 502 from nginx from time to time, even on such a small forum…
This seems like a problem that should be fairly simple to diagnose if you apply the scientific method. It definitely seems isolated to the application, so start by examining the application logs (/var/discourse/shared/<name>/logs/rails/production.log) and see what they report. The unicorn logs (in the same directory) might also show something like a worker timeout. From there, make a hypothesis as to what might be going wrong based on the data available, design an experiment, and run it. We can only guess wildly at what might be the problem, and while that's fun (and, if you're lucky, might give you the answer), it's a really poor use of everyone's time.
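As a rough sketch of that first step, something like this is usually enough to see whether workers are timing out (paths as above; substitute your container name for <name> and adjust if your directory layout differs):

```bash
# Follow the Rails log while reproducing the 502:
tail -f /var/discourse/shared/<name>/logs/rails/production.log

# Worker timeouts usually stand out in the unicorn logs in the same directory:
grep -i "timeout" /var/discourse/shared/<name>/logs/rails/unicorn.std*.log
```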
My hypothesis is that there is a corner case in the latest versions of Discourse that generates heavy SQL queries for users with a lot of posts (migrated posts, by the way). We will try to investigate it further, take a look at the query speed, and maybe try New Relic. We will keep you posted if we find real data. The production.log did not offer any insights.