I have two Sidekiq jobs that seem to be taking a long time to complete. They look like the same overall process, but there are two jobs listed in the RUNNING status. These jobs have been running for 6 hours now, and Postgres hangs on the CLI when I attempt even an EXPLAIN ANALYZE against the first of these queries.
Any ideas on what would cause these queries to take so long to execute?
This subquery returns around 13,000 rows on our instance:
SELECT ids.user_id, q.post_id, p3.created_at granted_at
FROM (
    SELECT p1.user_id, MIN(q1.id) id
    FROM quoted_posts q1
    JOIN badge_posts p1 ON p1.id = q1.post_id
    JOIN badge_posts p2 ON p2.id = q1.quoted_post_id
    WHERE (TRUE OR ( p1.id IN (-1) ))
    GROUP BY p1.user_id
) ids
JOIN quoted_posts q ON q.id = ids.id
JOIN badge_posts p3 ON q.post_id = p3.id
It’s then LEFT JOINed to the user_badges table, which has 84,000 rows. Something in the final condition, WHERE ub.badge_id = 15 AND q.user_id IS NULL, seems to be blowing up this query. If I leave off the WHERE clause entirely, it executes in a reasonable amount of time (20 seconds or so), but if I include even just WHERE ub.badge_id = 15, I cannot get an EXPLAIN to finish in a reasonable amount of time. The EXPLAIN has been hanging for several minutes now with no results, and the actual execution of the full query has been running for hours. Is there anything we can do to optimize this query?
Here is the code that runs this query, which is currently stuck. If I change the first join from LEFT JOIN to INNER JOIN, the query executes instantly. Is there a reason this needs to be a left join?
sql = <<~SQL
  DELETE FROM user_badges
  WHERE id IN (
    SELECT ub.id
    FROM user_badges ub
    LEFT JOIN (
      #{badge.query}
    ) q ON q.user_id = ub.user_id
    #{post_clause}
    WHERE ub.badge_id = :id AND q.user_id IS NULL
  )
SQL
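For what it's worth, the LEFT JOIN appears to be load-bearing here: it is an anti-join, selecting badges whose holder no longer appears in the badge query. With an INNER JOIN, q.user_id can never be NULL, so the subquery matches nothing and the DELETE becomes a no-op; that would explain why it "executes instantly". A hedged sketch of the same anti-join written with NOT EXISTS instead, which the planner can sometimes handle better (this assumes badge.query exposes a user_id column, as the join condition suggests):

```sql
-- Hypothetical rewrite: the same anti-join expressed with NOT EXISTS.
-- Semantically equivalent to LEFT JOIN ... WHERE q.user_id IS NULL,
-- assuming badge.query exposes a user_id column.
-- (The #{post_clause} interpolation is omitted here for brevity.)
DELETE FROM user_badges
WHERE id IN (
    SELECT ub.id
    FROM user_badges ub
    WHERE ub.badge_id = :id
      AND NOT EXISTS (
          SELECT 1
          FROM ( /* badge.query */ ) q
          WHERE q.user_id = ub.user_id
      )
)
```

This is a sketch rather than a drop-in replacement, since badge queries that also match on post_id would need that condition added inside the NOT EXISTS.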
Another query that looks bad is this one, I think from the weekly cleanup job:
UPDATE posts
SET percent_rank = X.percent_rank
FROM (
    SELECT posts.id, Y.percent_rank
    FROM posts
    JOIN (
        SELECT id, percent_rank()
               OVER (PARTITION BY topic_id ORDER BY SCORE DESC) as percent_rank
        FROM posts
    ) Y ON Y.id = posts.id
    JOIN topics ON posts.topic_id = topics.id
    WHERE (posts.percent_rank IS NULL OR Y.percent_rank <> posts.percent_rank)
    LIMIT 20000
) AS X
WHERE posts.id = X.id
The EXPLAIN for this shows it’s trying to sort all 26 million rows in the posts table. I can’t tell what access method it will use for this query, but since the active wait event is DataFileRead, I think it is going out to disk for something…
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Update on posts (cost=8312704.61..8627308.35 rows=20000 width=825)
-> Nested Loop (cost=8312704.61..8627308.35 rows=20000 width=825)
-> Subquery Scan on x (cost=8312704.18..8464468.35 rows=20000 width=48)
-> Limit (cost=8312704.18..8464268.35 rows=20000 width=12)
-> Hash Join (cost=8312704.18..209445240.14 rows=26540908 width=12)
Hash Cond: (posts_1.topic_id = topics.id)
-> Nested Loop (cost=8277347.60..209340213.36 rows=26540908 width=16)
-> WindowAgg (cost=8277347.16..8809352.84 rows=26600284 width=24)
-> Sort (cost=8277347.16..8343847.87 rows=26600284 width=16)
Sort Key: posts_2.topic_id, posts_2.score DESC
-> Seq Scan on posts posts_2 (cost=0.00..4542277.84 rows=26600284 width=16)
-> Index Scan using posts_pkey on posts posts_1 (cost=0.44..7.52 rows=1 width=16)
Index Cond: (id = posts_2.id)
Filter: ((percent_rank IS NULL) OR ((percent_rank() OVER (?)) <> percent_rank))
-> Hash (cost=23871.05..23871.05 rows=918842 width=4)
-> Index Only Scan using topics_pkey on topics (cost=0.42..23871.05 rows=918842 width=4)
-> Index Scan using posts_pkey on posts (cost=0.44..8.14 rows=1 width=781)
Index Cond: (id = x.id)
JIT:
Functions: 24
Options: Inlining true, Optimization true, Expressions true, Deforming true
(21 rows)
I have a strong feeling that you are swimming upstream here and the DB you are using simply does not have enough resources to run Discourse.
What are the exact specs of the metal (CPU / Hard drive make and model) ?
What is the exact spec of the VM ?
These queries are indeed expensive, but we host plenty of large forums (e.g. the Straight Dope Message Board, with 22 million posts) and we are able to run all of these queries just fine on that instance.
The VM running Discourse has been assigned 8 CPU cores and 32 GB RAM.
I believe I have discovered the issue with the first query, or at least a way to instruct the query planner to make the right decision. Here’s the query that would not complete in 16+ hours (this is for the First Quote badge):
SELECT ub.id
FROM user_badges ub
LEFT JOIN (
    SELECT ids.user_id, q.post_id, p3.created_at granted_at
    FROM (
        SELECT p1.user_id, MIN(q1.id) id
        FROM quoted_posts q1
        JOIN badge_posts p1 ON p1.id = q1.post_id
        JOIN badge_posts p2 ON p2.id = q1.quoted_post_id
        WHERE (TRUE OR ( p1.id IN (-1) ))
        GROUP BY p1.user_id
    ) ids
    JOIN quoted_posts q ON q.id = ids.id
    JOIN badge_posts p3 ON q.post_id = p3.id
) q ON q.user_id = ub.user_id
   AND (q.post_id = ub.post_id OR NOT TRUE)
WHERE ub.badge_id = 15 AND q.user_id IS NULL
If I add a single ORDER BY line in the right location, this query now completes in a few seconds:
SELECT ub.id
FROM user_badges ub
LEFT JOIN (
    SELECT ids.user_id, q.post_id, p3.created_at granted_at
    FROM (
        SELECT p1.user_id, MIN(q1.id) id
        FROM quoted_posts q1
        JOIN badge_posts p1 ON p1.id = q1.post_id
        JOIN badge_posts p2 ON p2.id = q1.quoted_post_id
        WHERE (TRUE OR ( p1.id IN (-1) ))
        GROUP BY p1.user_id
    ) ids
    JOIN quoted_posts q ON q.id = ids.id
    JOIN badge_posts p3 ON q.post_id = p3.id
    ORDER BY ids.user_id
) q ON q.user_id = ub.user_id
   AND (q.post_id = ub.post_id OR NOT TRUE)
WHERE ub.badge_id = 15 AND q.user_id IS NULL
I feel like the planner should be smart enough to do this sort in the right place, but apparently it’s not… Still, the fix seems rather easy at this point.
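My guess (and it is only a guess) as to why this works: PostgreSQL cannot flatten a subquery that carries its own ORDER BY, so the derived table gets computed once and then joined, instead of being merged into the outer plan and re-driven per user_badges row. On PostgreSQL 12 and later, that same planner fence can be requested explicitly with a materialized CTE; a sketch of the same query in that form:

```sql
-- Sketch: AS MATERIALIZED (PostgreSQL 12+) forces the CTE to be computed
-- once, which appears to be the same fence the extra ORDER BY provides.
WITH q AS MATERIALIZED (
    SELECT ids.user_id, q.post_id, p3.created_at granted_at
    FROM (
        SELECT p1.user_id, MIN(q1.id) id
        FROM quoted_posts q1
        JOIN badge_posts p1 ON p1.id = q1.post_id
        JOIN badge_posts p2 ON p2.id = q1.quoted_post_id
        WHERE (TRUE OR ( p1.id IN (-1) ))
        GROUP BY p1.user_id
    ) ids
    JOIN quoted_posts q ON q.id = ids.id
    JOIN badge_posts p3 ON q.post_id = p3.id
)
SELECT ub.id
FROM user_badges ub
LEFT JOIN q ON q.user_id = ub.user_id
           AND (q.post_id = ub.post_id OR NOT TRUE)
WHERE ub.badge_id = 15 AND q.user_id IS NULL
```

The advantage over the ORDER BY trick is that it states the intent directly instead of relying on a side effect of sorting.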
Haven’t really started digging into the other query on the percent_rank yet.
Sometimes planning is off when statistics are bad… in some exceptional cases a VACUUM FULL can help, and a plain VACUUM ANALYZE is definitely recommended after imports. I think you did both.
Is there a reason you are running in a VM vs Docker directly on the host?
Yes, I’ve run VACUUM ANALYZE a few times. The statistics should be correct, but this seems to be choosing poorly across multiple rebuilds, Postgres tuning adjustments, and vacuums.
We’re running other VMs on this host machine but we have the spare resources at the moment so this is where I’ve built a system to test Discourse.
Our DB server is in the same ballpark as yours for hardware performance (though we have faster I/O given our larger RAID array). However, we do not run virtualized at all; that is one big difference.
I have not. I can give that a try and see if the behavior is any different.
I’m sure there’s some performance loss from running in a VM, but there’s nothing clobbering the hardware heavily. When I ran the import to bring all our data in from our other software, I was able to get 60-70% usage out of all 8 cores running multiple import processes simultaneously.
When these jobs are sitting and spinning now I usually don’t see the load average above 2-3 so they are not even using all the CPU that’s available.
On the percent_rank query that looks to be part of the weekly job, what does the EXPLAIN look like for your large instance? My instance is reporting a query cost of over 8 million on that one, which seems a bit scary.
Do you have any recommendations for the postgres tuning values in app.yml? Right now I’m using:
shared_buffers: 16GB
work_mem: 512MB
The VACUUM FULL completed but it doesn’t seem to have made a difference in the query performance. The badge query still seems like it will run for hours and hours unless I add the ORDER BY clause, and the percent_rank query has been running for two hours now without completing. We will probably need to change the SQL definition for the First Quote badge and then I’ll need to look at what can be done to fix the percent_rank query.
Do you have any suggestions for the percent_rank query based on this EXPLAIN?
UPDATE posts
SET percent_rank = X.percent_rank
FROM (
    SELECT posts.id, Y.percent_rank
    FROM posts
    JOIN (
        SELECT id, percent_rank()
               OVER (PARTITION BY topic_id ORDER BY SCORE DESC) as percent_rank
        FROM posts
    ) Y ON Y.id = posts.id
    JOIN topics ON posts.topic_id = topics.id
    WHERE (posts.percent_rank IS NULL OR Y.percent_rank <> posts.percent_rank)
    LIMIT 20000
) AS X
WHERE posts.id = X.id
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Update on posts (cost=6511439.82..6944253.09 rows=20000 width=828)
-> Nested Loop (cost=6511439.82..6944253.09 rows=20000 width=828)
-> Subquery Scan on x (cost=6511439.38..6784765.09 rows=20000 width=48)
-> Limit (cost=6511439.38..6784565.09 rows=20000 width=12)
-> Nested Loop (cost=6511439.38..374544016.70 rows=26949684 width=12)
-> Nested Loop (cost=6511438.96..192122439.64 rows=26949684 width=16)
-> WindowAgg (cost=6511438.52..7050906.24 rows=26973386 width=24)
-> Sort (cost=6511438.52..6578871.98 rows=26973386 width=16)
Sort Key: posts_2.topic_id, posts_2.score DESC
-> Seq Scan on posts posts_2 (cost=0.00..2721272.86 rows=26973386 width=16)
-> Index Scan using posts_pkey on posts posts_1 (cost=0.44..6.84 rows=1 width=16)
Index Cond: (id = posts_2.id)
Filter: ((percent_rank IS NULL) OR ((percent_rank() OVER (?)) <> percent_rank))
-> Index Only Scan using topics_pkey on topics (cost=0.42..6.77 rows=1 width=4)
Index Cond: (id = posts_1.topic_id)
-> Index Scan using posts_pkey on posts (cost=0.44..7.97 rows=1 width=784)
Index Cond: (id = x.id)
JIT:
Functions: 21
Options: Inlining true, Optimization true, Expressions true, Deforming true
(20 rows)
Changing the LIMIT doesn’t seem to alter the query plan much (costs or otherwise). The issue seems to be that the query has to sort the entire posts table (about 26.5 million rows in our case) before it can do anything else. There might be an opportunity for an index here; I don’t see the score column included in any of the indexes on the posts table right now.
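If an index is worth a try, something like the following might let the window function read posts in partition order instead of sorting all 26.5 million rows. Whether the planner actually uses it for PARTITION BY topic_id ORDER BY score DESC depends on the PostgreSQL version and statistics, so treat this as an experiment (the index name here is made up):

```sql
-- Hypothetical index matching the window's partition and ordering keys.
-- CONCURRENTLY avoids blocking writes while it builds (cannot run inside
-- a transaction block).
CREATE INDEX CONCURRENTLY index_posts_on_topic_id_and_score
    ON posts (topic_id, score DESC);
```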
The ranking is per topic; it is not ranking the whole set.
You could perhaps gate on topic ids … WHERE topic_id < 1000 … 2000 … 10000 and so on… Probably once the initial update is done this will run faster.
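That gating could look something like the sketch below: the window is restricted to a slice of topics, so only that slice has to be sorted per pass. The bounds are illustrative, and you would step through successive ranges to cover the whole table:

```sql
-- Sketch: one pass of the percent_rank update, gated to a topic_id range.
-- Bounds are illustrative; repeat with successive ranges (0-10000,
-- 10000-20000, ...) until all topics are covered.
UPDATE posts
SET percent_rank = X.percent_rank
FROM (
    SELECT posts.id, Y.percent_rank
    FROM posts
    JOIN (
        SELECT id, percent_rank()
               OVER (PARTITION BY topic_id ORDER BY score DESC) AS percent_rank
        FROM posts
        WHERE topic_id >= 0 AND topic_id < 10000
    ) Y ON Y.id = posts.id
    WHERE posts.topic_id >= 0 AND posts.topic_id < 10000
      AND (posts.percent_rank IS NULL OR Y.percent_rank <> posts.percent_rank)
) AS X
WHERE posts.id = X.id
```

Since percent_rank is computed per topic anyway, slicing by topic_id does not change the result, only how much has to be sorted at once.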