Long-Running Sidekiq Jobs

It still has to run the subquery to calculate the ranks on every topic across the entire posts table no matter how many ranks are actually being updated.

This subquery takes about 40 seconds to run on its own, and it sits inside a nested loop that has to execute for the join before we ever reach the WHERE clause. That seems to be what is blowing up the query cost.

SELECT id, percent_rank()
    OVER (PARTITION BY topic_id ORDER BY score DESC) AS percent_rank
FROM posts
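
For reference, the 40-second figure can be checked independently by timing the subquery on its own with EXPLAIN ANALYZE (this measures just the standalone window scan, not the full update):

EXPLAIN ANALYZE
SELECT id, percent_rank()
    OVER (PARTITION BY topic_id ORDER BY score DESC) AS percent_rank
FROM posts;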

I looked at the code and now I understand. This query is run repeatedly until it has cleared out all the posts whose ranks need to be updated, so when I was watching the SQL processes, I was seeing many separate runs of this query over the course of several hours as it finished one batch and started another.
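
For context, the batched update appears to have roughly the following shape. This is only a sketch pieced together from the query plan shown further down (the x and y aliases come from the planner); the actual query in the code may differ in detail:

UPDATE posts
SET percent_rank = x.percent_rank
FROM (
    SELECT p.id, y.percent_rank
    FROM posts p
    JOIN topics ON topics.id = p.topic_id
    JOIN (
        SELECT id, percent_rank()
            OVER (PARTITION BY topic_id ORDER BY score DESC) AS percent_rank
        FROM posts
    ) y ON y.id = p.id
    WHERE p.percent_rank IS NULL OR y.percent_rank <> p.percent_rank
    LIMIT 20000
) x
WHERE posts.id = x.id;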

However, it doesn’t seem like the LIMIT really helps the complexity of the query all that much. In fact, I’ve discovered that if I increase the limit to 200000, the query plan actually improves significantly.
With the limit at 20000, I see a huge nested loop with a cost of nearly 200 million:

-> Nested Loop (cost=6033694.44..190270554.32 rows=26680560 width=16)

If I just increase the limit to 200000, then it produces this query plan, which looks much cleaner:

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Update on posts  (cost=7334394.89..8873863.82 rows=200000 width=833)
   ->  Nested Loop  (cost=7334394.89..8873863.82 rows=200000 width=833)
         ->  Subquery Scan on x  (cost=7334394.45..7370147.82 rows=200000 width=48)
               ->  Limit  (cost=7334394.45..7368147.82 rows=200000 width=12)
                     ->  Hash Join  (cost=7334394.45..11837188.36 rows=26680560 width=12)
                           Hash Cond: (posts_1.id = y.id)
                           Join Filter: ((posts_1.percent_rank IS NULL) OR (y.percent_rank <> posts_1.percent_rank))
                           ->  Hash Join  (cost=35403.20..2845221.73 rows=26703700 width=12)
                                 Hash Cond: (posts_1.topic_id = topics.id)
                                 ->  Seq Scan on posts posts_1  (cost=0.00..2739721.00 rows=26703700 width=16)
                                 ->  Hash  (cost=23918.11..23918.11 rows=918807 width=4)
                                       ->  Index Only Scan using topics_pkey on topics  (cost=0.42..23918.11 rows=918807 width=4)
                           ->  Hash  (cost=6834805.00..6834805.00 rows=26703700 width=12)
                                 ->  Subquery Scan on y  (cost=6033694.00..6834805.00 rows=26703700 width=12)
                                       ->  WindowAgg  (cost=6033694.00..6567768.00 rows=26703700 width=24)
                                             ->  Sort  (cost=6033694.00..6100453.25 rows=26703700 width=16)
                                                   Sort Key: posts_2.topic_id, posts_2.score DESC
                                                   ->  Seq Scan on posts posts_2  (cost=0.00..2739721.00 rows=26703700 width=16)
         ->  Index Scan using posts_pkey on posts  (cost=0.44..7.52 rows=1 width=789)
               Index Cond: (id = x.id)
 JIT:
   Functions: 30
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(23 rows)

I wonder what causes the planner to switch over. It may be worth increasing this limit, at least after a big import like the one we've just done, while it clears out the backlog.
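
If we do bump the limit temporarily, one rough way to see how much of the backlog remains is to count the rows that would still fail the rank check. This is just a sketch based on the same condition the update appears to use, and it re-runs the expensive window aggregate itself, so it's only worth running occasionally:

SELECT count(*)
FROM posts p
JOIN (
    SELECT id, percent_rank()
        OVER (PARTITION BY topic_id ORDER BY score DESC) AS percent_rank
    FROM posts
) y ON y.id = p.id
WHERE p.percent_rank IS NULL OR y.percent_rank <> p.percent_rank;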
