It still has to run the subquery that calculates the ranks for every topic across the entire posts table, no matter how many ranks actually need to be updated.
This subquery takes about 40 seconds to run on its own, and it sits inside a nested loop that has to be executed for the join before the WHERE clause can filter anything out. That seems to be what is blowing up the query cost.
SELECT id,
       percent_rank() OVER (PARTITION BY topic_id ORDER BY score DESC) AS percent_rank
FROM posts
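
For reference, you can measure the window scan on its own with a plain EXPLAIN ANALYZE (nothing application-specific here), which is enough to see where that 40 seconds goes:

-- Time the bare window-function scan by itself
EXPLAIN (ANALYZE, BUFFERS)
SELECT id,
       percent_rank() OVER (PARTITION BY topic_id ORDER BY score DESC) AS percent_rank
FROM posts;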
I looked at the code and now I understand: this query is run repeatedly until it has cleared out all the posts whose rank needs to be updated. So when I was watching the SQL processes, I was seeing many separate runs of this query over the course of those hours, as it finished one pass and started another.
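
To make that concrete, here is a simplified sketch of the shape I assume the repeated update takes (not the verbatim query from the codebase): the window subquery is recomputed on every pass, joined back to posts, and the LIMIT caps how many stale rows each pass writes back.

-- Simplified sketch, not the exact application query:
-- each pass recomputes the ranks, keeps up to 20000 rows whose stored
-- rank is stale, and updates just those; the loop stops once a pass
-- finds nothing left to update.
UPDATE posts
SET    percent_rank = X.percent_rank
FROM (
  SELECT posts.id, Y.percent_rank
  FROM   posts
  JOIN (
    SELECT id,
           percent_rank() OVER (PARTITION BY topic_id ORDER BY score DESC) AS percent_rank
    FROM   posts
  ) Y ON Y.id = posts.id
  WHERE  Y.percent_rank IS DISTINCT FROM posts.percent_rank
  LIMIT  20000
) X
WHERE posts.id = X.id;

The join of posts against the ranked subquery is the nested loop mentioned above, and it has to run before the IS DISTINCT FROM filter or the LIMIT can discard anything.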
However, the LIMIT doesn't seem to reduce the cost of the query all that much. In fact, I've discovered that if I increase the LIMIT to 200000, the query plan actually improves significantly.
With LIMIT 20000, I see a huge nested loop with an estimated cost of nearly 200 million: