Topic.reset_all_highest exhausts all available disk space

While working on a local container where I have imported data from a SMF2 forum with 20+ years of activity, I have stumbled across a showstopper bug with Topic.reset_all_highest.
After the data import, my db shows around 60k regular topics and around 400k private_message topics, and the queries in Topic.reset_all_highest cause some sort of geometric growth of the rows, with the result that my disk runs out of space (with 120Gb free to start with).
I am currently trying to split the queries in manageable chunks and running them directly in Postgres but thatā€™s of course suboptimal (and Iā€™m not sure that it even works and ends up having the correct results).
I have tried to see if anybody else had this kind of problem but I couldnā€™t find anything so Iā€™m wondering if this could be somehow related to my own setup - Iā€™m using the latest docker version, for the record.

1 Like

I have done a modest sized import lately that is also hanging apparently indefinitely onm Topic.reset_all_highest and Iā€™ve had to kill the query in Postgres to continue. I havenā€™t had this problem before and thought that perhaps it was just that my postgres server was overloaded (itā€™s got a bunch of sites connected).

My next step was to move to another postgres server, but I havenā€™t gotten around to it yet.

After the first two bits of my ā€œsplit queryā€ experiment went smoothly (X and Y for public topics), I tried with the Z one, and it froze - that is, the query was active according to postgres activity, and top showed the process running at 100%.
So I looked again at the SQL and I found the problem: both queries end like this

      WHERE
        topics.archetype <> 'private_message' AND
        X.topic_id = topics.id AND
        Y.topic_id = topics.id AND 
          (
          topics.highest_staff_post_number <> X.highest_post_number OR
          topics.highest_post_number <> Y.highest_post_number OR
          topics.last_posted_at <> Y.last_posted_at OR
          topics.posts_count <> Y.posts_count OR
          topics.word_count <> Z.word_count
        )

(the other has ā€˜private_messageā€™ as archetype, of course)
Which means the queryā€™s missing
Z.topic_id = topics.id - which causes the whole geometric increase.

Changing the queries WHERE clause to

      WHERE
        topics.archetype <> 'private_message' AND
        X.topic_id = topics.id AND
        Y.topic_id = topics.id AND 
        Z.topic_id = topics.id AND 
          (
          topics.highest_staff_post_number <> X.highest_post_number OR
          topics.highest_post_number <> Y.highest_post_number OR
          topics.last_posted_at <> Y.last_posted_at OR
          topics.posts_count <> Y.posts_count OR
          topics.word_count <> Z.word_count
        )

fixed the problem for me.

Should I open a PR?

2 Likes

Iā€™d think so. If you can find a commit that broke that, itā€™d be even more complelling.

1 Like

I have opened a PR for this, with some unfortunate limitations (ie. I canā€™t imagine how to test this change).

6 Likes

This change looks correct to, merging it.

(test wise there should be coverage and a simple test would be plenty to validate it, we just need to confirm no regression)

5 Likes