Topic.reset_all_highest exhausts all available disk space

While working on a local container where I have imported data from a SMF2 forum with 20+ years of activity, I have stumbled across a showstopper bug with Topic.reset_all_highest.
After the data import, my db shows around 60k regular topics and around 400k private_message topics, and the queries in Topic.reset_all_highest cause some sort of geometric growth of the rows, with the result that my disk runs out of space (with 120Gb free to start with).
I am currently trying to split the queries in manageable chunks and running them directly in Postgres but that’s of course suboptimal (and I’m not sure that it even works and ends up having the correct results).
I have tried to see if anybody else had this kind of problem but I couldn’t find anything so I’m wondering if this could be somehow related to my own setup - I’m using the latest docker version, for the record.

1 Like

I have done a modest sized import lately that is also hanging apparently indefinitely onm Topic.reset_all_highest and I’ve had to kill the query in Postgres to continue. I haven’t had this problem before and thought that perhaps it was just that my postgres server was overloaded (it’s got a bunch of sites connected).

My next step was to move to another postgres server, but I haven’t gotten around to it yet.

After the first two bits of my ā€œsplit queryā€ experiment went smoothly (X and Y for public topics), I tried with the Z one, and it froze - that is, the query was active according to postgres activity, and top showed the process running at 100%.
So I looked again at the SQL and I found the problem: both queries end like this

      WHERE
        topics.archetype <> 'private_message' AND
        X.topic_id = topics.id AND
        Y.topic_id = topics.id AND 
          (
          topics.highest_staff_post_number <> X.highest_post_number OR
          topics.highest_post_number <> Y.highest_post_number OR
          topics.last_posted_at <> Y.last_posted_at OR
          topics.posts_count <> Y.posts_count OR
          topics.word_count <> Z.word_count
        )

(the other has ā€˜private_message’ as archetype, of course)
Which means the query’s missing
Z.topic_id = topics.id - which causes the whole geometric increase.

Changing the queries WHERE clause to

      WHERE
        topics.archetype <> 'private_message' AND
        X.topic_id = topics.id AND
        Y.topic_id = topics.id AND 
        Z.topic_id = topics.id AND 
          (
          topics.highest_staff_post_number <> X.highest_post_number OR
          topics.highest_post_number <> Y.highest_post_number OR
          topics.last_posted_at <> Y.last_posted_at OR
          topics.posts_count <> Y.posts_count OR
          topics.word_count <> Z.word_count
        )

fixed the problem for me.

Should I open a PR?

2 Likes

I’d think so. If you can find a commit that broke that, it’d be even more complelling.

1 Like

I have opened a PR for this, with some unfortunate limitations (ie. I can’t imagine how to test this change).

6 Likes

This change looks correct to, merging it.

(test wise there should be coverage and a simple test would be plenty to validate it, we just need to confirm no regression)

5 Likes