While working on a local container where I have imported data from a SMF2 forum with 20+ years of activity, I have stumbled across a showstopper bug with Topic.reset_all_highest
.
After the data import, my db shows around 60k regular topics and around 400k private_message topics, and the queries in Topic.reset_all_highest
cause some sort of geometric growth of the rows, with the result that my disk runs out of space (with 120Gb free to start with).
I am currently trying to split the queries in manageable chunks and running them directly in Postgres but that’s of course suboptimal (and I’m not sure that it even works and ends up having the correct results).
I have tried to see if anybody else had this kind of problem but I couldn’t find anything so I’m wondering if this could be somehow related to my own setup - I’m using the latest docker version, for the record.
I have done a modest sized import lately that is also hanging apparently indefinitely onm Topic.reset_all_highest
and I’ve had to kill the query in Postgres to continue. I haven’t had this problem before and thought that perhaps it was just that my postgres server was overloaded (it’s got a bunch of sites connected).
My next step was to move to another postgres server, but I haven’t gotten around to it yet.
After the first two bits of my “split query” experiment went smoothly (X and Y for public topics), I tried with the Z one, and it froze - that is, the query was active according to postgres activity, and top
showed the process running at 100%.
So I looked again at the SQL and I found the problem: both queries end like this
WHERE
topics.archetype <> 'private_message' AND
X.topic_id = topics.id AND
Y.topic_id = topics.id AND
(
topics.highest_staff_post_number <> X.highest_post_number OR
topics.highest_post_number <> Y.highest_post_number OR
topics.last_posted_at <> Y.last_posted_at OR
topics.posts_count <> Y.posts_count OR
topics.word_count <> Z.word_count
)
(the other has ‘private_message’ as archetype, of course)
Which means the query’s missing
Z.topic_id = topics.id
- which causes the whole geometric increase.
Changing the queries WHERE clause to
WHERE
topics.archetype <> 'private_message' AND
X.topic_id = topics.id AND
Y.topic_id = topics.id AND
Z.topic_id = topics.id AND
(
topics.highest_staff_post_number <> X.highest_post_number OR
topics.highest_post_number <> Y.highest_post_number OR
topics.last_posted_at <> Y.last_posted_at OR
topics.posts_count <> Y.posts_count OR
topics.word_count <> Z.word_count
)
fixed the problem for me.
Should I open a PR?
I’d think so. If you can find a commit that broke that, it’d be even more complelling.
I have opened a PR for this, with some unfortunate limitations (ie. I can’t imagine how to test this change).
This change looks correct to, merging it.
(test wise there should be coverage and a simple test would be plenty to validate it, we just need to confirm no regression)