While working on a local container where I have imported data from a SMF2 forum with 20+ years of activity, I have stumbled across a showstopper bug with Topic.reset_all_highest
.
After the data import, my db shows around 60k regular topics and around 400k private_message topics, and the queries in Topic.reset_all_highest
cause some sort of geometric growth of the rows, with the result that my disk runs out of space (with 120Gb free to start with).
I am currently trying to split the queries in manageable chunks and running them directly in Postgres but thatās of course suboptimal (and Iām not sure that it even works and ends up having the correct results).
I have tried to see if anybody else had this kind of problem but I couldnāt find anything so Iām wondering if this could be somehow related to my own setup - Iām using the latest docker version, for the record.
I have done a modest sized import lately that is also hanging apparently indefinitely onm Topic.reset_all_highest
and Iāve had to kill the query in Postgres to continue. I havenāt had this problem before and thought that perhaps it was just that my postgres server was overloaded (itās got a bunch of sites connected).
My next step was to move to another postgres server, but I havenāt gotten around to it yet.
After the first two bits of my āsplit queryā experiment went smoothly (X and Y for public topics), I tried with the Z one, and it froze - that is, the query was active according to postgres activity, and top
showed the process running at 100%.
So I looked again at the SQL and I found the problem: both queries end like this
WHERE
topics.archetype <> 'private_message' AND
X.topic_id = topics.id AND
Y.topic_id = topics.id AND
(
topics.highest_staff_post_number <> X.highest_post_number OR
topics.highest_post_number <> Y.highest_post_number OR
topics.last_posted_at <> Y.last_posted_at OR
topics.posts_count <> Y.posts_count OR
topics.word_count <> Z.word_count
)
(the other has āprivate_messageā as archetype, of course)
Which means the queryās missing
Z.topic_id = topics.id
- which causes the whole geometric increase.
Changing the queries WHERE clause to
WHERE
topics.archetype <> 'private_message' AND
X.topic_id = topics.id AND
Y.topic_id = topics.id AND
Z.topic_id = topics.id AND
(
topics.highest_staff_post_number <> X.highest_post_number OR
topics.highest_post_number <> Y.highest_post_number OR
topics.last_posted_at <> Y.last_posted_at OR
topics.posts_count <> Y.posts_count OR
topics.word_count <> Z.word_count
)
fixed the problem for me.
Should I open a PR?
Iād think so. If you can find a commit that broke that, itād be even more complelling.
I have opened a PR for this, with some unfortunate limitations (ie. I canāt imagine how to test this change).
This change looks correct to, merging it.
(test wise there should be coverage and a simple test would be plenty to validate it, we just need to confirm no regression)