I’ve seen this before (on the linux-raid mailing list) when legacy partition tables and their contents are copied onto modern 4k-sector hard drives. (Default legacy partitions are not 4k-aligned.) The 512-byte sector support will generate read-modify-write cycles in the drive firmware as needed, and random-access workloads on misaligned blocks will need lots of R-M-W.
On linux-raid, these discoveries are usually associated with raid array rebuild failures and catastrophic data loss. The poor performance is a hint that all is not alright with your storage stack.
Sorry to open up this topic again but I have a question on the size of post_timings.
Out of curiosity I had a look at the database to see how big that table really is, so I checked it with: select count(*) from post_timings;
and that came back with 25 million records: 25,101,376, to be specific.
The total number of posts in the system is 99,260.
Does it make sense that post_timings contains roughly 250 entries per post? Or is there something that went (a bit) wrong somewhere and is there stuff I can do to clean up?
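(Side note: the full count(*) itself takes a while on a table this size. If anyone wants to reproduce the check without the full scan, the planner’s estimate from the catalog is usually close enough; it’s only an approximation, not an exact count:)

SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'post_timings';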
post_timings will contain 1 entry for every user that reads a post. So if you have, on average, 250 users reading each post, then 25 million rows sounds about right.
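If you want to double-check that average yourself, something along these lines should line up (a rough sketch; the exact figure depends a bit on whether deleted posts are counted):

SELECT round(count(*)::numeric / (SELECT count(*) FROM posts), 1) AS avg_readers_per_post
FROM post_timings;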
Hi! Sorry to trouble you again over this but I have -yet again- a related question.
Would it perhaps be possible to calculate the relevant value for a specific topic once the topic is being archived (and then remove all those entries from the post_timings table)?
Some context:
I’ve got a forum here that’s been up for 2.5 months now, and we’re over 100K posts already, with close to 1400 (very active!) users.
Some of these topics tend to be revisited by very many of those users, so I’m afraid the system will eventually come to a grinding halt if I’m not careful.
Some data: 25M records in the post_timings table. The top 10 topics (out of a little over 400 topics) account for 16.7M entries.
What we’re doing now is close topics when they reach 5,000 posts (the most active topic reaches this after around 10 days), and then we open a new topic. As you can imagine, these topics are used for very active social conversations and sometimes resemble a chat more than a forum.
When we close a topic the users move on to the next one (and hardly ever come back to the previous one - it’s basically an archive from then on). So we archive the old topic. I would happily accept it if we were to make one final calculation of the average reading time of the archived topic (so that the other functionality doesn’t break), then remove all entries for the archived topic bar one (which contains the final calculated value) and thereby significantly reduce the size of the table.
We could even do the same thing but per user if needed.
In fact: if the most important reason for this table is to be able to calculate how much time a specific user has spent reading, then perhaps we could make it a configurable option? I know that, funnily enough, our users are not really interested in that (in fact, it’s a bit of a shocker for some that they spend so much time on the forum!)
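To make that concrete, something along these lines is what I’m imagining. This is only a sketch: the topic id is made up, and I’m assuming msecs is the per-read duration column in post_timings (that’s what I see in my schema, but please correct me if the other functionality expects something else):

-- 1. compute the final average reading time per post for the archived topic
SELECT post_number, avg(msecs) AS final_avg_msecs
FROM post_timings
WHERE topic_id = 12345
GROUP BY post_number;

-- 2. once that value is stored wherever the other functionality expects it,
--    drop the per-user rows for the archived topic
DELETE FROM post_timings
WHERE topic_id = 12345;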
Are you actually seeing any issues caused by the number of rows in your post_timings table? Pruning the table might be possible, but would likely have unintended side effects.
Well, I am still seeing issues with the CalculateAvgTime query, which doesn’t seem to want to finish. I first thought it was related to reduced performance of my underlying IO storage, but apparently not: I moved it to an SSD-based test environment and it still doesn’t finish.
I have to admit, I’m just trying to see what I can do to improve life here, so any and all advice is welcome. Then again: if this is a case of the ‘sorry, can’t fix’ then I’ll accept that as well.
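(For anyone following along: the long runtime is easy to confirm from the database side with plain PostgreSQL, nothing Discourse-specific:)

SELECT pid, now() - query_start AS runtime, state, left(query, 80) AS query_snippet
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;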
Btw: for the administrative side of things, would it be possible to set up a read-only slave which does the bulk of the processing for all maintenance bits? I could then move that to its own working environment and not have it impact performance of the main forum.
You absolutely need to prevent megatopics from forming. Any topic approaching 10k replies is an accident waiting to happen. The more you have of these, the worse it will get, and the more replies, the worse the overall effect.
Update: I’ve taken the plunge and moved a lot of entries into new topics. The biggest topic is now no longer 17K posts, but 7K.
Still, no dice.
The CalculateAvgTime and BadgeGrant now both seem to keep on running forever.
A select count(*) from post_timings; tells me that post_timings is now 29 million records, and I think that has to do with at least some of the problems.
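For completeness, the on-disk size (indexes included) can be checked with standard PostgreSQL as well; this is just the built-in size functions, nothing custom:

SELECT pg_size_pretty(pg_total_relation_size('post_timings'));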
From the BadgeGrant query I ran the innermost bit by hand:
SELECT pt.user_id
FROM post_timings pt
JOIN badge_posts b ON b.post_number = pt.post_number AND b.topic_id = pt.topic_id
JOIN topics t ON t.id = pt.topic_id
LEFT JOIN user_badges ub ON ub.badge_id = 17 AND ub.user_id = pt.user_id
WHERE ub.id IS NULL AND t.posts_count > 100
GROUP BY pt.user_id, pt.topic_id, t.posts_count
HAVING count(*) >= t.posts_count
and indeed I get the same result: it takes forever. And by forever I mean I let it run for at least 20 minutes before I interrupted the query.
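In case it matters, the indexes that query has to work with can be listed from the standard catalog view (again, plain PostgreSQL, nothing I’ve added myself):

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'post_timings';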
My biggest concern is that this will only worsen due to the fact that this is a rather active forum and we’ve only been underway for 2.5 months.
Is there any suggestion you have that could alleviate this situation?
Thanks for the suggestion - I’ve disabled the reader badge, will do some more testing to see if indeed BadgeGrant now works again for the other badges.
If there’s anything else I can do or test, please let me know. I don’t want to just take; if there are things I can do that may help, then I’m more than happy to do so.
Just a quick update: BadgeGrant works again, so THANK YOU!
The Jobs::Weekly still triggers the calculation of avg_time based on stuff from post_timings, so that’s not very happy yet, but at least something’s started working again
I do wonder what else there is in the Weekly job, because if I kill the query that keeps on running forever, then the rest of that job probably will not be executed either.
Denormalizing & pruning the post_timings table also came up during GDPR work, and I wasn’t really able to come up with a coherent plan of action there either. The (user, post, duration) data really does get used at the individual-row level, and there’s no solid time cutoff, so you can’t just truncate the table and expect that you’re only purging old entries.