PostgreSQL runaway IO

I’ve seen this before (on the linux-raid mailing list) when legacy partition tables and contents are copied onto modern 4k sector hard drives. (Default legacy partitions are not 4k -aligned.) The 512-byte sector support will generate read-modify-write cycles in the drive firmware as needed, and random access workloads on misaligned blocks will need lots of R-M-W.

On linux-raid, these discoveries are usually associated with raid array rebuild failures and catastrophic data loss. The poor performance is a hint that all is not alright with your storage stack.


You’re right (although in a slightly different way).

First of all, I am an idiot. Second of all, thank you all for the hints and suggestions.

Third and most importantly: I have a new-found hate for raid controllers which disable disk caches.

Enabled the disk caches, got the proper speeds.

Let’s see how big the real-world gains are though.



Sorry to open up this topic again but I have a question on the size of post_timings.

Out of curiousity I had a look at the database to see how big that table really is, so I checked it with:
select count(*) from post_timings;
and that came back with 25 million records, 25.101.376, to be specific.

The total number of posts in the system is 99.260.

Does it make sense that post_timings contains roughly 250 entries per post? Or is there something that went (a bit) wrong somewhere and is there stuff I can do to clean up?


post_timings will contain 1 entry for every user that reads a post. So if you have, on average, 250 users reading each post, then 25 million rows sounds about right.


Hi! Sorry to trouble you again over this but I have -yet again- a related question.

Would it perhaps be possible to calculate the relevant value for a specific topic once the topic is being archived (and then remove all those entries from the post_timings table?).

Some context:
I’ve got a forum here that’s been up for 2.5 months now, and we’re over 100K posts already, with close to 1400 (very active!) users.

Some of these topics tend to be revisited by very many of those users, so I’m afraid the system will eventually come to a grinding halt if i’m not careful.

Some data: 25M records in the post_timings table. The top-10 topics (out of little over 400 topics) are good for 16.7M entries.

What we’re doing now is close topics when they reach 5.000 posts (the most active topic reaches this after around 10 days). And then we open a new topic. As you can imagine, these topics are used for very active social conversations and sometimes more resemble a chat than a forum.

When we close a topic the users move on to the next one (and hardly ever come back to the previous one - it’s basically an archive from then on). So we archive the old topic. I would happily accept it if we were to make one final calculation of the average reading time of the archived topic (so that the other functionality doesn’t break), then remove all entries for the archived topic bar one (which contains the final calculated value) and thereby significantly reduce the size of the table.

We could even do the same thing but per user if needed.

In fact: If the most important reasons for this table are to be able to calculate how much time a specific user has spent reading then perhaps we could make it a configurable option? I know that funnily enough our users are not really interested in that (in fact, it’s a bit of a shocker for some that they spend so much time on the forum!)

Would this work?

Are you actually seeing any issues caused by the number of rows in your post_timings table? Pruning the table might be possible, but would likely have unintended side effects.


Well, I am still seeing issues with the CalculateAvgTime query which doesn’t seem to want to finish. I first thought it was related to reduced performance of my underlying IO storage, but indeed - I moved it to an SSD-based test environment and it still seems to not want to finish.

I have to admit, I’m just trying to see what I can do to improve life here, so any and all advice is welcome. Then again: if this is a case of the ‘sorry, can’t fix’ then I’ll accept that as well.

Btw: for the administrative side of things, would it be possible to setup a read-only slave which does the bulk of the processing for all maintenance bits? Because I could then move that to its own working environment, and not have it impact performance of the main forum.

You absolutely need to prevent megatopics from forming. Any topic approaching 10k replies is an accident waiting to happen. The more you have of these, the worse it will get, and the more replies, the worse the overall effect.

1 Like

We indeed learned from this and will close topics when they grow to 5000 posts.
Question is: what can I do with the one archived 17.0000-post topic?

Can I break it into smaller pieces? Does it help when I delete the associated entries from then post_timings table?

Or should I just throw away that topic altogether?

Edit: I think the size of the post_timings table now also starts to affect the BadgeGrant query which seems to be not too happy.

Update: I’ve taken the plunge and moved a lot of entries into new topics. The biggest topic is now on longer 17K posts, but 7K.

Still, no dice.

The CalculateAvgTime and BadgeGrant now both seem to keep on running forever.

a select count(*) from post_timings; tells me that post_timings is now 29 million records large, and I think that has to do with at least some of the problems.

From the BadgeGrant query I ran the innermost bit by hand:

SELECT pt.user_id                                                         
     FROM post_timings pt                                                      
     JOIN badge_posts b ON b.post_number = pt.post_number AND                  
                           b.topic_id = pt.topic_id                            
     JOIN topics t ON = pt.topic_id                                       
     LEFT JOIN user_badges ub ON ub.badge_id = 17 AND ub.user_id = pt.user_id  
     WHERE IS NULL AND t.posts_count > 100                               
     GROUP BY pt.user_id, pt.topic_id, t.posts_count                           
     HAVING count(*) >= t.posts_count

and indeed I get the same results - taking forever. And with forever I mean I let it run for at least 20 minutes before I interrupted this query.

My biggest concern is that this will only worsen due to the fact that this is a rather active forum and we’ve only been underway for 2.5 months.

Is there any suggestion you have that could alleviate this situation?

It is inherently extremely expensive to figure out if a user read EVERY single post on a topic across all topics across all users.

If you don’t have the hardware to handle that I would recommend just disabling the reader badge.

I am on the fence big time on the value of “calculateavgtime” I am not convinced we even need this long term


Hi Sam,

Thanks for the suggestion - I’ve disabled the reader badge, will do some more testing to see if indeed BadgeGrant now works again for the other badges.

If there’s anything else I can do, or test or so, please let me know :slight_smile: . I don’t just want to take back, if there’s things I can do that may help then I’m more than happy to do so.


Just a quick update: BadgeGrant works again, so THANK YOU!

The Jobs::Weekly still triggers the calculation of avg_time based on stuff from post_timings, so that’s not very happy yet, but at least something’s started working again :wink:

I do wonder what else there is in the Weekly job, because if I kill the query that keeps on running forever, then the rest of that job probably will not be executed either.

1 Like

Denormalizing & pruning the post_timings table also came up during GDPR work, and I wasn’t really able to come up with a coherent plan of action there either. The (user, post, duration) really does get used individually and without a solid time cutoff, so you can’t just truncate the table and expect that you’re only purging old entries.


Is there any other low hanging fruit left out there as far as removing IP address from tables where we were needlessly collecting it?