PostgreSQL runaway IO

pturmel · July 6, 2018, 3:15pm

I’ve seen this before (on the linux-raid mailing list) when legacy partition tables and contents are copied onto modern 4k sector hard drives. (Default legacy partitions are not 4k -aligned.) The 512-byte sector support will generate read-modify-write cycles in the drive firmware as needed, and random access workloads on misaligned blocks will need lots of R-M-W.

On linux-raid, these discoveries are usually associated with raid array rebuild failures and catastrophic data loss. The poor performance is a hint that all is not alright with your storage stack.

JanJoost · July 6, 2018, 3:36pm

You’re right (although in a slightly different way).

First of all, I am an idiot. Second of all, thank you all for the hints and suggestions.

Third and most importantly: I have a new-found hate for raid controllers which disable disk caches.

Enabled the disk caches, got the proper speeds.

Let’s see how big the real-world gains are though.

JanJoost · July 23, 2018, 11:47am

Hi,

Sorry to open up this topic again but I have a question on the size of post_timings.

Out of curiousity I had a look at the database to see how big that table really is, so I checked it with:
select count(*) from post_timings;
and that came back with 25 million records, 25.101.376, to be specific.

The total number of posts in the system is 99.260.

Does it make sense that post_timings contains roughly 250 entries per post? Or is there something that went (a bit) wrong somewhere and is there stuff I can do to clean up?

Thanks!

david · July 23, 2018, 11:57am

post_timings will contain 1 entry for every user that reads a post. So if you have, on average, 250 users reading each post, then 25 million rows sounds about right.

JanJoost · July 30, 2018, 8:39pm

Hi! Sorry to trouble you again over this but I have -yet again- a related question.

Would it perhaps be possible to calculate the relevant value for a specific topic once the topic is being archived (and then remove all those entries from the post_timings table?).

Some context:
I’ve got a forum here that’s been up for 2.5 months now, and we’re over 100K posts already, with close to 1400 (very active!) users.

Some of these topics tend to be revisited by very many of those users, so I’m afraid the system will eventually come to a grinding halt if i’m not careful.

Some data: 25M records in the post_timings table. The top-10 topics (out of little over 400 topics) are good for 16.7M entries.

What we’re doing now is close topics when they reach 5.000 posts (the most active topic reaches this after around 10 days). And then we open a new topic. As you can imagine, these topics are used for very active social conversations and sometimes more resemble a chat than a forum.

When we close a topic the users move on to the next one (and hardly ever come back to the previous one - it’s basically an archive from then on). So we archive the old topic. I would happily accept it if we were to make one final calculation of the average reading time of the archived topic (so that the other functionality doesn’t break), then remove all entries for the archived topic bar one (which contains the final calculated value) and thereby significantly reduce the size of the table.

We could even do the same thing but per user if needed.

In fact: If the most important reasons for this table are to be able to calculate how much time a specific user has spent reading then perhaps we could make it a configurable option? I know that funnily enough our users are not really interested in that (in fact, it’s a bit of a shocker for some that they spend so much time on the forum!)

Would this work?

david · July 30, 2018, 9:05pm

Are you actually seeing any issues caused by the number of rows in your post_timings table? Pruning the table might be possible, but would likely have unintended side effects.

JanJoost · July 30, 2018, 9:11pm

Well, I am still seeing issues with the CalculateAvgTime query which doesn’t seem to want to finish. I first thought it was related to reduced performance of my underlying IO storage, but indeed - I moved it to an SSD-based test environment and it still seems to not want to finish.

I have to admit, I’m just trying to see what I can do to improve life here, so any and all advice is welcome. Then again: if this is a case of the ‘sorry, can’t fix’ then I’ll accept that as well.

Btw: for the administrative side of things, would it be possible to setup a read-only slave which does the bulk of the processing for all maintenance bits? Because I could then move that to its own working environment, and not have it impact performance of the main forum.

codinghorror · July 30, 2018, 11:48pm

You absolutely need to prevent megatopics from forming. Any topic approaching 10k replies is an accident waiting to happen. The more you have of these, the worse it will get, and the more replies, the worse the overall effect.

JanJoost · July 31, 2018, 3:36am

We indeed learned from this and will close topics when they grow to 5000 posts.
Question is: what can I do with the one archived 17.0000-post topic?

Can I break it into smaller pieces? Does it help when I delete the associated entries from then post_timings table?

Or should I just throw away that topic altogether?

Edit: I think the size of the post_timings table now also starts to affect the BadgeGrant query which seems to be not too happy.

JanJoost · July 31, 2018, 4:58am

Update: I’ve taken the plunge and moved a lot of entries into new topics. The biggest topic is now on longer 17K posts, but 7K.

Still, no dice.

The CalculateAvgTime and BadgeGrant now both seem to keep on running forever.

a select count(*) from post_timings; tells me that post_timings is now 29 million records large, and I think that has to do with at least some of the problems.

From the BadgeGrant query I ran the innermost bit by hand:

SELECT pt.user_id                                                         
     FROM post_timings pt                                                      
     JOIN badge_posts b ON b.post_number = pt.post_number AND                  
                           b.topic_id = pt.topic_id                            
     JOIN topics t ON t.id = pt.topic_id                                       
     LEFT JOIN user_badges ub ON ub.badge_id = 17 AND ub.user_id = pt.user_id  
     WHERE ub.id IS NULL AND t.posts_count > 100                               
     GROUP BY pt.user_id, pt.topic_id, t.posts_count                           
     HAVING count(*) >= t.posts_count

and indeed I get the same results - taking forever. And with forever I mean I let it run for at least 20 minutes before I interrupted this query.

My biggest concern is that this will only worsen due to the fact that this is a rather active forum and we’ve only been underway for 2.5 months.

Is there any suggestion you have that could alleviate this situation?

sam · July 31, 2018, 5:14am

It is inherently extremely expensive to figure out if a user read EVERY single post on a topic across all topics across all users.

If you don’t have the hardware to handle that I would recommend just disabling the reader badge.

I am on the fence big time on the value of “calculateavgtime” I am not convinced we even need this long term

JanJoost · July 31, 2018, 5:18am

Hi Sam,

Thanks for the suggestion - I’ve disabled the reader badge, will do some more testing to see if indeed BadgeGrant now works again for the other badges.

If there’s anything else I can do, or test or so, please let me know . I don’t just want to take back, if there’s things I can do that may help then I’m more than happy to do so.

JanJoost · July 31, 2018, 5:33am

Just a quick update: BadgeGrant works again, so THANK YOU!

The Jobs::Weekly still triggers the calculation of avg_time based on stuff from post_timings, so that’s not very happy yet, but at least something’s started working again

I do wonder what else there is in the Weekly job, because if I kill the query that keeps on running forever, then the rest of that job probably will not be executed either.

riking · July 31, 2018, 6:22pm

Denormalizing & pruning the post_timings table also came up during GDPR work, and I wasn’t really able to come up with a coherent plan of action there either. The (user, post, duration) really does get used individually and without a solid time cutoff, so you can’t just truncate the table and expect that you’re only purging old entries.

codinghorror · July 31, 2018, 10:50pm

Is there any other low hanging fruit left out there as far as removing IP address from tables where we were needlessly collecting it?

sam · March 3, 2021, 12:06am

Howdy @JanJoost … curious where you are sitting on this issue many years later?

sam · March 17, 2021, 8:00am

This topic was automatically closed after 14 days. New replies are no longer allowed.

Topic		Replies	Views
Improving Instance Performance (Megatopics, Database Size and Extreme Load) Installation	60	4821	October 13, 2020
Long loading times for user summary page with slow database Installation unsupported-install	19	1749	July 13, 2023
Long-Running Sidekiq Jobs Feature	21	1561	December 24, 2020
Post.calculate_avg_time() taking up a long time Support	28	2726	May 6, 2019
Slow Profile Loads with 100GB+ database Feature	43	2860	January 1, 2021

PostgreSQL runaway IO

Related topics