PostgreSQL runaway IO

JanJoost · July 3, 2018, 9:59pm

Hi all,

I’ve got a problem I’d like to ask your help with. I’ve got a server installed with Discourse using the Docker-based installer. Every day around noon the server for an as of yet unknown reason to me becomes busy and I see IO load rise up to the point where the forum starts to show error 500’s.

iotop always tells me it’s an PostgreSQL UPDATE that eats up all IO.

So today when it happened again I grabbed a list of all running SQL queries with this command:
sudo -u postgres psql discourse -o /tmp/RunningQueries -c ‘SELECT datname,pid,state,query FROM pg_stat_activity’

The output of this command can be found here on Pastebin.

As you can see, there are at these times around 32 UPDATE queries running. When this happens, iotop tells me the database reads and writes anywhere from 2.5MB/sec up to 15MB/sec.

If I correlate the running UPDATE query with the logged query (in /var/discourse/shared/standalone/log/var-log/postgresql) I do see that those queries take very very long:

2018-07-03 12:51:27.052 UTC [17504] discourse@discourse LOG:  duration: 2352061.872 ms  statement: UPDATE drafts
                       SET  data = '{"reply":"<redacted for debugging purposes>","action":"reply","categoryId":24,"postId":118034,"archetypeId":"regular","whisper":false,"metaData":null,"composerTime":65922,"typingTime":8400}',

correlates with

  discourse | 17504 | active | UPDATE drafts                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  +
           |       |        |                SET  data = '{"reply":"<redacted for debugging purposes>","action":"reply","categoryId":24,"postId":118034,"archetypeId":"regular","whisper":false,"metaData":null,"composerTime":65922,"typingTime":8400}',                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    +
           |       |        |                     sequence = 124,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            +
           |       |        |                     revisions = revisions + 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  +
           |       |        |                WHERE id = 84548

If I restart the docker app, the queries do return so basically all I can do is wait it out which makes my users unhappy.

Please let me know if there is anything I can do to make this less bad - for instance move maintenance tasks to somewhere around 5AM or so.

If you have any more suggestions for me to dig deeper, please share! Any help is appreciated

Ninja-edit: Just remembered some additional information: I couldn’t relate this to existing cronjobs (neither from the host-OS nor from within the docker app).

Some background information on the server:

Server OS is Ubuntu 18.04LTS
It’s a VM with 100GB disk, 4GB ram, 4 cores
Disk is a 6-disk RAID10 with 15K disks
Discourse v2.1.0.beta2 +107
Plugins: babble, whos-online, voting, cakeday, anonymous-categories, league
Pageviews per day: Around 75K
Users: 1350, around 700/day active
73.6K posts, 351 topics, most active topic contains 13.5K posts

Thanks for reading this far!

Falco · July 3, 2018, 10:06pm

Are those spinning disks?
How big is your database?

I had an experience like this with the combo: active forum + spinning disks. In the end it was Redis and PostgreSQL fighting for IO and after a while redis would stop because it had multiple rdb saves in the queue to happen.

Can you please check if redis is saving to disk during the problem?

JanJoost · July 3, 2018, 10:13pm

They are SAS disks, 900GB 15K (spinning disks, not SSDs) configured in RAID10. They are not exactly slow, but you obviously wouldn’t get the IOPS you’d get from an SSD.

As for the database: the postgres_data directory is 4.5GB.

I doubt it’s redis in this case though - I stupidly enough forgot to grab a full screenshot from iotop but I can’t remember seeing redis in there. Did see postgres UPDATE at the top of the list with IO activity of 98% and up.

BTW: I’ll check tomorrow as I have no doubt it will happen again. And I’ll grab a screenshot from iotop.

codinghorror · July 3, 2018, 10:24pm

I still don’t feel redis should be constantly clobbering the disk. It’s supposed to be an in memory key-value store most of all.

JanJoost · July 4, 2018, 12:17pm

Hi, server’s very busy again - changed the timezone within the container from UTC to GMT+1 (Europe/Amsterdam) and indeed, load now starts 2 hours later.

Screenshot from iotop -o (which only lists the actual tasks doing I/O):

I’ve observed the behaviour and indeed, over the last 5 minutes, I did see Redis pop up twice, but only for a moment. Doesn’t come across as being the source of the problems to me.

I have a feeling there are some scheduled tasks which are being started, but I can’t really find the source. Checked cron jobs from both within the container and the underlying OS - no dice.

If you have any suggestions then those would be more than welcome! Also, if there’s more I can do regarding information gathering, please tell me!

codinghorror · July 4, 2018, 12:35pm

More likely Postgres then, if you have ruled Redis out.

JanJoost · July 4, 2018, 12:42pm

I think so too - see the screenshot of the output of iostat - it’s a PostgreSQL UPDATE command that generates most of the IO load.

Thing is: what is triggered here? And why? Where can I find how (and when!) the postgresql maintenance tasks are triggered?

Edit: Additional info: Screenshot from sidekiq:

Falco · July 4, 2018, 1:13pm

Looks like your problem is this one: Reschedule sidekiq scheduled job

You have one still active topic (last reply 19m ago) with 17k posts. That plus slow disks kill performance. The fact that your data is larger than the RAM also makes this more complicated.

JanJoost · July 4, 2018, 1:16pm

That’s the most popular topic indeed. Would it be better to break it up in pieces, close this one and open a new one?

Thanks for the pointer btw - I’ll have a look!

Falco · July 4, 2018, 1:21pm

Yes. I set auto close topics post count to 1024 or 512 when I had an HDD instance.

Also, I completely forgot, but looks like I took an stab at this query 2 years ago:

codinghorror · July 4, 2018, 1:27pm

Surprising that you are running into this with 17k posts in one topic, I view 20k as the “starting to be a problem” cutoff. But maybe we should set the bar lower cc @tgxworld

JanJoost · July 4, 2018, 1:34pm

It could be related to my specific setup though - if it helps i can document a bit more, run some additional tests etc (bonnie++ or so).

Falco · July 4, 2018, 3:15pm

We have some public results about disk bench here:

Maybe run those (when the forum has low load) to compare?

RGJ · July 4, 2018, 4:05pm

Just as a thought exercise: what exactly would break if this specific job would not exist or run?

I can’t find any place in the code where Post.avg_time and Topic.avg_time are actually being used? Or is there maybe some generated accessor that I’m missing?

zogstrip · July 4, 2018, 4:42pm

It’s used to compute the post score that we use to rank posts so we can summarize them.

sam · July 5, 2018, 12:58am

I have mentioned this particular issue before… it is totally unrelated to @tgxworld’s work.

I am still not buying that we are getting any value from “geometric mean of all read times for all posts”… a simple… which posts are read most is probably good enough for the “best of” algorithm.

Plus this whole concept is a bit broken imo cause the further you get to the end of a topic the less people read it.

I think a simple fix here, is to stop doing the whole geometric mean read time thing on topics with more than say 1000 posts. Which is oddly the only time we would use it…

JanJoost · July 5, 2018, 10:13am

Silly idea perhaps: Topics have a certain flow, setup, experience to them. Some topics are by nature filled with short posts, others with long posts.

Would it be an idea to calculate the mean over the first 500 or 1000 posts and use that as an indication for all posts in the thread?

After all, it’s an indication so an approximation would be acceptable, no?

RGJ · July 5, 2018, 3:46pm

I don’t even know if this would help in our case. Fact is that post_timings is bigger than the internal memory of the server (20 GB vs 16 GB) and even limiting to the first x posts would still result in the database failing to pull the entire table into memory.

So my feeling is that attempting to do this query differently would work better than trying to “limit” it.

JanJoost · July 5, 2018, 10:56pm

Hi, I did run a test just now on the server (still relatively active with 30 active users online). These are the results:

19.3 k requests completed in 9.89 s, 75.5 MiB read, 1.95 k iops, 7.63 MiB/s
generated 19.3 k requests in 10.0 s, 75.5 MiB, 1.93 k iops, 7.55 MiB/s

Everything seems to be about 1/7th of the speed you get. Perhaps indeed I should consider switching from old-school HDD to SSDs.

codinghorror · July 6, 2018, 1:29am

These numbers are terrifyingly bad even for traditional HDDs. Way below what fancy enterprise 10k and 15k drives should be doing.

Topic		Replies	Views
Improving Instance Performance (Megatopics, Database Size and Extreme Load) Installation	60	4826	October 13, 2020
Long loading times for user summary page with slow database Installation unsupported-install	19	1750	July 13, 2023
Long-Running Sidekiq Jobs Feature	21	1572	December 24, 2020
Post.calculate_avg_time() taking up a long time Support	28	2726	May 6, 2019
Slow Profile Loads with 100GB+ database Feature	43	2862	January 1, 2021

PostgreSQL runaway IO

Related topics