PostgreSQL with runaway I/O

Hi everyone,

I have a problem and would like to ask for your help. I have a server running Discourse, installed with the Docker-based installer. Every day, around noon, the server becomes overloaded for a reason still unknown to me, and I see the I/O load climb to the point where the forum starts showing 500 errors.

iotop always tells me it is a PostgreSQL UPDATE that is consuming all the I/O.

So today, when it happened again, I captured a list of all running SQL queries with this command:
sudo -u postgres psql discourse -o /tmp/RunningQueries -c 'SELECT datname,pid,state,query FROM pg_stat_activity'

The output of this command can be found here on Pastebin.

As you can see, at those moments there are around 32 UPDATE queries running. When this happens, iotop tells me the database is doing reads and writes of 2.5 MB/s to 15 MB/s.
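
A slightly extended version of that pg_stat_activity query also shows how long each statement has been running and what it is currently waiting on (the wait_event columns assume PostgreSQL 9.6 or newer):

    SELECT pid,
           state,
           now() - query_start AS runtime,   -- how long the statement has been running
           wait_event_type,                  -- e.g. Lock, IO, Client
           wait_event,
           left(query, 60) AS query
    FROM   pg_stat_activity
    WHERE  state <> 'idle'
    ORDER  BY runtime DESC;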

If I correlate the running UPDATE query with the logged query (in /var/discourse/shared/standalone/log/var-log/postgresql), I see that these queries take a very, very long time:

2018-07-03 12:51:27.052 UTC [17504] discourse@discourse LOG:  duration: 2352061.872 ms  statement: UPDATE drafts
                       SET  data = '{"reply":"<redacted for debugging purposes>","action":"reply","categoryId":24,"postId":118034,"archetypeId":"regular","whisper":false,"metaData":null,"composerTime":65922,"typingTime":8400}',

corresponds to

  discourse | 17504 | active | UPDATE drafts                                                         +
            |       |        |     SET  data = '{"reply":"<redacted for debugging purposes>","action":"reply","categoryId":24,"postId":118034,"archetypeId":"regular","whisper":false,"metaData":null,"composerTime":65922,"typingTime":8400}', +
            |       |        |          sequence = 124,                                               +
            |       |        |          revisions = revisions + 1                                     +
            |       |        |     WHERE id = 84548
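
A single-row UPDATE that runs for 2,352 seconds looks more like a statement that is waiting (on a lock or on I/O) than one doing real work, so next time it hangs it may be worth checking whether another backend is blocking it. A minimal check, assuming PostgreSQL 9.6+ for pg_blocking_pids():

    -- list backends that are currently blocked, and by whom
    SELECT pid,
           pg_blocking_pids(pid) AS blocked_by,
           wait_event_type,
           wait_event,
           left(query, 60) AS query
    FROM   pg_stat_activity
    WHERE  cardinality(pg_blocking_pids(pid)) > 0;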

If I restart the Docker app, the queries come back, so basically all I can do is wait for it to pass, which leaves my users unhappy.

Please let me know if there is anything I can do to improve this situation, for example moving the maintenance tasks to around 5 AM.

If you have any further suggestions so I can dig deeper, please share them! Any help is appreciated :slight_smile:

Ninja edit: I just remembered one more piece of information: I have not been able to tie this to any existing cron jobs (neither on the host OS nor inside the Docker app).

Some background information about the server:

  • Server OS: Ubuntu 18.04 LTS
  • It is a VM with 100 GB of disk, 4 GB of RAM, 4 cores
  • Disk is a 6-disk RAID10 array of 15K drives
  • Discourse v2.1.0.beta2 +107
  • Plugins: babble, whos-online, voting, cakeday, anonymous-categories, league
  • Page views per day: around 75k
  • Users: 1,350, about 700 active per day
  • 73.6k posts, 351 topics, the most active topic contains 13.5k posts

Thanks for reading this far! :slight_smile:

  • Are those spinning disks?

  • How big is your database?

I had an experience like this with the combo of an active forum plus spinning disks. In the end it was Redis and PostgreSQL fighting for I/O, and after a while Redis would stop because it had multiple RDB saves queued up.

Can you please check if redis is saving to disk during the problem?
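
For example, something along these lines from inside the container should show whether an RDB save is in flight (the /var/discourse path assumes a standard docker install, and that redis-cli is available inside the container):

    cd /var/discourse && ./launcher enter app
    # inside the container:
    redis-cli info persistence | grep -E 'rdb_bgsave_in_progress|rdb_last_bgsave|aof_rewrite_in_progress'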

3 likes

They are SAS disks, 900GB 15K (spinning disks, not SSDs) configured in RAID10. They are not exactly slow, but you obviously wouldn’t get the IOPS you’d get from an SSD.

As for the database: the postgres_data directory is 4.5GB.
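
For comparison, the logical size as PostgreSQL sees it can be cross-checked with something like:

    sudo -u postgres psql discourse -c "SELECT pg_size_pretty(pg_database_size('discourse'))"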

I doubt it’s redis in this case though - I stupidly enough forgot to grab a full screenshot from iotop but I can’t remember seeing redis in there. Did see postgres UPDATE at the top of the list with IO activity of 98% and up.

BTW: I’ll check tomorrow as I have no doubt it will happen again. And I’ll grab a screenshot from iotop.

I still don’t feel redis should be constantly clobbering the disk. It’s supposed to be an in memory key-value store most of all.

Hi, server’s very busy again - changed the timezone within the container from UTC to GMT+1 (Europe/Amsterdam) and indeed, load now starts 2 hours later.

Screenshot from iotop -o (which only lists the actual tasks doing I/O):

I’ve observed the behaviour and indeed, over the last 5 minutes, I did see Redis pop up twice, but only for a moment. Doesn’t come across as being the source of the problems to me.

I have a feeling some scheduled tasks are being started, but I can't really find the source. I checked cron jobs both within the container and on the underlying OS - no dice.

If you have any suggestions then those would be more than welcome! Also, if there’s more I can do regarding information gathering, please tell me! :slight_smile:

1 like

More likely Postgres then, if you have ruled Redis out.

I think so too. See the screenshot of the iotop output: it's a PostgreSQL UPDATE command that generates most of the I/O load.

The thing is: what is being triggered here? And why? Where can I find how (and when!) the PostgreSQL maintenance tasks are triggered?
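
One way to at least rule autovacuum in or out is to look at the per-table vacuum timestamps (a plain pg_stat_user_tables query, nothing Discourse-specific):

    SELECT relname,
           last_autovacuum,
           last_autoanalyze,
           n_dead_tup                -- dead tuples still waiting to be cleaned up
    FROM   pg_stat_user_tables
    ORDER  BY last_autovacuum DESC NULLS LAST
    LIMIT  20;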

Edit: Additional info: Screenshot from sidekiq:


Looks like your problem is this one: Reschedule sidekiq scheduled job

You have one still-active topic (last reply 19 minutes ago) with 17k posts. That, plus slow disks, kills performance. The fact that your data is larger than the RAM also makes this more complicated.

5 likes

That’s the most popular topic indeed. Would it be better to break it up into pieces, close this one, and open a new one?

Thanks for the pointer btw - I’ll have a look!

Yes. I set auto close topics post count to 1024 or 512 when I had an HDD instance.

Also, I completely forgot, but it looks like I took a stab at this query 2 years ago:

5 likes

Surprising that you are running into this with 17k posts in one topic; I view 20k as the “starting to be a problem” cutoff. But maybe we should set the bar lower, cc @tgxworld

It could be related to my specific setup though. If it helps, I can document a bit more, run some additional tests, etc. (bonnie++ or so).

We have some public disk benchmark results here:

Maybe run those (when the forum has low load) to compare?
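
For example, a quick seek-rate read test with ioping against the database directory would look roughly like this (flags and path are just an illustration, not necessarily the exact benchmark behind the published numbers):

    # ~10 seconds of random-read requests against the postgres data dir
    ioping -R -w 10 /var/discourse/shared/standalone/postgres_data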

Just as a thought exercise: what exactly would break if this specific job did not exist or never ran?

I can’t find any place in the code where Post.avg_time and Topic.avg_time are actually being used? Or is there maybe some generated accessor that I’m missing?

1 like

It’s used to compute the post score that we use to rank posts so we can summarize them.

3 likes

I have mentioned this particular issue before… it is totally unrelated to @tgxworld’s work.

I am still not buying that we are getting any value from the “geometric mean of all read times for all posts”… a simple “which posts are read most” is probably good enough for the “best of” algorithm.

Plus this whole concept is a bit broken imo, because the further you get towards the end of a topic, the fewer people read it.

I think a simple fix here is to stop doing the whole geometric mean read time thing on topics with more than, say, 1000 posts. Which is, oddly, the only time we would use it… :frowning:

Silly idea perhaps: Topics have a certain flow, setup, experience to them. Some topics are by nature filled with short posts, others with long posts.

Would it be an idea to calculate the mean over the first 500 or 1000 posts and use that as an indication for all posts in the thread?

After all, it’s an indication so an approximation would be acceptable, no?

I don’t even know if this would help in our case. The fact is that post_timings is bigger than the server’s RAM (20 GB vs 16 GB), and even limiting it to the first x posts would still leave the database unable to pull the entire table into memory.

So my feeling is that attempting to do this query differently would work better than trying to “limit” it.
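
For reference, the size of that table including its indexes can be read straight from PostgreSQL:

    SELECT pg_size_pretty(pg_total_relation_size('post_timings'));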

2 likes

Hi, I did run a test just now on the server (still relatively active with 30 active users online). These are the results:

19.3 k requests completed in 9.89 s, 75.5 MiB read, 1.95 k iops, 7.63 MiB/s
generated 19.3 k requests in 10.0 s, 75.5 MiB, 1.93 k iops, 7.55 MiB/s

Everything seems to be about 1/7th of the speed you get. Perhaps indeed I should consider switching from old-school HDDs to SSDs.

3 likes

These numbers are terrifyingly bad even for traditional HDDs. Way below what fancy enterprise 10k and 15k drives should be doing.

4 likes