Rebake aborta com mensagem de erro

Olá!
Quando tento refazer as postagens com uma tarefa rake, às vezes ela para com esta mensagem:

168858 / 329447 ( 51.3%)AVISO: conexão sendo encerrada devido à falha de outro processo do servidor DETALHE: O postmaster comandou este processo de servidor para reverter a transação atual e sair, porque outro processo de servidor saiu anormalmente e possivelmente corrompeu a memória compartilhada. DICA: Em um momento você deverá ser capaz de se reconectar ao banco de dados e repetir seu comando.

E no final de um monte de texto, contendo coisas como:

/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:828:in `exec_queries'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:631:in `load'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:249:in `records'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation/delegation.rb:88:in `each'
/var/www/discourse/lib/tasks/posts.rake:124:in `block in rebake_posts'

Há isto:
168859 / 329447 ( 51.3%)rake abortado! ActiveRecord::ConnectionNotEstablished: falha ao conectar ao servidor no socket \"/var/run/postgresql/.s.PGSQL.5432\": FATAL: o sistema de banco de dados está em modo de recuperação


Erro nos logs do Discourse:

Alguma ideia de como solucionar isso?

Você deve verificar os logs do seu postgresql.

1 curtida

Thanks!

I admit that this PostgreSQL is a whole new universe to me and I’m not familiar with it at all.
Can you confirm that logs are stored in /var/discourse/shared/standalone/log/var-log/postgres or should I look for a specific error log somewhere?
I looked at the current file in this folder, but couldn’t find anything suspicious, especially at the time shown by Discourse’s interface logs.

But it also is very possible that I don’t know what to look for.


edit : found this in the current file:

	
2022-03-12 05:27:51.651 UTC [450818] discourse@discourse LOG:  duration: 258.470 ms  parse <unnamed>: SELECT 1 AS one FROM "permalinks" WHERE "permalinks"."url" = 'images/emoji/twitter/roll_eyes.png' LIMIT 1
2022-03-12 05:27:52.287 UTC [450818] discourse@discourse LOG:  duration: 557.420 ms  bind <unnamed>: SELECT 1 AS one FROM "permalinks" WHERE "permalinks"."url" = 'images/emoji/twitter/roll_eyes.png' LIMIT 1
2022-03-12 05:27:52.500 UTC [450818] discourse@discourse LOG:  duration: 158.213 ms  execute <unnamed>: SELECT 1 AS one FROM "permalinks" WHERE "permalinks"."url" = 'images/emoji/twitter/roll_eyes.png' LIMIT 1
2022-03-12 05:27:53.047 UTC [450818] discourse@discourse LOG:  duration: 301.290 ms  parse <unnamed>: SELECT 1 AS one FROM "permalinks" WHERE "permalinks"."url" = 'images/emoji/twitter/roll_eyes.png' LIMIT 1
2022-03-12 05:27:56.647 UTC [560] LOG:  checkpointer process (PID 180496) was terminated by signal 9: Killed
2022-03-12 05:27:56.650 UTC [560] LOG:  terminating any other active server processes
2022-03-12 05:27:56.652 UTC [455579] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.652 UTC [455579] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.652 UTC [455579] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.652 UTC [455580] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.652 UTC [455580] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.652 UTC [455580] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455573] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455573] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455573] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455560] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455560] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455560] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455476] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455476] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455476] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [180506] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [180506] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [180506] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455477] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455477] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455477] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [180499] WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [180499] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [180499] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455581] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455581] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455581] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455574] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455574] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455574] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.654 UTC [427436] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.654 UTC [427436] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.654 UTC [427436] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.657 UTC [455341] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.657 UTC [455341] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.657 UTC [455341] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.661 UTC [455559] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.661 UTC [455559] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.661 UTC [455559] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.698 UTC [450818] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.698 UTC [450818] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.698 UTC [450818] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.773 UTC [560] LOG:  all server processes terminated; reinitializing
2022-03-12 05:27:56.957 UTC [455632] LOG:  database system was interrupted; last known up at 2022-03-12 05:25:19 UTC
2022-03-12 05:27:56.957 UTC [455634] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.957 UTC [455633] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.958 UTC [455636] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.958 UTC [455635] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.966 UTC [455637] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.966 UTC [455638] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.966 UTC [455639] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.969 UTC [455640] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.977 UTC [455641] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:57.331 UTC [455632] LOG:  database system was not properly shut down; automatic recovery in progress
2022-03-12 05:27:57.334 UTC [455632] LOG:  redo starts at E/B99AC028
2022-03-12 05:27:57.371 UTC [455632] LOG:  invalid record length at E/BA16D680: wanted 24, got 0
2022-03-12 05:27:57.371 UTC [455632] LOG:  redo done at E/BA16D640
2022-03-12 05:27:57.467 UTC [560] LOG:  database system is ready to accept connections

I supposed the most important lines are:

2022-03-12 05:27:56.647 UTC [560] LOG:  checkpointer process (PID 180496) was terminated by signal 9: Killed
2022-03-12 05:27:56.650 UTC [560] LOG:  terminating any other active server processes

Forum activity:


300000 posts to rebake (but the error can happens after 150000 or after 1000 rebaked posts, it’s kind of random.

Server’s specs:
CPX21 from Hetzner: https://www.hetzner.com/cloud-fr

  • 3 vCPU
  • 4 GB RAM
  • 80 GB disk (50% free)

Also, I found that dmesg command can shows additional info, here’s the end of this command:

[590461.105649] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=2f5e8cd98980f017bab02228d6af1cfbbd0068935aedea9ab3680470befa2030,mems_allowed=0,global_oom,task_memcg=/docker/2f5e8cd98980f017bab02228d6af1cfbbd0068935aedea9ab3680470befa2030,task=postmaster,pid=1135915,uid=105
[590461.105673] Out of memory: Killed process 1135915 (postmaster) total-vm:1172132kB, anon-rss:7244kB, file-rss:0kB, shmem-rss:839588kB, UID:105 pgtables:2104kB oom_score_adj:0
[590461.109673] oom_reaper: reaped process 1135915 (postmaster), now anon-rss:0kB, file-rss:0kB, shmem-rss:839588kB
[592120.454582] Process accounting resumed
[592121.145664] Process accounting resumed

Is this of any help?

1 curtida

Isso.
Quando seu servidor fica sem (memória virtual), o sistema operacional começa a encerrar processos para garantir que o sistema não fique totalmente indisponível.

Como isso só está acontecendo durante um rebake, você pode não precisar de mais memória real, apenas de mais memória virtual, o que pode ser conseguido aumentando seu arquivo de swap.

4 curtidas

Certo!

Fico um pouco surpreso por ficar sem memória com o rebake, pelo fato de que isso acontece em “estágios” aleatórios do rebake, que parece que libera memória entre cada lote de rebake, e os lotes de rebake padrão são bem pequenos por padrão (algo como 150 posts?).


Criei um arquivo de swap de 2 GB seguindo este guia. Espero que isso seja suficiente. :slight_smile:

1 curtida

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.