Rebake si interrompe con messaggio di errore

Ciao!
Quando provo a rifare i post con un rake task, a volte si interrompe con questo messaggio:

168858 / 329447 ( 51.3%)WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command.

E alla fine di un muro di testo, contenente cose come:

/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:828:in `exec_queries'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:631:in `load'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:249:in `records'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation/delegation.rb:88:in `each'
/var/www/discourse/lib/tasks/posts.rake:124:in `block in rebake_posts'

C’è questo:
168859 / 329447 ( 51.3%)rake aborted! ActiveRecord::ConnectionNotEstablished: connection to server on socket \"/var/run/postgresql/.s.PGSQL.5432\" failed: FATAL: the database system is in recovery mode


Errore nei log di Discourse:

Qualche idea su come risolvere questo problema?

Dovresti controllare i tuoi log di postgresql.

1 Mi Piace

Thanks!

I admit that this PostgreSQL is a whole new universe to me and I’m not familiar with it at all.
Can you confirm that logs are stored in /var/discourse/shared/standalone/log/var-log/postgres or should I look for a specific error log somewhere?
I looked at the current file in this folder, but couldn’t find anything suspicious, especially at the time shown by Discourse’s interface logs.

But it also is very possible that I don’t know what to look for.


edit : found this in the current file:

	
2022-03-12 05:27:51.651 UTC [450818] discourse@discourse LOG:  duration: 258.470 ms  parse <unnamed>: SELECT 1 AS one FROM "permalinks" WHERE "permalinks"."url" = 'images/emoji/twitter/roll_eyes.png' LIMIT 1
2022-03-12 05:27:52.287 UTC [450818] discourse@discourse LOG:  duration: 557.420 ms  bind <unnamed>: SELECT 1 AS one FROM "permalinks" WHERE "permalinks"."url" = 'images/emoji/twitter/roll_eyes.png' LIMIT 1
2022-03-12 05:27:52.500 UTC [450818] discourse@discourse LOG:  duration: 158.213 ms  execute <unnamed>: SELECT 1 AS one FROM "permalinks" WHERE "permalinks"."url" = 'images/emoji/twitter/roll_eyes.png' LIMIT 1
2022-03-12 05:27:53.047 UTC [450818] discourse@discourse LOG:  duration: 301.290 ms  parse <unnamed>: SELECT 1 AS one FROM "permalinks" WHERE "permalinks"."url" = 'images/emoji/twitter/roll_eyes.png' LIMIT 1
2022-03-12 05:27:56.647 UTC [560] LOG:  checkpointer process (PID 180496) was terminated by signal 9: Killed
2022-03-12 05:27:56.650 UTC [560] LOG:  terminating any other active server processes
2022-03-12 05:27:56.652 UTC [455579] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.652 UTC [455579] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.652 UTC [455579] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.652 UTC [455580] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.652 UTC [455580] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.652 UTC [455580] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455573] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455573] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455573] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455560] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455560] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455560] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455476] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455476] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455476] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [180506] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [180506] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [180506] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455477] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455477] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455477] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [180499] WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [180499] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [180499] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455581] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455581] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455581] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455574] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455574] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455574] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.654 UTC [427436] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.654 UTC [427436] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.654 UTC [427436] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.657 UTC [455341] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.657 UTC [455341] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.657 UTC [455341] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.661 UTC [455559] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.661 UTC [455559] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.661 UTC [455559] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.698 UTC [450818] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.698 UTC [450818] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.698 UTC [450818] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.773 UTC [560] LOG:  all server processes terminated; reinitializing
2022-03-12 05:27:56.957 UTC [455632] LOG:  database system was interrupted; last known up at 2022-03-12 05:25:19 UTC
2022-03-12 05:27:56.957 UTC [455634] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.957 UTC [455633] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.958 UTC [455636] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.958 UTC [455635] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.966 UTC [455637] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.966 UTC [455638] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.966 UTC [455639] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.969 UTC [455640] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.977 UTC [455641] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:57.331 UTC [455632] LOG:  database system was not properly shut down; automatic recovery in progress
2022-03-12 05:27:57.334 UTC [455632] LOG:  redo starts at E/B99AC028
2022-03-12 05:27:57.371 UTC [455632] LOG:  invalid record length at E/BA16D680: wanted 24, got 0
2022-03-12 05:27:57.371 UTC [455632] LOG:  redo done at E/BA16D640
2022-03-12 05:27:57.467 UTC [560] LOG:  database system is ready to accept connections

I supposed the most important lines are:

2022-03-12 05:27:56.647 UTC [560] LOG:  checkpointer process (PID 180496) was terminated by signal 9: Killed
2022-03-12 05:27:56.650 UTC [560] LOG:  terminating any other active server processes

Forum activity:


300000 posts to rebake (but the error can happens after 150000 or after 1000 rebaked posts, it’s kind of random.

Server’s specs:
CPX21 from Hetzner: https://www.hetzner.com/cloud-fr

  • 3 vCPU
  • 4 GB RAM
  • 80 GB disk (50% free)

Also, I found that dmesg command can shows additional info, here’s the end of this command:

[590461.105649] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=2f5e8cd98980f017bab02228d6af1cfbbd0068935aedea9ab3680470befa2030,mems_allowed=0,global_oom,task_memcg=/docker/2f5e8cd98980f017bab02228d6af1cfbbd0068935aedea9ab3680470befa2030,task=postmaster,pid=1135915,uid=105
[590461.105673] Out of memory: Killed process 1135915 (postmaster) total-vm:1172132kB, anon-rss:7244kB, file-rss:0kB, shmem-rss:839588kB, UID:105 pgtables:2104kB oom_score_adj:0
[590461.109673] oom_reaper: reaped process 1135915 (postmaster), now anon-rss:0kB, file-rss:0kB, shmem-rss:839588kB
[592120.454582] Process accounting resumed
[592121.145664] Process accounting resumed

Is this of any help?

1 Mi Piace

Questo.
Quando il tuo server esaurisce la memoria (virtuale), il sistema operativo inizia a terminare i processi per garantire che il sistema non diventi completamente inutilizzabile.

Poiché ciò accade solo durante un rebake, potresti non aver bisogno di più memoria effettiva, ma solo di più memoria virtuale, che può essere ottenuta ampliando il tuo file di swap.

4 Mi Piace

Va bene!

Sono un po’ sorpreso di avere esaurito la memoria con il rebake, dal fatto che accada in “fasi” casuali del rebake, che sembra liberare memoria tra un batch di rebake e l’altro, e che i batch di rebake predefiniti siano piuttosto piccoli per impostazione predefinita (qualcosa come 150 post?).


Ho creato un file di swap da 2 GB seguendo questa guida. Spero che vada bene. :slight_smile:

1 Mi Piace

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.