Rebake bricht mit Fehlermeldung ab

Hallo!
Wenn ich versuche, Beiträge mit einem Rake-Task erneut zu backen, stoppt es manchmal mit dieser Meldung:

168858 / 329447 ( 51,3%)WARNUNG: Verbindung wird beendet, da ein anderer Serverprozess abgestürzt ist DETAIL: Der Postmaster hat diesen Serverprozess angewiesen, die aktuelle Transaktion zurückzurollen und zu beenden, da ein anderer Serverprozess abnormal beendet wurde und möglicherweise den gemeinsamen Speicher beschädigt hat. HINWEIS: In einem Moment sollten Sie in der Lage sein, sich erneut mit der Datenbank zu verbinden und Ihren Befehl zu wiederholen.

Und am Ende einer Textwand, die Dinge wie diese enthält:

/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:828:in `exec_queries'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:631:in `load'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation.rb:249:in `records'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.7/lib/active_record/relation/delegation.rb:88:in `each'
/var/www/discourse/lib/tasks/posts.rake:124:in `block in rebake_posts'

Gibt es das:
168859 / 329447 ( 51,3%)rake abgebrochen! ActiveRecord::ConnectionNotEstablished: Verbindung zum Server über Socket „/var/run/postgresql/.s.PGSQL.5432“ fehlgeschlagen: FATAL: Das Datenbanksystem befindet sich im Wiederherstellungsmodus


Fehler in den Discourse-Protokollen:

Irgendeine Idee, wie man das beheben kann?

Sie sollten Ihre PostgreSQL-Protokolle überprüfen.

1 „Gefällt mir“

Thanks!

I admit that this PostgreSQL is a whole new universe to me and I’m not familiar with it at all.
Can you confirm that logs are stored in /var/discourse/shared/standalone/log/var-log/postgres or should I look for a specific error log somewhere?
I looked at the current file in this folder, but couldn’t find anything suspicious, especially at the time shown by Discourse’s interface logs.

But it also is very possible that I don’t know what to look for.


edit : found this in the current file:

	
2022-03-12 05:27:51.651 UTC [450818] discourse@discourse LOG:  duration: 258.470 ms  parse <unnamed>: SELECT 1 AS one FROM "permalinks" WHERE "permalinks"."url" = 'images/emoji/twitter/roll_eyes.png' LIMIT 1
2022-03-12 05:27:52.287 UTC [450818] discourse@discourse LOG:  duration: 557.420 ms  bind <unnamed>: SELECT 1 AS one FROM "permalinks" WHERE "permalinks"."url" = 'images/emoji/twitter/roll_eyes.png' LIMIT 1
2022-03-12 05:27:52.500 UTC [450818] discourse@discourse LOG:  duration: 158.213 ms  execute <unnamed>: SELECT 1 AS one FROM "permalinks" WHERE "permalinks"."url" = 'images/emoji/twitter/roll_eyes.png' LIMIT 1
2022-03-12 05:27:53.047 UTC [450818] discourse@discourse LOG:  duration: 301.290 ms  parse <unnamed>: SELECT 1 AS one FROM "permalinks" WHERE "permalinks"."url" = 'images/emoji/twitter/roll_eyes.png' LIMIT 1
2022-03-12 05:27:56.647 UTC [560] LOG:  checkpointer process (PID 180496) was terminated by signal 9: Killed
2022-03-12 05:27:56.650 UTC [560] LOG:  terminating any other active server processes
2022-03-12 05:27:56.652 UTC [455579] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.652 UTC [455579] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.652 UTC [455579] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.652 UTC [455580] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.652 UTC [455580] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.652 UTC [455580] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455573] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455573] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455573] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455560] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455560] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455560] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455476] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455476] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455476] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [180506] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [180506] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [180506] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455477] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455477] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455477] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [180499] WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [180499] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [180499] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455581] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455581] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455581] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.653 UTC [455574] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.653 UTC [455574] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.653 UTC [455574] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.654 UTC [427436] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.654 UTC [427436] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.654 UTC [427436] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.657 UTC [455341] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.657 UTC [455341] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.657 UTC [455341] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.661 UTC [455559] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.661 UTC [455559] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.661 UTC [455559] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.698 UTC [450818] discourse@discourse WARNING:  terminating connection because of crash of another server process
2022-03-12 05:27:56.698 UTC [450818] discourse@discourse DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-12 05:27:56.698 UTC [450818] discourse@discourse HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-03-12 05:27:56.773 UTC [560] LOG:  all server processes terminated; reinitializing
2022-03-12 05:27:56.957 UTC [455632] LOG:  database system was interrupted; last known up at 2022-03-12 05:25:19 UTC
2022-03-12 05:27:56.957 UTC [455634] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.957 UTC [455633] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.958 UTC [455636] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.958 UTC [455635] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.966 UTC [455637] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.966 UTC [455638] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.966 UTC [455639] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.969 UTC [455640] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:56.977 UTC [455641] discourse@discourse FATAL:  the database system is in recovery mode
2022-03-12 05:27:57.331 UTC [455632] LOG:  database system was not properly shut down; automatic recovery in progress
2022-03-12 05:27:57.334 UTC [455632] LOG:  redo starts at E/B99AC028
2022-03-12 05:27:57.371 UTC [455632] LOG:  invalid record length at E/BA16D680: wanted 24, got 0
2022-03-12 05:27:57.371 UTC [455632] LOG:  redo done at E/BA16D640
2022-03-12 05:27:57.467 UTC [560] LOG:  database system is ready to accept connections

I supposed the most important lines are:

2022-03-12 05:27:56.647 UTC [560] LOG:  checkpointer process (PID 180496) was terminated by signal 9: Killed
2022-03-12 05:27:56.650 UTC [560] LOG:  terminating any other active server processes

Forum activity:


300000 posts to rebake (but the error can happens after 150000 or after 1000 rebaked posts, it’s kind of random.

Server’s specs:
CPX21 from Hetzner: https://www.hetzner.com/cloud-fr

  • 3 vCPU
  • 4 GB RAM
  • 80 GB disk (50% free)

Also, I found that dmesg command can shows additional info, here’s the end of this command:

[590461.105649] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=2f5e8cd98980f017bab02228d6af1cfbbd0068935aedea9ab3680470befa2030,mems_allowed=0,global_oom,task_memcg=/docker/2f5e8cd98980f017bab02228d6af1cfbbd0068935aedea9ab3680470befa2030,task=postmaster,pid=1135915,uid=105
[590461.105673] Out of memory: Killed process 1135915 (postmaster) total-vm:1172132kB, anon-rss:7244kB, file-rss:0kB, shmem-rss:839588kB, UID:105 pgtables:2104kB oom_score_adj:0
[590461.109673] oom_reaper: reaped process 1135915 (postmaster), now anon-rss:0kB, file-rss:0kB, shmem-rss:839588kB
[592120.454582] Process accounting resumed
[592121.145664] Process accounting resumed

Is this of any help?

1 „Gefällt mir“

Das. Wenn Ihrem Server der (virtuelle) Arbeitsspeicher ausgeht, beginnt das Betriebssystem, Prozesse zu beenden, um sicherzustellen, dass das System nicht vollständig ausfällt. Da dies nur während eines erneuten Backens geschieht, benötigen Sie möglicherweise nicht mehr tatsächlichen Arbeitsspeicher, sondern nur mehr virtuellen Arbeitsspeicher, was durch die Vergrößerung Ihrer Swap-Datei erreicht werden kann.

4 „Gefällt mir“

Alles klar!

Ich bin jedoch etwas überrascht, dass ich beim erneuten Backen (rebake) den Arbeitsspeicher verliere. Das liegt daran, dass es in zufälligen „Phasen“ des erneuten Backens passiert, dass es so aussieht, als ob zwischen jedem erneuten Backen-Batch Speicher freigegeben wird, und die Standard-Rebake-Batches sind standardmäßig ziemlich klein (etwa 150 Beiträge?).


Ich habe eine 2-GB-Swap-Datei gemäß dieser Anleitung erstellt. Ich hoffe, das ist in Ordnung. :slight_smile:

1 „Gefällt mir“

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.