`rake posts:rebake` crashed halfway through

Hi there, I ran a rake posts:rebake on my testbed site, which would normally take over 24h to finish. This morning I logged into the session and found that it had crashed about 50% of the way through. I don’t think it was out of memory because the VPS has 8GB of RAM and 15GB of swap. I also had my customized Drupal importer script crash one time several days into the import process due to a Postgres error. I have successfully run the same import script multiple times, and at least one full rake posts:rebake too. I chalked it up to a fluke that time, but this time it appears to again be another random issue with Postgres:

Rebaking post markdown for 'default'                                                                                                                                                          
  1328000 / 2625793 ( 50.6%)rake aborted!                                                                                                                                                     
ActiveRecord::StatementInvalid: PG::DataCorrupted: ERROR:  invalid page in block 181250 of relation base/16384/16846                                                                          
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:69:in `exec_params'                                                                            
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:69:in `exec_params'                                                                            
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:768:in `block (2 levels) in exec_no_cache'                  
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'                                                     
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/dependencies/interlock.rb:41:in `permit_concurrent_loads'                                           
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:767:in `block in exec_no_cache'                             
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'                                
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'                            
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'                                
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'                                     
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:765:in `block in log'                                         
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'                                                    
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:756:in `log'                                                  
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:766:in `exec_no_cache'                                      
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:745:in `execute_and_clear'                                  
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:54:in `exec_query'                              
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:560:in `select'                                   
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:66:in `select_all'                                
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/query_cache.rb:110:in `select_all'                                       
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/querying.rb:54:in `_query_by_sql'                                                                     
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:942:in `block in exec_main_query'                                                         
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:962:in `skip_query_cache_if_necessary'                                                    
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:928:in `exec_main_query'                                                                  
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:914:in `block in exec_queries'                                                            
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:962:in `skip_query_cache_if_necessary' 
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:908:in `exec_queries'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:695:in `load'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:250:in `records'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation/delegation.rb:88:in `each'
/var/www/discourse/lib/tasks/posts.rake:128:in `block in rebake_posts'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/core_ext/range/each.rb:14:in `step'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/core_ext/range/each.rb:14:in `step'
/var/www/discourse/lib/tasks/posts.rake:123:in `rebake_posts'
/var/www/discourse/lib/tasks/posts.rake:108:in `block in rebake_posts_all_sites'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:80:in `with_connection'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:90:in `each_connection'
/var/www/discourse/lib/tasks/posts.rake:108:in `rebake_posts_all_sites'
/var/www/discourse/lib/tasks/posts.rake:7:in `block in <main>'
/usr/local/bin/bundle:25:in `load'
/usr/local/bin/bundle:25:in `<main>'
Caused by:
PG::DataCorrupted: ERROR:  invalid page in block 181250 of relation base/16384/16846
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:69:in `exec_params'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:69:in `exec_params'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:768:in `block (2 levels) in exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/dependencies/interlock.rb:41:in `permit_concurrent_loads'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:767:in `block in exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:765:in `block in log'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:756:in `log'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:766:in `exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:745:in `execute_and_clear'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:54:in `exec_query'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:560:in `select'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:66:in `select_all'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/query_cache.rb:110:in `select_all'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/querying.rb:54:in `_query_by_sql'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:942:in `block in exec_main_query'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:962:in `skip_query_cache_if_necessary'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:928:in `exec_main_query'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:914:in `block in exec_queries'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:962:in `skip_query_cache_if_necessary'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:908:in `exec_queries'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:695:in `load'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:250:in `records'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation/delegation.rb:88:in `each'
/var/www/discourse/lib/tasks/posts.rake:128:in `block in rebake_posts'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/core_ext/range/each.rb:14:in `step'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/core_ext/range/each.rb:14:in `step'
/var/www/discourse/lib/tasks/posts.rake:123:in `rebake_posts'
/var/www/discourse/lib/tasks/posts.rake:108:in `block in rebake_posts_all_sites'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:80:in `with_connection'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:90:in `each_connection'
/var/www/discourse/lib/tasks/posts.rake:108:in `rebake_posts_all_sites'
/var/www/discourse/lib/tasks/posts.rake:7:in `block in <main>'
/usr/local/bin/bundle:25:in `load'
/usr/local/bin/bundle:25:in `<main>'
Tasks: TOP => posts:rebake
(See full trace by running task with --trace)

Hi!

It looks like Postgres's on-disk state is corrupted, either from a one-off corruption event or from an ongoing problem such as filesystem corruption or an underlying disk/memory hardware fault on your VPS.
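To see which file on disk the error message points at, here is a minimal sketch in Python. It assumes PostgreSQL's default build settings (8 kB blocks, 1 GB segment files), which may not match every installation; the function name `locate_bad_page` is made up for illustration.

```python
import re

# Assumptions: PostgreSQL defaults. Custom builds may differ.
BLOCK_SIZE = 8192                      # bytes per page (default BLCKSZ)
SEGMENT_SIZE = 1024 ** 3               # bytes per segment file (default 1 GB)
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE

ERROR_RE = re.compile(r"invalid page in block (\d+) of relation (\S+)")


def locate_bad_page(message):
    """Hypothetical helper: map an 'invalid page' error to a file and offset."""
    match = ERROR_RE.search(message)
    if match is None:
        raise ValueError("not an 'invalid page' error message")
    block = int(match.group(1))
    relation_path = match.group(2)     # relative to the PG data directory
    # Relations larger than one segment are split into path, path.1, path.2, ...
    segment, block_in_segment = divmod(block, BLOCKS_PER_SEGMENT)
    if segment == 0:
        segment_file = relation_path
    else:
        segment_file = f"{relation_path}.{segment}"
    return {
        "filenode": relation_path.rsplit("/", 1)[-1],
        "segment_file": segment_file,
        "byte_offset": block_in_segment * BLOCK_SIZE,
    }


if __name__ == "__main__":
    error = ("PG::DataCorrupted: ERROR:  invalid page in block 181250 "
             "of relation base/16384/16846")
    print(locate_bad_page(error))
```

With the filenode in hand, running `SELECT pg_filenode_relation(0, 16846);` inside the affected database should tell you which table or index it belongs to (the `0` means the database's default tablespace, which is an assumption here). If it turns out to be an index, a reindex may be enough; if it is a table, the data on that page is likely lost.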

For a start, you should run fsck on your filesystem.

Next, if this is a test environment and you can rebuild the data, try starting over with PG by completely removing its data directory and creating a new database. Then stress things by importing/rebaking again to see whether the problem recurs.


Thanks a lot @leonardo for the suggestions.

Thinking back, I remember that the previous crash I had last week during the import was specifically due to a Postgres duplicate key error, so this time it's a different error.

I ran xfs_repair with the -e option, and as far as I can tell there doesn't seem to be any corruption:

xfs_repair -e /dev/sda2
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 1
        - agno = 2
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
clearing reflink flag on inodes when possible
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

EDIT: After rebooting (I used the rescue system to run fsck on the unmounted filesystem) the Discourse app started and there were no errors in the logs, but all I got was the white screen of death. I had to rebuild the app to get the website to load again. I'm not really sure what went wrong with all of this.