`rake posts:rebake` が途中でクラッシュしました

Hi there, I ran a rake posts:rebake on my testbed site, which would normally take over 24h to finish. This morning I logged into the session and found that it had crashed about 50% of the way through. I don’t think it was out of memory because the VPS has 8GB of RAM and 15GB of swap. I also had my customized Drupal importer script crash one time several days into the import process due to a Postgres error. I have successfully run the same import script multiple times, and at least one full rake posts:rebake too. I chalked it up to a fluke that time, but this time it appears to again be another random issue with Postgres:

Rebaking post markdown for 'default'                                                                                                                                                          
  1328000 / 2625793 ( 50.6%)rake aborted!                                                                                                                                                     
ActiveRecord::StatementInvalid: PG::DataCorrupted: ERROR:  invalid page in block 181250 of relation base/16384/16846                                                                          
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:69:in `exec_params'                                                                            
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:69:in `exec_params'                                                                            
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:768:in `block (2 levels) in exec_no_cache'                  
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'                                                     
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/dependencies/interlock.rb:41:in `permit_concurrent_loads'                                           
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:767:in `block in exec_no_cache'                             
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'                                
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'                            
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'                                
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'                                     
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:765:in `block in log'                                         
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'                                                    
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:756:in `log'                                                  
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:766:in `exec_no_cache'                                      
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:745:in `execute_and_clear'                                  
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:54:in `exec_query'                              
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:560:in `select'                                   
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:66:in `select_all'                                
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/query_cache.rb:110:in `select_all'                                       
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/querying.rb:54:in `_query_by_sql'                                                                     
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:942:in `block in exec_main_query'                                                         
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:962:in `skip_query_cache_if_necessary'                                                    
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:928:in `exec_main_query'                                                                  
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:914:in `block in exec_queries'                                                            
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:962:in `skip_query_cache_if_necessary' 
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:908:in `exec_queries'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:695:in `load'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:250:in `records'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation/delegation.rb:88:in `each'
/var/www/discourse/lib/tasks/posts.rake:128:in `block in rebake_posts'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/core_ext/range/each.rb:14:in `step'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/core_ext/range/each.rb:14:in `step'
/var/www/discourse/lib/tasks/posts.rake:123:in `rebake_posts'
/var/www/discourse/lib/tasks/posts.rake:108:in `block in rebake_posts_all_sites'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:80:in `with_connection'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:90:in `each_connection'
/var/www/discourse/lib/tasks/posts.rake:108:in `rebake_posts_all_sites'
/var/www/discourse/lib/tasks/posts.rake:7:in `block in <main>'
/usr/local/bin/bundle:25:in `load'
/usr/local/bin/bundle:25:in `<main>'
Caused by:
PG::DataCorrupted: ERROR:  invalid page in block 181250 of relation base/16384/16846
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:69:in `exec_params'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:69:in `exec_params'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:768:in `block (2 levels) in exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/dependencies/interlock.rb:41:in `permit_concurrent_loads'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:767:in `block in exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:765:in `block in log'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:756:in `log'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:766:in `exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:745:in `execute_and_clear'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:54:in `exec_query'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:560:in `select'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:66:in `select_all'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/query_cache.rb:110:in `select_all'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/querying.rb:54:in `_query_by_sql'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:942:in `block in exec_main_query'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:962:in `skip_query_cache_if_necessary'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:928:in `exec_main_query'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:914:in `block in exec_queries'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:962:in `skip_query_cache_if_necessary'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:908:in `exec_queries'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:695:in `load'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:250:in `records'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation/delegation.rb:88:in `each'
/var/www/discourse/lib/tasks/posts.rake:128:in `block in rebake_posts'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/core_ext/range/each.rb:14:in `step'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/core_ext/range/each.rb:14:in `step'
/var/www/discourse/lib/tasks/posts.rake:123:in `rebake_posts'
/var/www/discourse/lib/tasks/posts.rake:108:in `block in rebake_posts_all_sites'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:80:in `with_connection'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:90:in `each_connection'
/var/www/discourse/lib/tasks/posts.rake:108:in `rebake_posts_all_sites'
/var/www/discourse/lib/tasks/posts.rake:7:in `block in <main>'
/usr/local/bin/bundle:25:in `load'
/usr/local/bin/bundle:25:in `<main>'
Tasks: TOP => posts:rebake
(See full trace by running task with --trace)

こんにちは!

ディスク上のPostgresの状態が悪化しているようです。これは、特定の破損イベント、またはVPSのファイルシステム破損や基盤となるディスク/メモリハードウェア障害のような継続的な問題が原因である可能性があります。

まず、ファイルシステムでfsckを試してください。

次に、テスト環境であり、データを再構築できる場合は、PGのデータディレクトリを完全に削除してDBを新規作成し、最初からやり直してみてください。その後、問題を再度インポート/リベイクして、問題が持続するかどうかを確認することで、負荷をかけてみてください。

「いいね!」 3

@leonardo、ヒントをありがとうございます。

考えてみると、先週インポート中に発生した前回のクラッシュは、Postgres の重複キーエラーが原因だったことを思い出しました。今回は異なるエラーです。

xfs_repair-e オプション付きで実行しましたが、理解したところによると、破損はないようです。

xfs_repair -e /dev/sda2
フェーズ 1 - スーパーブロックの検索と検証...
フェーズ 2 - 内部ログを使用...
        - ログをゼロにする...
        - ファイルシステム空き容量と inode マップをスキャン...
        - ルート inode チャンクが見つかりました
フェーズ 3 - 各 AG について...
        - agi 未リンクリストをスキャンしてクリア...
        - 既知の inode を処理し、inode の検出を実行...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - 新しく検出された inode を処理...
フェーズ 4 - 重複ブロックのチェック...
        - 重複エクステントリストの設定...
        - 重複ブロックを主張する inode のチェック...
        - agno = 0
        - agno = 3
        - agno = 1
        - agno = 2
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
可能な場合に inode の reflink フラグをクリア
フェーズ 5 - AG ヘッダーとツリーを再構築...
        - スーパーブロックをリセット...
フェーズ 6 - inode の接続性をチェック...
        - リアルタイムビットマップとサマリー inode のコンテンツをリセット
        - ファイルシステムをトラバース...
        - トラバース完了...
        - 切断された inode を lost+found に移動...
フェーズ 7 - リンクカウントを検証および修正...
完了

編集: 再起動後(マウントされていないファイルシステムで fsck を実行するためにレスキューシステムを使用しました)、Discourse アプリは起動し、ログにエラーはありませんでしたが、白い画面の死しか表示されませんでした。ウェブサイトを再度読み込むために、アプリの再構築が必要でした。これらすべてで何がうまくいかなかったのか、本当にわかりません。