`rake posts:rebake` 中途崩溃

你好,我在测试站点上运行了 rake posts:rebake,该操作通常需要超过 24 小时才能完成。今天早上我登录会话时发现,它在进度约 50% 时崩溃了。我认为这不是内存不足的问题,因为 VPS 拥有 8GB 内存和 15GB 交换空间。此外,几天前我的自定义 Drupal 导入脚本在导入过程中也因 PostgreSQL 错误崩溃过一次。我之前已成功多次运行相同的导入脚本,并且至少完整运行过一次 rake posts:rebake。当时我将那次崩溃归咎于偶然现象,但这次看来又是另一个随机的 PostgreSQL 问题:

Rebaking post markdown for 'default'
  1328000 / 2625793 ( 50.6%)rake aborted!
ActiveRecord::StatementInvalid: PG::DataCorrupted: ERROR:  invalid page in block 181250 of relation base/16384/16846
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:69:in `exec_params'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:69:in `exec_params'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:768:in `block (2 levels) in exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/dependencies/interlock.rb:41:in `permit_concurrent_loads'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:767:in `block in exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:765:in `block in log'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:756:in `log'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:766:in `exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:745:in `execute_and_clear'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:54:in `exec_query'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:560:in `select'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:66:in `select_all'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/query_cache.rb:110:in `select_all'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/querying.rb:54:in `_query_by_sql'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:942:in `block in exec_main_query'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:962:in `skip_query_cache_if_necessary'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:928:in `exec_main_query'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:914:in `block in exec_queries'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:962:in `skip_query_cache_if_necessary'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:908:in `exec_queries'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:695:in `load'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:250:in `records'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation/delegation.rb:88:in `each'
/var/www/discourse/lib/tasks/posts.rake:128:in `block in rebake_posts'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/core_ext/range/each.rb:14:in `step'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/core_ext/range/each.rb:14:in `step'
/var/www/discourse/lib/tasks/posts.rake:123:in `rebake_posts'
/var/www/discourse/lib/tasks/posts.rake:108:in `block in rebake_posts_all_sites'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:80:in `with_connection'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:90:in `each_connection'
/var/www/discourse/lib/tasks/posts.rake:108:in `rebake_posts_all_sites'
/var/www/discourse/lib/tasks/posts.rake:7:in `block in <main>'
/usr/local/bin/bundle:25:in `load'
/usr/local/bin/bundle:25:in `<main>'
Caused by:
PG::DataCorrupted: ERROR:  invalid page in block 181250 of relation base/16384/16846
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:69:in `exec_params'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:69:in `exec_params'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:768:in `block (2 levels) in exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/dependencies/interlock.rb:41:in `permit_concurrent_loads'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:767:in `block in exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:765:in `block in log'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:756:in `log'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:766:in `exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:745:in `execute_and_clear'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:54:in `exec_query'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:560:in `select'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:66:in `select_all'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/connection_adapters/abstract/query_cache.rb:110:in `select_all'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/querying.rb:54:in `_query_by_sql'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:942:in `block in exec_main_query'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:962:in `skip_query_cache_if_necessary'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:928:in `exec_main_query'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:914:in `block in exec_queries'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:962:in `skip_query_cache_if_necessary'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:908:in `exec_queries'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:695:in `load'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation.rb:250:in `records'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activerecord-7.0.4.1/lib/active_record/relation/delegation.rb:88:in `each'
/var/www/discourse/lib/tasks/posts.rake:128:in `block in rebake_posts'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/core_ext/range/each.rb:14:in `step'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4.1/lib/active_support/core_ext/range/each.rb:14:in `step'
/var/www/discourse/lib/tasks/posts.rake:123:in `rebake_posts'
/var/www/discourse/lib/tasks/posts.rake:108:in `block in rebake_posts_all_sites'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:80:in `with_connection'
/var/www/discourse/vendor/bundle/ruby/3.1.0/gems/rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:90:in `each_connection'
/var/www/discourse/lib/tasks/posts.rake:108:in `rebake_posts_all_sites'
/var/www/discourse/lib/tasks/posts.rake:7:in `block in <main>'
/usr/local/bin/bundle:25:in `load'
/usr/local/bin/bundle:25:in `<main>'
Tasks: TOP => posts:rebake
(See full trace by running task with --trace)

您好!

看起来您的 Postgres 磁盘状态已损坏,这可能是由于特定的损坏事件,也可能是由于持续存在的问题,例如文件系统损坏或 VPS 的底层磁盘/内存硬件故障。

首先,您应该在文件系统上尝试 fsck

接下来,如果这是一个测试环境,并且您可以重建数据,请尝试通过完全删除其数据目录并创建新的数据库来从头开始重新启动 PG。然后,尝试通过再次导入/重新烘焙来给系统施加压力,看看问题是否仍然存在。

非常感谢 @leonardo 的建议。

现在回想起来,我记得上周导入时遇到的上次崩溃就是因为 Postgres 数据库的重复键错误,所以这次是不同的错误。

我运行了 xfs_repair 命令,并使用了 -e 选项,所以根据我的理解,似乎没有损坏:

xfs_repair -e /dev/sda2 
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 1
        - agno = 2
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
clearing reflink flag on inodes when possible
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

编辑: 重启后(我使用救援系统在未挂载的文件系统上运行了 fsck),Discourse 应用可以启动,日志中也没有错误,但我只看到了白屏死机。我不得不重建应用才能让网站重新加载。真的不确定这一切到底出了什么问题。