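OK, that makes sense.
As long as newly added plugins are not enabled by default, at least for existing instances, everything is fine. But with the gem dependency changes and the multiple database migration failures we ran into, I am a bit worried that something may have been silently broken or lost. Unfortunately the rebuild logs nothing useful about this; it only says that `cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate'` exited with an error code, with no details, and there is nothing in the logs either.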
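That sounds strange. Are you running PG in a separate container? Does it have pgvector installed?
Usually, when something goes wrong, you can scroll up a bit to find the error. Because of how the output is ordered, it is usually 100 or more lines above.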
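If the rebuild output was saved to a file, the scrolling can be automated. Below is a minimal sketch, assuming the output was captured with something like `./launcher rebuild app 2>&1 | tee rebuild.log`. The file name and the helper `first_error_with_context` are hypothetical, and the marker list is a guess based on the output quoted in this thread plus rake's usual `rake aborted!` line.

```python
import re

# Assumed markers: "rake aborted!" is printed by rake when a task raises;
# the other two appear in the failed bootstrap output quoted in this thread.
ERROR_MARKERS = re.compile(r"rake aborted!|Pups::ExecError|FAILED TO BOOTSTRAP")


def first_error_with_context(lines, before=5, after=20):
    """Return the lines around the first error marker, or [] if none is found."""
    for i, line in enumerate(lines):
        if ERROR_MARKERS.search(line):
            start = max(0, i - before)
            return lines[start:i + after + 1]
    return []
```

Feeding it the lines of the saved log (for example `open("rebuild.log").read().splitlines()`) should return the first failure with a few lines of surrounding context, which is usually where the real exception message is.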
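Just the standalone Discourse Docker container. Also, what I said earlier was not quite right: there is more output about it, just nothing I can make sense of: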
```
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.2/lib/active_record/migration.rb:1454:in `migrate'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.2/lib/active_record/migration.rb:1261:in `up'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.2/lib/active_record/migration.rb:1236:in `migrate'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.2/lib/active_record/tasks/database_tasks.rb:270:in `migrate'
/var/www/discourse/lib/tasks/db.rake:267:in `block (2 levels) in <main>'
/var/www/discourse/lib/distributed_mutex.rb:53:in `block in synchronize'
/var/www/discourse/lib/distributed_mutex.rb:49:in `synchronize'
/var/www/discourse/lib/distributed_mutex.rb:49:in `synchronize'
/var/www/discourse/lib/distributed_mutex.rb:34:in `synchronize'
/var/www/discourse/lib/tasks/db.rake:242:in `block in '
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/rake-13.3.0/exe/rake:27:in `<top (required)>'
/usr/local/bin/bundle:25:in `load'
/usr/local/bin/bundle:25:in `'
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
I, [2025-07-29T20:36:15.074727 #1] INFO -- : == 20180828095129 PushFixTopicEmbedAuthorsJob: migrating ==========
== 20180828095129 PushFixTopicEmbedAuthorsJob: migrated (0.0021s) =============
I, [2025-07-29T20:36:15.075331 #1] INFO -- : Terminating async processes
I, [2025-07-29T20:36:15.075355 #1] INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/15/bin/postmaster -D /etc/postgresql/15/main pid: 45
I, [2025-07-29T20:36:15.075376 #1] INFO -- : Sending TERM to exec chpst -u redis -U redis /usr/bin/redis-server /etc/redis/redis.conf pid: 112
2025-07-29 20:36:15.075 UTC [45] LOG: received fast shutdown request
112:signal-handler (1753821375) Received SIGTERM scheduling shutdown...
2025-07-29 20:36:15.091 UTC [45] LOG: aborting any active transactions
2025-07-29 20:36:15.092 UTC [45] LOG: background worker "logical replication launcher" (PID 59) exited with exit code 1
2025-07-29 20:36:15.092 UTC [54] LOG: shutting down
2025-07-29 20:36:15.105 UTC [54] LOG: checkpoint starting: shutdown immediate
112:M 29 Jul 2025 20:36:15.125 # User requested shutdown...
112:M 29 Jul 2025 20:36:15.125 * Saving the final RDB snapshot before exiting.
112:M 29 Jul 2025 20:36:15.247 * DB saved on disk
112:M 29 Jul 2025 20:36:15.247 # Redis is now ready to exit, bye bye...
2025-07-29 20:36:15.273 UTC [54] LOG: checkpoint complete: wrote 10 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.066 s, sync=0.046 s, total=0.181 s; sync files=9, longest=0.026 s, average=0.006 s; distance=36 kB, estimate=36 kB
2025-07-29 20:36:15.276 UTC [45] LOG: database system is shut down
FAILED
--------------------
Pups::ExecError: cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate' failed with return #<Process::Status: pid 635 exit 1>
Location of failure: /usr/local/lib/ruby/gems/3.3.0/gems/pups-1.3.0/lib/pups/exec_command.rb:131:in `spawn'
exec failed with the params {"cd"=>"$home", "tag"=>"migrate", "hook"=>"db_migrate", "cmd"=>["su discourse -c 'bundle exec rake db:migrate'"]}
bootstrap failed with exit code 1
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
```
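By the way, code blocks and fences are misbehaving here. Neither adding backticks by hand nor selecting multiple lines in the editor and clicking the code button works; it turns the unselected part into code as well.
Interesting, this is the migration.
It follows a pattern we really should not be following.
Will get someone to look at it!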
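So it tried to run a migration for a plugin that did not exist before and is not enabled? Do you know why it succeeded after several rebuild attempts?
Is everything working now?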
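I am a bit puzzled as to why it failed, but I can see the problem: we are calling application code from a very fragile migration.
Yes, as I said, after several rebuild attempts it finally succeeded. In the last, successful log I do not see this particular migration being run at all. Perhaps it was marked as "migrated" when it failed, so it was not attempted again on the next rebuild?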
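For context on the "marked as migrated" guess: Rails records each applied migration's version in the `schema_migrations` table and only runs versions that are not yet listed there. The sketch below imitates that bookkeeping with an in-memory SQLite database as a stand-in (a real Discourse instance uses PostgreSQL). The `pending` helper and the second version number are made up for illustration; this is not how Rails itself is implemented.

```python
import sqlite3


def pending(conn, available_versions):
    """Versions that have a migration file but no row in schema_migrations."""
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    return sorted(v for v in available_versions if v not in applied)


# Stand-in database; the table and column names follow the Rails convention.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schema_migrations (version TEXT PRIMARY KEY)")

# The first version is the one from the log above; the second is hypothetical.
available = ["20180828095129", "20990101000000"]

# Simulate the first migration having been recorded as applied.
conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", ("20180828095129",))
conn.commit()

print(pending(conn, available))  # prints ['20990101000000']
```

On a real instance, the equivalent check would be a query such as `SELECT version FROM schema_migrations WHERE version = '20180828095129';` in psql. If the row is present, a later `db:migrate` would skip that migration, which would match what the last successful log showed.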
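The migration just sets a key in Redis. One possible reason it failed is that the RSS plugin's code was not loaded at that point. We will track this down so that nobody else runs into it.
Will fix the underlying issue here.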