Site offline after rebuild (February 4, 2025)

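I saw this message after my most recent rebuild. I then ran ./launcher rebuild app, but afterwards my instance became unreachable. This is a standard install. How can I figure out what happened?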

Errors when running ./launcher logs app:

cd /var/discourse
./launcher logs app
x86_64 arch detected.
run-parts: executing /etc/runit/1.d/00-ensure-links
run-parts: executing /etc/runit/1.d/00-fix-var-logs
run-parts: executing /etc/runit/1.d/01-cleanup-web-pids
run-parts: executing /etc/runit/1.d/anacron
run-parts: executing /etc/runit/1.d/cleanup-pids
Cleaning stale PID files
run-parts: executing /etc/runit/1.d/copy-env
run-parts: executing /etc/runit/1.d/letsencrypt
[Tue Feb  4 05:38:16 PM UTC 2025] Domains not changed.
[Tue Feb  4 05:38:16 PM UTC 2025] Skip, Next renewal time is: 2025-03-02T20:15:28Z
[Tue Feb  4 05:38:16 PM UTC 2025] Add '--force' to force to renew.
[Tue Feb  4 05:38:17 PM UTC 2025] Installing key to: /shared/ssl/mydomain.com.key
[Tue Feb  4 05:38:17 PM UTC 2025] Installing full chain to: /shared/ssl/mydomain.com.cer
[Tue Feb  4 05:38:17 PM UTC 2025] Run reload cmd: sv reload nginx
warning: nginx: unable to open supervise/ok: file does not exist
[Tue Feb  4 05:38:17 PM UTC 2025] Reload error for :
[Tue Feb  4 05:38:17 PM UTC 2025] Domains not changed.
[Tue Feb  4 05:38:17 PM UTC 2025] Skip, Next renewal time is: 2025-03-02T20:15:33Z
[Tue Feb  4 05:38:17 PM UTC 2025] Add '--force' to force to renew.
[Tue Feb  4 05:38:18 PM UTC 2025] Installing key to: /shared/ssl/mydomain.com_ecc.key
[Tue Feb  4 05:38:18 PM UTC 2025] Installing full chain to: /shared/ssl/mydomain.com_ecc.cer
[Tue Feb  4 05:38:18 PM UTC 2025] Run reload cmd: sv reload nginx
warning: nginx: unable to open supervise/ok: file does not exist
[Tue Feb  4 05:38:18 PM UTC 2025] Reload error for :
Started runsvdir, PID is 567
ok: run: redis: (pid 577) 0s
ok: run: postgres: (pid 581) 0s
nginx: [warn] duplicate extension "wasm", content type: "application/wasm", previous content type: "application/wasm" in /etc/nginx/conf.d/discourse.conf:4
supervisor pid: 575 unicorn pid: 607
Shutting Down
run-parts: executing /etc/runit/3.d/01-nginx
ok: down: nginx: 1s, normally up
run-parts: executing /etc/runit/3.d/02-unicorn
(575) exiting
ok: down: unicorn: 0s, normally up
run-parts: executing /etc/runit/3.d/10-redis
ok: down: redis: 1s, normally up
run-parts: executing /etc/runit/3.d/99-postgres
ok: down: postgres: 0s, normally up
ok: down: nginx: 5s, normally up
ok: down: postgres: 1s, normally up
ok: down: redis: 3s, normally up
ok: down: cron: 0s, normally up
ok: down: unicorn: 4s, normally up
ok: down: rsyslog: 0s, normally up
run-parts: executing /etc/runit/1.d/00-ensure-links
run-parts: executing /etc/runit/1.d/00-fix-var-logs
run-parts: executing /etc/runit/1.d/01-cleanup-web-pids
run-parts: executing /etc/runit/1.d/anacron
run-parts: executing /etc/runit/1.d/cleanup-pids
Cleaning stale PID files
run-parts: executing /etc/runit/1.d/copy-env
run-parts: executing /etc/runit/1.d/letsencrypt
[Tue Feb  4 05:58:32 PM UTC 2025] Domains not changed.
[Tue Feb  4 05:58:32 PM UTC 2025] Skip, Next renewal time is: 2025-03-02T20:15:28Z
[Tue Feb  4 05:58:32 PM UTC 2025] Add '--force' to force to renew.
[Tue Feb  4 05:58:32 PM UTC 2025] Installing key to: /shared/ssl/mydomain.com.key
[Tue Feb  4 05:58:32 PM UTC 2025] Installing full chain to: /shared/ssl/mydomain.com.cer
[Tue Feb  4 05:58:32 PM UTC 2025] Run reload cmd: sv reload nginx
fail: nginx: runsv not running
[Tue Feb  4 05:58:32 PM UTC 2025] Reload error for :
[Tue Feb  4 05:58:32 PM UTC 2025] Domains not changed.
[Tue Feb  4 05:58:32 PM UTC 2025] Skip, Next renewal time is: 2025-03-02T20:15:33Z
[Tue Feb  4 05:58:32 PM UTC 2025] Add '--force' to force to renew.
[Tue Feb  4 05:58:32 PM UTC 2025] Installing key to: /shared/ssl/mydomain.com_ecc.key
[Tue Feb  4 05:58:32 PM UTC 2025] Installing full chain to: /shared/ssl/mydomain.com_ecc.cer
[Tue Feb  4 05:58:32 PM UTC 2025] Run reload cmd: sv reload nginx
fail: nginx: runsv not running
[Tue Feb  4 05:58:32 PM UTC 2025] Reload error for :
Started runsvdir, PID is 561
ok: run: redis: (pid 575) 0s
nginx: [warn] duplicate extension "wasm", content type: "application/wasm", previous content type: "application/wasm" in /etc/nginx/conf.d/discourse.conf:4
ok: run: postgres: (pid 580) 1s
supervisor pid: 570 unicorn pid: 601
Shutting Down
run-parts: executing /etc/runit/3.d/01-nginx
ok: down: nginx: 0s, normally up
run-parts: executing /etc/runit/3.d/02-unicorn
(570) exiting
ok: down: unicorn: 1s, normally up
run-parts: executing /etc/runit/3.d/10-redis
ok: down: redis: 0s, normally up
run-parts: executing /etc/runit/3.d/99-postgres
ok: down: postgres: 0s, normally up
ok: down: nginx: 3s, normally up
ok: down: postgres: 1s, normally up
ok: down: redis: 1s, normally up
ok: down: cron: 0s, normally up
ok: down: unicorn: 3s, normally up
ok: down: rsyslog: 0s, normally up
run-parts: executing /etc/runit/1.d/00-ensure-links
run-parts: executing /etc/runit/1.d/00-fix-var-logs
run-parts: executing /etc/runit/1.d/01-cleanup-web-pids
run-parts: executing /etc/runit/1.d/anacron
run-parts: executing /etc/runit/1.d/cleanup-pids
Cleaning stale PID files
run-parts: executing /etc/runit/1.d/copy-env
run-parts: executing /etc/runit/1.d/letsencrypt
[Tue Feb  4 06:01:07 PM UTC 2025] Domains not changed.
[Tue Feb  4 06:01:07 PM UTC 2025] Skip, Next renewal time is: 2025-03-02T20:15:28Z
[Tue Feb  4 06:01:07 PM UTC 2025] Add '--force' to force to renew.
[Tue Feb  4 06:01:07 PM UTC 2025] Installing key to: /shared/ssl/mydomain.com.key
[Tue Feb  4 06:01:07 PM UTC 2025] Installing full chain to: /shared/ssl/mydomain.com.cer
[Tue Feb  4 06:01:07 PM UTC 2025] Run reload cmd: sv reload nginx
fail: nginx: runsv not running
[Tue Feb  4 06:01:07 PM UTC 2025] Reload error for :
[Tue Feb  4 06:01:07 PM UTC 2025] Domains not changed.
[Tue Feb  4 06:01:07 PM UTC 2025] Skip, Next renewal time is: 2025-03-02T20:15:33Z
[Tue Feb  4 06:01:07 PM UTC 2025] Add '--force' to force to renew.
[Tue Feb  4 06:01:07 PM UTC 2025] Installing key to: /shared/ssl/mydomain.com_ecc.key
[Tue Feb  4 06:01:07 PM UTC 2025] Installing full chain to: /shared/ssl/mydomain.com_ecc.cer
[Tue Feb  4 06:01:07 PM UTC 2025] Run reload cmd: sv reload nginx
fail: nginx: runsv not running
[Tue Feb  4 06:01:07 PM UTC 2025] Reload error for :
Started runsvdir, PID is 561
ok: run: redis: (pid 575) 0s
ok: run: postgres: (pid 576) 0s
nginx: [warn] duplicate extension "wasm", content type: "application/wasm", previous content type: "application/wasm" in /etc/nginx/conf.d/discourse.conf:4
supervisor pid: 570 unicorn pid: 601
(570) exiting
nginx: [warn] duplicate extension "wasm", content type: "application/wasm", previous content type: "application/wasm" in /etc/nginx/conf.d/discourse.conf:4
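
A log like this is long and mostly repeats itself, so it can help to save it and count a few marker lines before reading it in detail. Below is a minimal sketch, assuming the output was saved with `./launcher logs app > app.log 2>&1`. `summarize_launcher_log` is a hypothetical helper, not part of Discourse or the launcher, and its patterns are simply copied from the log lines quoted in this thread rather than from any documented log format.

```shell
#!/bin/sh
# summarize_launcher_log (hypothetical helper, not part of Discourse):
# reads the text printed by `./launcher logs app` on stdin and prints
# counts of a few marker lines. The patterns are copied from the logs
# quoted in this thread, not from a documented log format.
summarize_launcher_log() {
  awk '
    # Each container boot starts by running the runit 1.d scripts.
    /run-parts: executing \/etc\/runit\/1\.d\/00-ensure-links/ { boots++ }
    # Printed when the container is being stopped.
    /Shutting Down/ { shutdowns++ }
    # The letsencrypt step could not reload nginx.
    /Reload error for/ { reload_errors++ }
    # Printed when the unicorn web server is launched.
    /supervisor pid: [0-9]+ unicorn pid: [0-9]+/ { unicorn_started++ }
    END {
      printf "boots=%d shutdowns=%d reload_errors=%d unicorn_started=%d\n",
        boots, shutdowns, reload_errors, unicorn_started
    }
  '
}

# Usage, from /var/discourse:
#   ./launcher logs app > app.log 2>&1
#   summarize_launcher_log < app.log
```

For the log in this post it prints `boots=3 shutdowns=2 reload_errors=6 unicorn_started=3`: the container booted three times and reached the unicorn launch each time. Judging by the line order, the `Reload error` lines come from the letsencrypt step, which runs before `Started runsvdir`, so on their own they probably do not explain an unreachable site.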

Everything went smoothly. I saw the following and rebuilt. The build completed without errors, but my site won't load.

-------------------------------------------------------------------------------------
POSTGRES UPGRADE COMPLETE

Old 13 database is stored at /shared/postgres_data_old

To complete the upgrade, rebuild again using:

./launcher rebuild app
-------------------------------------------------------------------------------------

When I run this:
tail /var/discourse/shared/standalone/log/var-log/postgres/current
the output is:

2025-02-04 18:11:50.943 UTC [573] LOG:  shutting down
2025-02-04 18:11:50.945 UTC [573] LOG:  checkpoint starting: shutdown immediate
2025-02-04 18:11:50.970 UTC [573] LOG:  checkpoint complete: wrote 139 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.017 s, sync=0.005 s, total=0.027 s; sync files=27, longest=0.002 s, average=0.001 s; distance=410 kB, estimate=410 kB
2025-02-04 18:11:51.034 UTC [547] LOG:  database system is shut down
2025-02-04 18:15:04.302 UTC [548] LOG:  starting PostgreSQL 15.10 (Debian 15.10-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14), 64-bit
2025-02-04 18:15:04.303 UTC [548] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2025-02-04 18:15:04.303 UTC [548] LOG:  listening on IPv6 address "::", port 5432
2025-02-04 18:15:04.305 UTC [548] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-02-04 18:15:04.313 UTC [575] LOG:  database system was shut down at 2025-02-04 18:14:37 UTC
2025-02-04 18:15:04.318 UTC [548] LOG:  database system is ready to accept connections
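
To rule the database in or out quickly, the same tail output can be reduced to the last lifecycle event PostgreSQL logged. A minimal sketch, assuming the standard English PostgreSQL server messages; `pg_last_state` is a hypothetical helper, not a Discourse or PostgreSQL tool.

```shell
#!/bin/sh
# pg_last_state (hypothetical helper): reads a PostgreSQL server log on
# stdin and prints the last lifecycle event it recognises, based on the
# standard English PostgreSQL log messages.
pg_last_state() {
  awk '
    /shutting down/ { state = "shutting down" }
    # Matches "is shut down" but not "was shut down at ..." on startup.
    /database system is shut down/ { state = "shut down" }
    /database system is ready to accept connections/ { state = "ready" }
    END { print (state == "" ? "unknown" : state) }
  '
}

# Usage on the host:
#   tail -n 50 /var/discourse/shared/standalone/log/var-log/postgres/current | pg_last_state
```

On a log that ends with `database system is ready to accept connections` it prints `ready`, which suggests PostgreSQL itself came back up and the problem lies elsewhere.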

Also, ./launcher logs app outputs the following:

x86_64 arch detected.
run-parts: executing /etc/runit/1.d/00-ensure-links
run-parts: executing /etc/runit/1.d/00-fix-var-logs
run-parts: executing /etc/runit/1.d/01-cleanup-web-pids
run-parts: executing /etc/runit/1.d/anacron
run-parts: executing /etc/runit/1.d/cleanup-pids
Cleaning stale PID files
run-parts: executing /etc/runit/1.d/copy-env
run-parts: executing /etc/runit/1.d/letsencrypt
[Tue Feb  4 06:15:03 PM UTC 2025] Domains not changed.
[Tue Feb  4 06:15:03 PM UTC 2025] Skip, Next renewal time is: 2025-02-09T00:30:10Z
[Tue Feb  4 06:15:03 PM UTC 2025] Add '--force' to force to renew.
[Tue Feb  4 06:15:03 PM UTC 2025] Installing key to: /shared/ssl/forum.myforum.com.key
[Tue Feb  4 06:15:03 PM UTC 2025] Installing full chain to: /shared/ssl/forum.myforum.com.cer
[Tue Feb  4 06:15:03 PM UTC 2025] Run reload cmd: sv reload nginx
warning: nginx: unable to open supervise/ok: file does not exist
[Tue Feb  4 06:15:03 PM UTC 2025] Reload error for :
[Tue Feb  4 06:15:03 PM UTC 2025] Domains not changed.
[Tue Feb  4 06:15:03 PM UTC 2025] Skip, Next renewal time is: 2025-02-09T00:30:15Z
[Tue Feb  4 06:15:03 PM UTC 2025] Add '--force' to force to renew.
[Tue Feb  4 06:15:04 PM UTC 2025] Installing key to: /shared/ssl/forum.myforum.com_ecc.key
[Tue Feb  4 06:15:04 PM UTC 2025] Installing full chain to: /shared/ssl/forum.myforum.com_ecc.cer
[Tue Feb  4 06:15:04 PM UTC 2025] Run reload cmd: sv reload nginx
warning: nginx: unable to open supervise/ok: file does not exist
[Tue Feb  4 06:15:04 PM UTC 2025] Reload error for :
Started runsvdir, PID is 537
ok: run: redis: (pid 552) 0s
ok: run: postgres: (pid 548) 0s
nginx: [warn] duplicate extension "wasm", content type: "application/wasm", previous content type: "application/wasm" in /etc/nginx/conf.d/discourse.conf:4
supervisor pid: 546 unicorn pid: 579

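The same thing happened to two of my self-hosted sites after I updated them from the command line today. These are very standard installs with no customizations or unofficial plugins, and they are updated regularly, usually without any trouble.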

I'm currently trying @mwaniki's suggestion above to see how it goes, and will report back here.

I rebuilt the app and the update completed successfully, but the site can't be reached even though no errors are shown. Any ideas?

./launcher logs app

WARNING: Docker version 20.10.12 deprecated, recommend upgrade to 24.0.7 or newer.
x86_64 arch detected.
WARNING: containers/app.yml file is world-readable. You can secure this file by running: chmod o-rwx containers/app.yml
run-parts: executing /etc/runit/1.d/00-ensure-links
run-parts: executing /etc/runit/1.d/00-fix-var-logs
run-parts: executing /etc/runit/1.d/01-cleanup-web-pids
run-parts: executing /etc/runit/1.d/anacron
run-parts: executing /etc/runit/1.d/cleanup-pids
Cleaning stale PID files
run-parts: executing /etc/runit/1.d/copy-env
run-parts: executing /etc/runit/1.d/letsencrypt
[Tue Feb  4 07:12:15 PM UTC 2025] Domains not changed.
[Tue Feb  4 07:12:15 PM UTC 2025] Skip, Next renewal time is: 2025-03-06T00:39:07Z
[Tue Feb  4 07:12:15 PM UTC 2025] Add '--force' to force to renew.
[Tue Feb  4 07:12:16 PM UTC 2025] Installing key to: /shared/ssl/forum.******.com.key
[Tue Feb  4 07:12:16 PM UTC 2025] Installing full chain to: /shared/ssl/forum.*****.com.cer
[Tue Feb  4 07:12:16 PM UTC 2025] Run reload cmd: sv reload nginx
warning: nginx: unable to open supervise/ok: file does not exist
[Tue Feb  4 07:12:16 PM UTC 2025] Reload error for :
[Tue Feb  4 07:12:16 PM UTC 2025] Domains not changed.
[Tue Feb  4 07:12:16 PM UTC 2025] Skip, Next renewal time is: 2025-03-06T00:39:11Z
[Tue Feb  4 07:12:16 PM UTC 2025] Add '--force' to force to renew.
[Tue Feb  4 07:12:16 PM UTC 2025] Installing key to: /shared/ssl/forum.*****.com_ecc.key
[Tue Feb  4 07:12:16 PM UTC 2025] Installing full chain to: /shared/ssl/forum.*****.com_ecc.cer
[Tue Feb  4 07:12:16 PM UTC 2025] Run reload cmd: sv reload nginx
warning: nginx: unable to open supervise/ok: file does not exist
[Tue Feb  4 07:12:16 PM UTC 2025] Reload error for :
Started runsvdir, PID is 535
ok: run: redis: (pid 545) 0s
nginx: [warn] duplicate extension "wasm", content type: "application/wasm", previous content type: "application/wasm" in /etc/nginx/conf.d/discourse.conf:4
ok: run: postgres: (pid 548) 0s
supervisor pid: 542 unicorn pid: 575

I don't think this "site completely unresponsive" issue is related to the PostgreSQL update. I'm looking into it :eyes:

I'm anxiously waiting for this to be resolved. I'm getting hundreds of emails from users asking why they can't access the forum :frowning:

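Sorry for the trouble, everyone! A fix is now live, and running ./launcher rebuild app again should get things working.

Please let us know if you still run into any problems after doing so.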
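
As this thread shows, a rebuild that finishes without errors does not prove the site is actually serving pages, so it can be worth checking from the host afterwards. A minimal sketch, assuming `curl` is installed; `site_responds` is a hypothetical helper and `forum.example.com` is a placeholder for your own hostname. The site may also need a short while after the rebuild before it answers.

```shell
#!/bin/sh
# site_responds (hypothetical helper): succeeds if the given URL answers
# with an HTTP status below 400 within 15 seconds, following redirects.
site_responds() {
  # -f: treat HTTP >= 400 (e.g. a 502 from nginx) as failure
  # -s: no progress meter; -L: follow redirects (e.g. http -> https)
  # -o /dev/null: discard the body; --max-time: give up after 15 seconds
  curl -fsL -o /dev/null --max-time 15 "$1"
}

# Usage after ./launcher rebuild app (placeholder hostname):
#   if site_responds https://forum.example.com/; then echo up; else echo down; fi
```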

Working :slight_smile:

The forum is back up, fantastic. Thanks for fixing this so quickly. This is why we love Discourse :heart:

Great, thanks for sorting this out! I spent the last two-plus hours trying to figure out what went wrong with my rebuild. All good now!

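A meta observation: while researching the problem on this forum, the default "relevance" sort for search results worked against me. I searched for exactly the same error log as the one here, but this recent topic only showed up many pages into the results (probably because it was so new). As a result, I didn't find it until I happened to open the Meta homepage and saw it trending. I guess that's a tip for myself and others: when researching future rebuild problems, also check the homepage or the Latest results.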

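That's really great feedback! Note that the original conversation took place at https://meta.discourse.org/t/postgresql-15-update/349515 until David realized it was unrelated to the PostgreSQL update and moved the relevant posts to a new topic. That only happened about an hour ago, so you couldn't have found it before then!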

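Troubleshooting failed updates is notoriously hard, especially because updating Discourse is usually so smooth that most of us self-hosters never have to learn how Discourse works internally or how to troubleshoot it!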

Thanks to @david for looking into this so quickly and finding a fix!

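I updated Docker through the mobile app and then got the dreaded "update via the console" message, which usually means trouble ahead.

I've followed all the manual update steps, but every app rebuild fails.

I can recover by starting the app, so the site itself works.

I'm not sure whether this is related to the PostgreSQL error, or where else I might be going wrong.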

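That suggests it isn't the same problem discussed in this topic.
With this problem, the rebuild completed without errors but the site failed to load, and ./launcher start didn't help either.

So I'd suggest opening a separate Support topic with details of the errors you're seeing.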

My site is back online after the rebuild. Thanks! :+1:

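In the end it came down to my own reading comprehension. I can say for certain that the problem was the Postgres database not shutting down properly. Once I followed the correct instructions, everything was back to normal. Thanks, everyone. I'm glad there's a place like this, with calmer heads to help out when things go wrong and I start to panic a little.

Thanks!!!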

Didn't work for me.

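That must be a different problem, then. Please open a new #support topic with the details and we'll do our best to help.

To avoid confusion, I'm closing this topic, since this specific issue has been resolved.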