Trying to recover an installation

Norike · April 17, 2023, 21:12

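I installed Discourse on a CentOS machine following the guide here.
In case it helps, we also use NGINX Proxy Manager.

The installation worked for a few weeks, until we had to reboot the machine. After that it would not start. Here is the output of rebuild: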

Ensuring launcher is up to date
Launcher is up-to-date
2.0.20230409-0052: Pulling from discourse/base
Digest: sha256:dd75ceb9322f79629f8b0bf78cfb0f79ad6bb366b7ead3e1cd32dcb8712ec46f
Status: Image is up to date for discourse/base:2.0.20230409-0052
docker.io/discourse/base:2.0.20230409-0052
/usr/local/lib/ruby/gems/3.2.0/gems/pups-1.1.1/lib/pups.rb
/usr/local/bin/pups --stdin
I, [2023-04-13T09:55:28.002076 #1]  INFO -- : Reading from stdin
I, [2023-04-13T09:55:28.005509 #1]  INFO -- : > locale-gen $LANG && update-locale
I, [2023-04-13T09:55:28.031953 #1]  INFO -- : Generating locales (this might take a while)...
Generation complete.

I, [2023-04-13T09:55:28.032080 #1]  INFO -- : > mkdir -p /shared/postgres_run
I, [2023-04-13T09:55:28.034322 #1]  INFO -- :
I, [2023-04-13T09:55:28.034470 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2023-04-13T09:55:28.036190 #1]  INFO -- :
I, [2023-04-13T09:55:28.036297 #1]  INFO -- : > chmod 775 /shared/postgres_run
I, [2023-04-13T09:55:28.037819 #1]  INFO -- :
I, [2023-04-13T09:55:28.037918 #1]  INFO -- : > rm -fr /var/run/postgresql
I, [2023-04-13T09:55:28.039663 #1]  INFO -- :
I, [2023-04-13T09:55:28.039770 #1]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2023-04-13T09:55:28.041327 #1]  INFO -- :
I, [2023-04-13T09:55:28.041457 #1]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2023/04/13 09:55:28 socat[19] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): No such file or directory
I, [2023-04-13T09:55:28.045222 #1]  INFO -- :
I, [2023-04-13T09:55:28.045329 #1]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2023-04-13T09:55:28.047498 #1]  INFO -- :
I, [2023-04-13T09:55:28.047590 #1]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2023-04-13T09:55:28.049691 #1]  INFO -- :
I, [2023-04-13T09:55:28.049815 #1]  INFO -- : > mkdir -p /shared/postgres_run/13-main.pg_stat_tmp
I, [2023-04-13T09:55:28.051658 #1]  INFO -- :
I, [2023-04-13T09:55:28.051800 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run/13-main.pg_stat_tmp
I, [2023-04-13T09:55:28.053477 #1]  INFO -- :
I, [2023-04-13T09:55:28.057301 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2023-04-13T09:55:28.060969 #1]  INFO -- : File > /etc/service/postgres/log/run  chmod: +x  chown:
I, [2023-04-13T09:55:28.064663 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2023-04-13T09:55:28.068319 #1]  INFO -- : File > /root/upgrade_postgres  chmod: +x  chown:
I, [2023-04-13T09:55:28.068510 #1]  INFO -- : > chown -R root /var/lib/postgresql/13/main
I, [2023-04-13T09:55:37.064277 #1]  INFO -- :
I, [2023-04-13T09:55:37.064575 #1]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o
I, [2023-04-13T09:55:37.066210 #1]  INFO -- :
I, [2023-04-13T09:55:37.066255 #1]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2023-04-13T09:55:37.076706 #1]  INFO -- :
I, [2023-04-13T09:55:37.076786 #1]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2023-04-13T09:55:37.078711 #1]  INFO -- :
I, [2023-04-13T09:55:37.078822 #1]  INFO -- : > /root/upgrade_postgres
I, [2023-04-13T09:55:37.082200 #1]  INFO -- :
I, [2023-04-13T09:55:37.082297 #1]  INFO -- : > rm /root/upgrade_postgres
I, [2023-04-13T09:55:37.083919 #1]  INFO -- :
I, [2023-04-13T09:55:37.084109 #1]  INFO -- : Replacing data_directory = '/var/lib/postgresql/13/main' w
I, [2023-04-13T09:55:37.084484 #1]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addr
I, [2023-04-13T09:55:37.085049 #1]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchrono
I, [2023-04-13T09:55:37.085423 #1]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffer
I, [2023-04-13T09:55:37.085857 #1]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work
I, [2023-04-13T09:55:37.086303 #1]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with d
I, [2023-04-13T09:55:37.086614 #1]  INFO -- : > install -d -m 0755 -o postgres -g postgres /shared/postg
I, [2023-04-13T09:55:37.089011 #1]  INFO -- :
I, [2023-04-13T09:55:37.089245 #1]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoi
I, [2023-04-13T09:55:37.089516 #1]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_co
I, [2023-04-13T09:55:37.089887 #1]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with l
I, [2023-04-13T09:55:37.090322 #1]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) wi
I, [2023-04-13T09:55:37.090509 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all al
I, [2023-04-13T09:55:37.090946 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*::1\/128.*$) with host a
I, [2023-04-13T09:55:37.091279 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u pos
I, [2023-04-13T09:55:37.092562 #1]  INFO -- : > sleep 5
2023-04-13 09:55:37.239 UTC [42] LOG:  starting PostgreSQL 13.10 (Debian 13.10-1.pgdg110+1) on x86_64-pc
2023-04-13 09:55:37.240 UTC [42] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-04-13 09:55:37.240 UTC [42] LOG:  listening on IPv6 address "::", port 5432
2023-04-13 09:55:37.287 UTC [42] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-13 09:55:37.338 UTC [45] LOG:  database system was interrupted; last known up at 2023-04-13 08:3
2023-04-13 09:55:37.795 UTC [45] LOG:  invalid resource manager ID in primary checkpoint record
2023-04-13 09:55:37.795 UTC [45] PANIC:  could not locate a valid checkpoint record
2023-04-13 09:55:38.256 UTC [42] LOG:  startup process (PID 45) was terminated by signal 6: Aborted
2023-04-13 09:55:38.256 UTC [42] LOG:  aborting startup due to startup process failure
2023-04-13 09:55:38.283 UTC [42] LOG:  database system is shut down
I, [2023-04-13T09:55:42.094474 #1]  INFO -- :
I, [2023-04-13T09:55:42.094602 #1]  INFO -- : > su postgres -c 'createdb discourse' || true
createdb: error: could not connect to database template1: connection to server on socket "/var/run/postg
        Is the server running locally and accepting connections on that socket?
I, [2023-04-13T09:55:42.130700 #1]  INFO -- :
I, [2023-04-13T09:55:42.130828 #1]  INFO -- : > su postgres -c 'psql discourse -c "create user discourse
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?
I, [2023-04-13T09:55:42.166116 #1]  INFO -- :
I, [2023-04-13T09:55:42.166242 #1]  INFO -- : > su postgres -c 'psql discourse -c "grant all privileges on database discourse to discourse;"' || true
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?
I, [2023-04-13T09:55:42.201461 #1]  INFO -- :
I, [2023-04-13T09:55:42.201617 #1]  INFO -- : > su postgres -c 'psql discourse -c "alter schema public owner to discourse;"'
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?
I, [2023-04-13T09:55:42.236768 #1]  INFO -- :
I, [2023-04-13T09:55:42.236996 #1]  INFO -- : Terminating async processes

FAILED
--------------------
Pups::ExecError: su postgres -c 'psql discourse -c "alter schema public owner to discourse;"' failed with return #<Process::Status: pid 55 exit 2>
Location of failure: /usr/local/lib/ruby/gems/3.2.0/gems/pups-1.1.1/lib/pups/exec_command.rb:117:in `spawn'
exec failed with the params "su postgres -c 'psql $db_name -c \\\"alter schema public owner to $db_user;\\\"'"
bootstrap failed with exit code 2
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
cd04fbb38f1ef61e418d680b969c2c056439e3b1dbfe64608524a8d3361cd91c

discourse-doctor cannot find a running Docker app:

==================== SERIOUS PROBLEM!!!! ====================
app not running!
Attempting to rebuild

It then attempts a rebuild, so it basically ends with the same log as above.

Thanks in advance!

shyguy · April 18, 2023, 02:06

PANIC: could not locate a valid checkpoint record
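
That line is easy to miss in a rebuild log this long. As a rough sketch (not part of Discourse or the launcher), a few lines of Python can pull the PostgreSQL PANIC/FATAL/ERROR lines out of a saved log. The function name `postgres_failures` is made up for illustration, and the pattern assumes the `UTC [pid] SEVERITY:` prefix seen in the log above:

```python
import re

# Matches PostgreSQL server lines such as
# "2023-04-13 09:55:37.795 UTC [45] PANIC:  could not locate ..."
# The "UTC [pid]" prefix is an assumption taken from the log in this topic.
PG_LINE = re.compile(r"\bUTC \[\d+\] (PANIC|FATAL|ERROR):\s+(.*)")


def postgres_failures(log_text):
    """Return (severity, message) pairs for PANIC/FATAL/ERROR server lines."""
    found = []
    for line in log_text.splitlines():
        match = PG_LINE.search(line)
        if match:
            found.append((match.group(1), match.group(2).strip()))
    return found


sample = """\
2023-04-13 09:55:37.795 UTC [45] LOG:  invalid resource manager ID in primary checkpoint record
2023-04-13 09:55:37.795 UTC [45] PANIC:  could not locate a valid checkpoint record
2023-04-13 09:55:38.256 UTC [42] LOG:  aborting startup due to startup process failure
"""

for severity, message in postgres_failures(sample):
    print(severity, message)
```
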

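I think your database is corrupted. In my opinion, restoring from a backup is the best option. I believe Discourse takes backups by default? Check discourse/shared/app/backups or elsewhere.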
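
To see what is actually in a backup folder, something like the following sketch lists the archives newest first. It assumes backup files end in `.tar.gz` or `.sql.gz`, and the helper name `list_backups` is hypothetical; pass whichever directory your install really uses:

```python
import sys
from pathlib import Path


def list_backups(backup_dir):
    """Return backup archives in backup_dir, newest first by modification time."""
    directory = Path(backup_dir)
    if not directory.is_dir():
        return []
    # Assumption: backups are .tar.gz (with uploads) or .sql.gz (database only).
    archives = [
        p for p in directory.iterdir()
        if p.is_file() and p.name.endswith((".tar.gz", ".sql.gz"))
    ]
    return sorted(archives, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Directory comes from the command line; defaults to the current folder.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for archive in list_backups(target):
        print(archive.stat().st_size, archive.name)
```

A zero-byte or unexpectedly small newest file would be a reason to fall back to an older backup.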

If you don't have a Discourse backup, this is more of a PostgreSQL support topic than a Discourse one. I guess I could help, but I'm not an expert.

You might need to use this:

Ed_S · April 18, 2023, 09:09

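That does sound right. I hope you find a solution, @Norike... In the meantime, I find this rather worrying: I would expect a software reboot to leave the database in a clean state. Was this a hard reboot, that is, a power cycle?

I wonder whether the filesystem the database is stored on makes a difference. What does df -T tell you?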

Norike · April 18, 2023, 09:24

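Thanks!

Indeed, I found some backups in the /var/discourse/shared/standalone/backups/default folder, which sounds promising.
However, according to the guide on restoring a backup from the command line, the container needs to be running in order to import the database, and the ./launcher enter app command fails: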

x86_64 arch detected.
WARNING: containers/app.yml file is world-readable. You can secure this file by running: chmod o-rwx containers/app.yml
Error response from daemon: No such container: app

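I guess I should start from a clean install. Is there a guide for starting over? Is the following correct?

Copy the backups from the /var/discourse/shared/standalone/backups/default folder to a location outside the /var/discourse folder
Delete the /var/discourse folder
Install Discourse from scratch following the cloud install guide, to get a fresh running instance
Restore the backup using the GUI or the command-line guide mentioned above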

Norike · April 18, 2023, 09:39

It was a software reboot, done after applying some updates.

df -T says this:

Filesystem          Type     1K-blocks     Used Available Use% Mounted on
devtmpfs            devtmpfs      4096        0      4096   0% /dev
tmpfs               tmpfs      3869940        0   3869940   0% /dev/shm
tmpfs               tmpfs      1547980    17600   1530380   2% /run
/dev/mapper/cs-root xfs       73364480 16750976  56613504  23% /
/dev/sda2           xfs        1038336   395812    642524  39% /boot
/dev/mapper/cs-home xfs      893122476 10838052 882284424   2% /home
/dev/sda1           vfat        613160     7644    605516   2% /boot/efi
overlay             overlay   73364480 16750976  56613504  23% /var/lib/docker/overlay2/a9c2622d4167f08ff4697a4c49febf55a7e460e087c41c05e6c1bbd321b13f62/merged
overlay             overlay   73364480 16750976  56613504  23% /var/lib/docker/overlay2/765273939fa75096c994588ba9c9bac6b5a9b909a60a962169b83f7c8a213b7f/merged
tmpfs               tmpfs       773988        4    773984   1% /run/user/1000

Ed_S · April 18, 2023, 09:53

Thanks. I'm not familiar with XFS, but a quick search suggests it is fine to use. Perhaps xfs_info / would give useful information.

shyguy · April 18, 2023, 15:54

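OK, then copy this entire directory somewhere for safekeeping:

/var/discourse/shared

That way you can try restoring from a backup, or even try to repair the corrupted database later if you need to.

Then delete your Docker containers, run docker image prune -a, delete /var/discourse, and reinstall Discourse. After that, copy the latest backup file into place and try the restore.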
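
For the safekeeping copy, here is a minimal Python sketch that copies a directory into a timestamped folder and lets you compare file counts afterwards. The helper names and the destination folder are hypothetical. Note that `shutil.copytree` keeps permissions and timestamps but not file ownership, so for the PostgreSQL data directory a root-level `cp -a` or `rsync -a` is probably the better tool; treat this only as an illustration of the idea:

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_directory(source, dest_root):
    """Copy source into a new timestamped folder under dest_root; return its path."""
    source = Path(source)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(dest_root) / f"{source.name}-{stamp}"
    # symlinks=True copies symlinks as links instead of following them.
    # Ownership (user/group) is NOT preserved by copytree.
    shutil.copytree(source, target, symlinks=True)
    return target


def count_files(root):
    """Count regular files below root, as a quick sanity check on the copy."""
    return sum(1 for p in Path(root).rglob("*") if p.is_file())


# Example call (destination is a made-up path; run as root, container stopped):
#   copy = snapshot_directory("/var/discourse/shared", "/root/discourse-safekeeping")
#   print(count_files("/var/discourse/shared"), count_files(copy))
```

Only delete /var/discourse once the two counts match and the backup archives open correctly from the copy.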

Norike · April 18, 2023, 21:23

That restored it. Thank you!

system · May 18, 2023, 21:23

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
