Bootstrap 启动失败,请帮忙 :(

Trying to update and Discourse is failing to bootstrap, can anyone advise what steps I should take to get my site back online? I ran discourse-doctor, but it wasn’t able to rebuild either. I’ll paste my logs from my (third?) attempt below:

 I, [2020-04-11T13:05:40.743940 #1]  INFO -- : Loading --stdin
I, [2020-04-11T13:05:40.750131 #1]  INFO -- : > locale-gen $LANG && update-locale
I, [2020-04-11T13:05:40.800464 #1]  INFO -- : Generating locales (this might take a while)...
Generation complete.

I, [2020-04-11T13:05:40.800750 #1]  INFO -- : > mkdir -p /shared/postgres_run
I, [2020-04-11T13:05:40.804473 #1]  INFO -- : 
I, [2020-04-11T13:05:40.804720 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2020-04-11T13:05:40.809297 #1]  INFO -- : 
I, [2020-04-11T13:05:40.809573 #1]  INFO -- : > chmod 775 /shared/postgres_run
I, [2020-04-11T13:05:40.813859 #1]  INFO -- : 
I, [2020-04-11T13:05:40.814113 #1]  INFO -- : > rm -fr /var/run/postgresql
I, [2020-04-11T13:05:40.818749 #1]  INFO -- : 
I, [2020-04-11T13:05:40.819005 #1]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2020-04-11T13:05:40.823141 #1]  INFO -- : 
I, [2020-04-11T13:05:40.823401 #1]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2020/04/11 13:05:40 socat[26] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): Connection refused
I, [2020-04-11T13:05:40.831215 #1]  INFO -- : 
I, [2020-04-11T13:05:40.831436 #1]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2020-04-11T13:05:40.837350 #1]  INFO -- : 
I, [2020-04-11T13:05:40.837549 #1]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2020-04-11T13:05:40.843241 #1]  INFO -- : 
I, [2020-04-11T13:05:40.843486 #1]  INFO -- : > mkdir -p /shared/postgres_run/10-main.pg_stat_tmp
I, [2020-04-11T13:05:40.848311 #1]  INFO -- : 
I, [2020-04-11T13:05:40.848643 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run/10-main.pg_stat_tmp
I, [2020-04-11T13:05:40.853421 #1]  INFO -- : 
I, [2020-04-11T13:05:40.863897 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown: 
I, [2020-04-11T13:05:40.873965 #1]  INFO -- : File > /etc/service/postgres/log/run  chmod: +x  chown: 
I, [2020-04-11T13:05:40.884485 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown: 
I, [2020-04-11T13:05:40.895036 #1]  INFO -- : File > /root/upgrade_postgres  chmod: +x  chown: 
I, [2020-04-11T13:05:40.895478 #1]  INFO -- : > chown -R root /var/lib/postgresql/10/main
I, [2020-04-11T13:06:09.192272 #1]  INFO -- : 
I, [2020-04-11T13:06:09.192593 #1]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o postgres -g postgres /shared/postgres_data && sudo -E -u postgres /usr/lib/postgresql/10/bin/initdb -D /shared/postgres_data || exit 0
I, [2020-04-11T13:06:09.197837 #1]  INFO -- : 
I, [2020-04-11T13:06:09.197992 #1]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2020-04-11T13:06:09.228153 #1]  INFO -- : 
I, [2020-04-11T13:06:09.228375 #1]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2020-04-11T13:06:09.233425 #1]  INFO -- : 
I, [2020-04-11T13:06:09.233760 #1]  INFO -- : > /root/upgrade_postgres
I, [2020-04-11T13:06:09.243357 #1]  INFO -- : 
I, [2020-04-11T13:06:09.243596 #1]  INFO -- : > rm /root/upgrade_postgres
I, [2020-04-11T13:06:09.249007 #1]  INFO -- : 
I, [2020-04-11T13:06:09.249495 #1]  INFO -- : Replacing data_directory = '/var/lib/postgresql/10/main' with data_directory = '/shared/postgres_data' in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.272106 #1]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addresses = '*' in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.272853 #1]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchronous_commit = $db_synchronous_commit in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.273496 #1]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffers = $db_shared_buffers in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.274139 #1]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work_mem in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.274757 #1]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with default_text_search_config = '$db_default_text_search_config' in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.275288 #1]  INFO -- : > install -d -m 0755 -o postgres -g postgres /shared/postgres_backup
I, [2020-04-11T13:06:09.282029 #1]  INFO -- : 
I, [2020-04-11T13:06:09.282556 #1]  INFO -- : Replacing (?-mix:#?max_wal_senders *=.*) with max_wal_senders = $db_max_wal_senders in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.330309 #1]  INFO -- : Replacing (?-mix:#?wal_level *=.*) with wal_level = $db_wal_level in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.331016 #1]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoint_segments = $db_checkpoint_segments in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.331655 #1]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_collector = $db_logging_collector in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.332347 #1]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with log_min_duration_statement = $db_log_min_duration_statement in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.333020 #1]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) with local replication postgres  peer in /etc/postgresql/10/main/pg_hba.conf
I, [2020-04-11T13:06:09.333660 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all all 0.0.0.0/0 md5 in /etc/postgresql/10/main/pg_hba.conf
I, [2020-04-11T13:06:09.334205 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main
I, [2020-04-11T13:06:09.338659 #1]  INFO -- : > sleep 5
2020-04-11 13:06:09.481 UTC [49] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-04-11 13:06:09.482 UTC [49] LOG:  listening on IPv6 address "::", port 5432
2020-04-11 13:06:09.544 UTC [49] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-04-11 13:06:10.813 UTC [52] LOG:  database system shutdown was interrupted; last known up at 2020-04-11 12:42:48 UTC
2020-04-11 13:06:12.356 UTC [52] LOG:  database system was not properly shut down; automatic recovery in progress
2020-04-11 13:06:12.416 UTC [52] LOG:  redo starts at 84/BD1A85B0
2020-04-11 13:06:12.446 UTC [52] LOG:  invalid record length at 84/BD7B3570: wanted 24, got 0
2020-04-11 13:06:12.446 UTC [52] LOG:  redo done at 84/BD7B3530
2020-04-11 13:06:12.446 UTC [52] LOG:  last completed transaction was at log time 2020-04-11 11:53:53.069757+00
I, [2020-04-11T13:06:14.343128 #1]  INFO -- : 
I, [2020-04-11T13:06:14.343449 #1]  INFO -- : > su postgres -c 'createdb discourse' || true
2020-04-11 13:06:14.409 UTC [56] postgres@postgres FATAL:  the database system is starting up
2020-04-11 13:06:14.411 UTC [57] postgres@template1 FATAL:  the database system is starting up
createdb: could not connect to database template1: FATAL:  the database system is starting up
I, [2020-04-11T13:06:14.413548 #1]  INFO -- : 
I, [2020-04-11T13:06:14.413822 #1]  INFO -- : > su postgres -c 'psql discourse -c "create user discourse;"' || true
2020-04-11 13:06:14.493 UTC [68] postgres@discourse FATAL:  the database system is starting up
psql: FATAL:  the database system is starting up
I, [2020-04-11T13:06:14.495917 #1]  INFO -- : 
I, [2020-04-11T13:06:14.496250 #1]  INFO -- : > su postgres -c 'psql discourse -c "grant all privileges on database discourse to discourse;"' || true
2020-04-11 13:06:14.574 UTC [79] postgres@discourse FATAL:  the database system is starting up
psql: FATAL:  the database system is starting up
I, [2020-04-11T13:06:14.576284 #1]  INFO -- : 
I, [2020-04-11T13:06:14.576674 #1]  INFO -- : > su postgres -c 'psql discourse -c "alter schema public owner to discourse;"'
2020-04-11 13:06:14.655 UTC [90] postgres@discourse FATAL:  the database system is starting up
psql: FATAL:  the database system is starting up
I, [2020-04-11T13:06:14.658074 #1]  INFO -- : 
I, [2020-04-11T13:06:14.658635 #1]  INFO -- : Terminating async processes
I, [2020-04-11T13:06:14.658722 #1]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main pid: 49
2020-04-11 13:06:14.658 UTC [49] LOG:  received fast shutdown request
I, [2020-04-11T13:06:24.659800 #1]  INFO -- : HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main pid:49 did not terminate cleanly, forcing termination!


FAILED
--------------------
Pups::ExecError: su postgres -c 'psql discourse -c "alter schema public owner to discourse;"' failed with return #<Process::Status: pid 80 exit 2>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "su postgres -c 'psql $db_name -c \"alter schema public owner to $db_user;\"'"
3956a617d80571dd1ef87fcf7d23a319190bd909477e703ccb22cb25d0e137c8
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
1 个赞

看起来您的数据库已损坏。您的磁盘空间是否充足?(我怀疑问题不在于此,但如果是这样就好了。)

最简单的解决方案可能是删除数据库目录,这将重建一个新的数据库。然后,您可以从最近的备份中进行恢复。

5 个赞

现在热度稍退,我来同步一下最新进展:重启服务器后,应用得以重建。这个建议是 @merefield 提出的,我在论坛其他地方偶然看到,便抱着“死马当活马医”的心态尝试了。

至于 Postgres 到底在闹什么,至今仍是谜。不过现在我有了一些新问题,虽然没那么棘手。往好处想,我不用销毁我的大型数据库了。

@pfaffman,我们拥有大量存储空间,离上限还远着呢。

感谢大家的反馈,希望我这番痛苦的经历能帮到其他人 :slight_smile:

4 个赞

您好,

我们在版本升级重建时收到了相同的消息……

……并认为我们也遇到了同样的问题——这是我们第一次遇到——

……犹豫地寻找备份目录;]。有一个新的备份在那里,所以我们稍微放松了一下。

我们还想分享一下,虽然我们没有重启服务器,但我们稍等了一会儿,第二次重建仍然失败,出现相同的错误消息 invalid record length at,但现在尝试第三次时,更新过程似乎正在进行。服务器有足够的磁盘空间,但目前 I/O 压力有点大。

也许这能帮助到其他遇到类似情况的人。根据这次经验,我们建议 a) 在尝试再次重建之前,缓解主机系统的资源过度消耗,以及 b) 不要过早放弃。

此致,
Andreas。

2 个赞

嗯,它说正在进行恢复,所以也许你可以让它恢复?我认为你也许可以重启旧容器让它这样做。

这个问题解决了吗?

1 个赞

一切顺利,升级在第三次尝试时成功完成,没有报错。谢谢你,新年快乐!

1 个赞

之前遇到的类似问题通过调整超时值得到了解决:

或者通过等待:

2 个赞