Bootstrap failed, please help :(

Trying to update and Discourse is failing to bootstrap, can anyone advise what steps I should take to get my site back online? I ran discourse-doctor, but it wasn’t able to rebuild either. I’ll paste my logs from my (third?) attempt below:

 I, [2020-04-11T13:05:40.743940 #1]  INFO -- : Loading --stdin
I, [2020-04-11T13:05:40.750131 #1]  INFO -- : > locale-gen $LANG && update-locale
I, [2020-04-11T13:05:40.800464 #1]  INFO -- : Generating locales (this might take a while)...
Generation complete.

I, [2020-04-11T13:05:40.800750 #1]  INFO -- : > mkdir -p /shared/postgres_run
I, [2020-04-11T13:05:40.804473 #1]  INFO -- : 
I, [2020-04-11T13:05:40.804720 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2020-04-11T13:05:40.809297 #1]  INFO -- : 
I, [2020-04-11T13:05:40.809573 #1]  INFO -- : > chmod 775 /shared/postgres_run
I, [2020-04-11T13:05:40.813859 #1]  INFO -- : 
I, [2020-04-11T13:05:40.814113 #1]  INFO -- : > rm -fr /var/run/postgresql
I, [2020-04-11T13:05:40.818749 #1]  INFO -- : 
I, [2020-04-11T13:05:40.819005 #1]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2020-04-11T13:05:40.823141 #1]  INFO -- : 
I, [2020-04-11T13:05:40.823401 #1]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2020/04/11 13:05:40 socat[26] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): Connection refused
I, [2020-04-11T13:05:40.831215 #1]  INFO -- : 
I, [2020-04-11T13:05:40.831436 #1]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2020-04-11T13:05:40.837350 #1]  INFO -- : 
I, [2020-04-11T13:05:40.837549 #1]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2020-04-11T13:05:40.843241 #1]  INFO -- : 
I, [2020-04-11T13:05:40.843486 #1]  INFO -- : > mkdir -p /shared/postgres_run/10-main.pg_stat_tmp
I, [2020-04-11T13:05:40.848311 #1]  INFO -- : 
I, [2020-04-11T13:05:40.848643 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run/10-main.pg_stat_tmp
I, [2020-04-11T13:05:40.853421 #1]  INFO -- : 
I, [2020-04-11T13:05:40.863897 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown: 
I, [2020-04-11T13:05:40.873965 #1]  INFO -- : File > /etc/service/postgres/log/run  chmod: +x  chown: 
I, [2020-04-11T13:05:40.884485 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown: 
I, [2020-04-11T13:05:40.895036 #1]  INFO -- : File > /root/upgrade_postgres  chmod: +x  chown: 
I, [2020-04-11T13:05:40.895478 #1]  INFO -- : > chown -R root /var/lib/postgresql/10/main
I, [2020-04-11T13:06:09.192272 #1]  INFO -- : 
I, [2020-04-11T13:06:09.192593 #1]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o postgres -g postgres /shared/postgres_data && sudo -E -u postgres /usr/lib/postgresql/10/bin/initdb -D /shared/postgres_data || exit 0
I, [2020-04-11T13:06:09.197837 #1]  INFO -- : 
I, [2020-04-11T13:06:09.197992 #1]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2020-04-11T13:06:09.228153 #1]  INFO -- : 
I, [2020-04-11T13:06:09.228375 #1]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2020-04-11T13:06:09.233425 #1]  INFO -- : 
I, [2020-04-11T13:06:09.233760 #1]  INFO -- : > /root/upgrade_postgres
I, [2020-04-11T13:06:09.243357 #1]  INFO -- : 
I, [2020-04-11T13:06:09.243596 #1]  INFO -- : > rm /root/upgrade_postgres
I, [2020-04-11T13:06:09.249007 #1]  INFO -- : 
I, [2020-04-11T13:06:09.249495 #1]  INFO -- : Replacing data_directory = '/var/lib/postgresql/10/main' with data_directory = '/shared/postgres_data' in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.272106 #1]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addresses = '*' in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.272853 #1]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchronous_commit = $db_synchronous_commit in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.273496 #1]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffers = $db_shared_buffers in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.274139 #1]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work_mem in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.274757 #1]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with default_text_search_config = '$db_default_text_search_config' in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.275288 #1]  INFO -- : > install -d -m 0755 -o postgres -g postgres /shared/postgres_backup
I, [2020-04-11T13:06:09.282029 #1]  INFO -- : 
I, [2020-04-11T13:06:09.282556 #1]  INFO -- : Replacing (?-mix:#?max_wal_senders *=.*) with max_wal_senders = $db_max_wal_senders in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.330309 #1]  INFO -- : Replacing (?-mix:#?wal_level *=.*) with wal_level = $db_wal_level in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.331016 #1]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoint_segments = $db_checkpoint_segments in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.331655 #1]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_collector = $db_logging_collector in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.332347 #1]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with log_min_duration_statement = $db_log_min_duration_statement in /etc/postgresql/10/main/postgresql.conf
I, [2020-04-11T13:06:09.333020 #1]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) with local replication postgres  peer in /etc/postgresql/10/main/pg_hba.conf
I, [2020-04-11T13:06:09.333660 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all all 0.0.0.0/0 md5 in /etc/postgresql/10/main/pg_hba.conf
I, [2020-04-11T13:06:09.334205 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main
I, [2020-04-11T13:06:09.338659 #1]  INFO -- : > sleep 5
2020-04-11 13:06:09.481 UTC [49] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-04-11 13:06:09.482 UTC [49] LOG:  listening on IPv6 address "::", port 5432
2020-04-11 13:06:09.544 UTC [49] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-04-11 13:06:10.813 UTC [52] LOG:  database system shutdown was interrupted; last known up at 2020-04-11 12:42:48 UTC
2020-04-11 13:06:12.356 UTC [52] LOG:  database system was not properly shut down; automatic recovery in progress
2020-04-11 13:06:12.416 UTC [52] LOG:  redo starts at 84/BD1A85B0
2020-04-11 13:06:12.446 UTC [52] LOG:  invalid record length at 84/BD7B3570: wanted 24, got 0
2020-04-11 13:06:12.446 UTC [52] LOG:  redo done at 84/BD7B3530
2020-04-11 13:06:12.446 UTC [52] LOG:  last completed transaction was at log time 2020-04-11 11:53:53.069757+00
I, [2020-04-11T13:06:14.343128 #1]  INFO -- : 
I, [2020-04-11T13:06:14.343449 #1]  INFO -- : > su postgres -c 'createdb discourse' || true
2020-04-11 13:06:14.409 UTC [56] postgres@postgres FATAL:  the database system is starting up
2020-04-11 13:06:14.411 UTC [57] postgres@template1 FATAL:  the database system is starting up
createdb: could not connect to database template1: FATAL:  the database system is starting up
I, [2020-04-11T13:06:14.413548 #1]  INFO -- : 
I, [2020-04-11T13:06:14.413822 #1]  INFO -- : > su postgres -c 'psql discourse -c "create user discourse;"' || true
2020-04-11 13:06:14.493 UTC [68] postgres@discourse FATAL:  the database system is starting up
psql: FATAL:  the database system is starting up
I, [2020-04-11T13:06:14.495917 #1]  INFO -- : 
I, [2020-04-11T13:06:14.496250 #1]  INFO -- : > su postgres -c 'psql discourse -c "grant all privileges on database discourse to discourse;"' || true
2020-04-11 13:06:14.574 UTC [79] postgres@discourse FATAL:  the database system is starting up
psql: FATAL:  the database system is starting up
I, [2020-04-11T13:06:14.576284 #1]  INFO -- : 
I, [2020-04-11T13:06:14.576674 #1]  INFO -- : > su postgres -c 'psql discourse -c "alter schema public owner to discourse;"'
2020-04-11 13:06:14.655 UTC [90] postgres@discourse FATAL:  the database system is starting up
psql: FATAL:  the database system is starting up
I, [2020-04-11T13:06:14.658074 #1]  INFO -- : 
I, [2020-04-11T13:06:14.658635 #1]  INFO -- : Terminating async processes
I, [2020-04-11T13:06:14.658722 #1]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main pid: 49
2020-04-11 13:06:14.658 UTC [49] LOG:  received fast shutdown request
I, [2020-04-11T13:06:24.659800 #1]  INFO -- : HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main pid:49 did not terminate cleanly, forcing termination!


FAILED
--------------------
Pups::ExecError: su postgres -c 'psql discourse -c "alter schema public owner to discourse;"' failed with return #<Process::Status: pid 80 exit 2>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "su postgres -c 'psql $db_name -c \"alter schema public owner to $db_user;\"'"
3956a617d80571dd1ef87fcf7d23a319190bd909477e703ccb22cb25d0e137c8
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
1 个赞

Looks like your database is corrupt. Do you have enough disk space? (I doubt that’s the problem, but it would be great if it were.)

The easiest solution is probably to delete the database directory, which will build a new database. Then you can restore from your most recent backup.

5 个赞

To give you an update now that I’m out of the heat of it - rebooting the server allowed the app to rebuild. This was a suggestion from @merefield that I stumbled across elsewhere on here, and applied as a Hail Mary solution.

So, whatever Postgres was fussing about remains a mystery to me. Now I have new, but less critical problems. On the bright side, I didn’t have to nuke my large database.

@pfaffman, we have a substantial amount of storage, we’re not even close to hitting the ceiling there.

Thanks for all the feedback, hopefully my painful experience helps someone else out there :slight_smile:

4 个赞

您好,

我们在版本升级重建时收到了相同的消息……

……并认为我们也遇到了同样的问题——这是我们第一次遇到——

……犹豫地寻找备份目录;]。有一个新的备份在那里,所以我们稍微放松了一下。

我们还想分享一下,虽然我们没有重启服务器,但我们稍等了一会儿,第二次重建仍然失败,出现相同的错误消息 invalid record length at,但现在尝试第三次时,更新过程似乎正在进行。服务器有足够的磁盘空间,但目前 I/O 压力有点大。

也许这能帮助到其他遇到类似情况的人。根据这次经验,我们建议 a) 在尝试再次重建之前,缓解主机系统的资源过度消耗,以及 b) 不要过早放弃。

此致,
Andreas。

2 个赞

嗯,它说正在进行恢复,所以也许你可以让它恢复?我认为你也许可以重启旧容器让它这样做。

这个问题解决了吗?

1 个赞

一切顺利,升级在第三次尝试时成功完成,没有报错。谢谢你,新年快乐!

1 个赞

之前遇到的类似问题通过调整超时值得到了解决:

或者通过等待:

2 个赞