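PostgreSQL stuck during rebuild

haroldfy · August 26, 2024, 17:36

Hi everyone,

I ran into a PostgreSQL startup problem while rebuilding my app and would appreciate some help.

Here is the log. It has been stuck in this state for more than 30 minutes.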

Status: Image is up to date for discourse/base:2.0.20240825-0027
docker.io/discourse/base:2.0.20240825-0027
/usr/local/lib/ruby/gems/3.3.0/gems/pups-1.2.1/lib/pups.rb
/usr/local/bin/pups --stdin
I, [2024-08-26T17:16:15.344712 #1]  INFO -- : Reading from stdin
I, [2024-08-26T17:16:15.357924 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2024-08-26T17:16:15.362740 #1]  INFO -- : File > /etc/service/postgres/log/run  chmod: +x  chown:
I, [2024-08-26T17:16:15.367767 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2024-08-26T17:16:15.372845 #1]  INFO -- : File > /root/install_postgres  chmod: +x  chown:
I, [2024-08-26T17:16:15.377501 #1]  INFO -- : File > /root/upgrade_postgres  chmod: +x  chown:
I, [2024-08-26T17:16:15.377876 #1]  INFO -- : Replacing data_directory = '/var/lib/postgresql/13/main' with data_directory = '/shared/postgres_data' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.378854 #1]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addresses = '*' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.379386 #1]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchronous_commit = $db_synchronous_commit in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.379835 #1]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffers = $db_shared_buffers in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.380263 #1]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work_mem in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.380761 #1]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with default_text_search_config = '$db_default_text_search_config' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.381203 #1]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoint_segments = $db_checkpoint_segments in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.381901 #1]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_collector = $db_logging_collector in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.382352 #1]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with log_min_duration_statement = $db_log_min_duration_statement in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.382802 #1]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) with local replication postgres  peer in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-26T17:16:15.383231 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all all 0.0.0.0/0 md5 in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-26T17:16:15.383604 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*::1\/128.*$) with host all all ::/0 md5 in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-26T17:16:15.384079 #1]  INFO -- : > if [ -f /root/install_postgres ]; then
  /root/install_postgres && rm -f /root/install_postgres
elif [ -e /shared/postgres_run/.s.PGSQL.5432 ]; then
  socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
fi

2024/08/26 17:16:15 socat[28] E connect(, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): Connection refused
I, [2024-08-26T17:16:15.452500 #1]  INFO -- : Generating locales (this might take a while)...
Generation complete.

I, [2024-08-26T17:16:15.453058 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/13/main
I, [2024-08-26T17:16:15.455944 #1]  INFO -- : Terminating async processes
2024-08-26 17:16:15.500 UTC [30] LOG:  starting PostgreSQL 13.16 (Debian 13.16-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-08-26 17:16:15.501 UTC [30] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-26 17:16:15.501 UTC [30] LOG:  listening on IPv6 address "::", port 5432
2024-08-26 17:16:15.507 UTC [30] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-26 17:16:15.516 UTC [31] LOG:  database system was interrupted; last known up at 2024-08-26 17:10:28 UTC
2024-08-26 17:16:15.769 UTC [31] LOG:  database system was not properly shut down; automatic recovery in progress
2024-08-26 17:16:15.774 UTC [31] LOG:  redo starts at 18F/E62D1458
2024-08-26 17:16:15.774 UTC [31] LOG:  invalid record length at 18F/E62D1490: wanted 24, got 0
2024-08-26 17:16:15.774 UTC [31] LOG:  redo done at 18F/E62D1458
2024-08-26 17:16:15.809 UTC [30] LOG:  database system is ready to accept connections

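pfaffman · August 26, 2024, 21:28

It wasn't shut down cleanly, so it tried to repair itself, and it thinks it has succeeded.

Maybe press Ctrl+C to exit, then try ./launcher start app to bring the old container back up.

If that works, you can run ./launcher stop app and then try the rebuild again.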
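
The sequence suggested above could be scripted roughly as follows. This is only a sketch: the function name `retry_rebuild` is made up for illustration, `/var/discourse` is assumed to be the install directory, and `app` is the container name used in this thread.

```shell
# Hypothetical helper: start the old container, stop it cleanly, then rebuild.
# The default path /var/discourse is an assumption (the usual install location).
retry_rebuild() {
  local dir="${1:-/var/discourse}"
  (
    # Run in a subshell so the caller's working directory is left unchanged.
    cd "$dir" || exit 1
    ./launcher start app &&   # bring the old container back up
    ./launcher stop app &&    # stop it so PostgreSQL can shut down cleanly
    ./launcher rebuild app    # then attempt the rebuild again
  )
}
```

Because the steps are chained with `&&`, the rebuild is attempted only if the old container started and stopped without error.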

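Equination · August 27, 2024, 00:16

I've hit the same problem when trying to rebuild over the past few days. I can't run or rebuild Discourse without problems.

I tried start/stop, but it doesn't seem to help. The VM itself has also been rebooted a few times. It always stops at the line saying the database system is ready to accept connections.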

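haroldfy · August 27, 2024, 05:51

Control-C has no effect. I have tried many different things, including reverting to an older version, but whenever I try to rebuild, it gets stuck at exactly the same place.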

pfaffman · August 27, 2024, 10:04

How much memory do you have? Is your network connection slow?

Mycobee · August 27, 2024, 22:50

Regarding my case... plenty of memory: 8GB. The network is fine too.

haroldfy · August 28, 2024, 02:39

4GB of memory. I checked the network, disk usage, CPU usage, and memory usage, and everything looks normal.

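Mycobee · August 28, 2024, 03:04

I've made some more progress. In /var/discourse/ on the server, I checked out commit b1108913820edd27f869634d0fc654639758889a. This commit is from a few days ago and does not include these three commits from the discourse_docker history (1, 2, 3). I suspect one of those changes is what causes postgres to hang.

Anyway, the app is finally back up. That was a rough experience, haha.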
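
For anyone wanting to try the same workaround, pinning the discourse_docker checkout to that commit might look like the sketch below. The function name `pin_discourse_docker` is hypothetical, `/var/discourse` is assumed to be the checkout location, and the follow-up rebuild is an assumption about what was done next, since the post does not spell it out.

```shell
# Hypothetical helper: check out a specific discourse_docker commit.
# Leaves the repository in a detached-HEAD state at that commit.
pin_discourse_docker() {
  local commit="$1"
  local dir="${2:-/var/discourse}"   # assumed default install path
  git -C "$dir" checkout "$commit"
}

# Example (not executed here):
#   pin_discourse_docker b1108913820edd27f869634d0fc654639758889a
#   cd /var/discourse && ./launcher rebuild app
```

To return to the latest launcher code afterwards, check out the branch that was in use before pinning.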

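karussell · August 28, 2024, 07:21

I hit the same problem when upgrading to 3.3.1. The upgrade got stuck on the same log line (database system is ready to accept connections).

Rebooting, or just killing the upgrade process and running ./launcher start app, fixes it. The version shown afterward is 3.3.1. I'm not sure whether this is a safe thing to do, though.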

Ed_S · August 28, 2024, 15:36

So, I count four people with this problem.
Those of you who are affected: are you on ARM or Intel?

pfaffman · August 28, 2024, 16:41

That's a good question.

I just did a fresh install on a new Digital Ocean VM and then ran a rebuild, and everything worked.

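haroldfy · August 28, 2024, 17:40

I'm on Intel.

I worked around the problem by spinning up a new droplet, doing a fresh install, and restoring a backup. After that, rebuilds work normally.

I also have a backup of a working (slightly older) version, but as soon as I upgrade it to the latest version via a rebuild, I hit the same problem. So I suspect a recent commit introduced something that only breaks when updating from an older version to a newer one.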

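pfaffman · August 28, 2024, 17:52

Ugh.

Hmm. Let me see whether I have a site where I don't care if it goes down.

I assume you have a standard single-container install. I'll see if I can find one.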

tanya_byrne · August 28, 2024, 20:07

Just bumping this, as I have also seen this issue since the above commit. I tried all of the above to resolve it, too.

Mycobee · August 28, 2024, 20:58

x86. My host OS is Ubuntu Bionic... maybe that matters. I'm not sure what OS the others are running.
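
The host details asked about in this thread (architecture, OS release, memory) can be collected in one go. The following is a minimal sketch for a Linux host; `host_facts` is a made-up name, and it assumes `/etc/os-release` and `/proc/meminfo` are present, which is typical but not guaranteed.

```shell
# Hypothetical helper: print architecture, OS release, and total memory.
host_facts() {
  echo "arch: $(uname -m)"
  if [ -r /etc/os-release ]; then
    # The command substitution runs in a subshell, so the variables
    # defined by os-release do not leak into the calling shell.
    echo "os: $(. /etc/os-release && echo "$PRETTY_NAME")"
  fi
  if [ -r /proc/meminfo ]; then
    # MemTotal is reported in kB; divide by 1024*1024 to get GiB.
    awk '/^MemTotal:/ { printf "mem: %.1f GiB\n", $2 / 1048576 }' /proc/meminfo
  fi
}
```

Posting the output of `host_facts` alongside the stuck log line would make it easier to compare affected and unaffected hosts.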

pfaffman · August 28, 2024, 21:01

It has been past EOL (end of life) for over a year. https://ubuntu.com/blog/ubuntu-18-04-eol-for-devices

Now is the time to spin up a new VM and migrate to it.

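tanya_byrne · August 28, 2024, 23:47

Some additional information to help investigate this.

I saw the problem on one host running Ubuntu 18.04.6, but a Discourse rebuild on another host running the same Ubuntu version, updated today, went through fine.

I'll try upgrading Ubuntu on the affected host to see whether that helps. I'll keep everyone posted.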

tgxworld · August 29, 2024, 01:45

Affected users: please run ls -lahn /var/discourse/shared/standalone/ | grep -E "postgres|redis" and let me know whether your output differs from the following.

drwxr-xr-x  2  101 104 4.0K Aug 29 01:33 postgres_backup
drwx------ 19  101 104 4.0K Aug 29 01:42 postgres_data
drwxrwxr-x  3  101 104 4.0K Aug 29 01:42 postgres_run
drwxr-xr-x  2  103 106 4.0K Aug 29 01:38 redis_data
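
To compare only the fields that matter here (mode, numeric owner, numeric group) without the size and date columns, something like the sketch below may help. It assumes GNU coreutils `stat` (the `-c` option) and the standalone layout shown above; `list_owners` is a hypothetical name.

```shell
# Hypothetical helper: print mode, uid, gid, and path for each data directory.
# Requires GNU stat; the default base path is taken from the command above.
list_owners() {
  local base="${1:-/var/discourse/shared/standalone}"
  local d
  for d in postgres_backup postgres_data postgres_run redis_data; do
    if [ -e "$base/$d" ]; then
      # %A = permissions as text, %u = numeric uid, %g = numeric gid, %n = name
      stat -c '%A %u %g %n' "$base/$d"
    fi
  done
}
```

The output can then be compared against the reference listing, for example the 101/104 ownership on the postgres directories and 103/106 on redis_data.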

Mycobee · August 29, 2024, 04:03

# ls -lahn /var/discourse/shared/standalone/ | grep -E "postgres|redis"
drwxr-xr-x  2  101 104 4.0K Dec 26  2019 postgres_backup
drwx------ 19  101 104 4.0K Aug 28 03:59 postgres_data
drwxrwxr-x  5  101 104 4.0K Aug 28 03:59 postgres_run
drwxr-xr-x  2  103 106 4.0K Aug 29 03:59 redis_data

tanya_byrne · August 29, 2024, 17:39

Output from the VM where the rebuild has the problem:

drwxr-xr-x  2  101 104 4.0K Jun 15  2020 postgres_backup
drwx------ 19  101 104 4.0K May  3  2022 postgres_data
drwxrwsr-x  5  101 104 4.0K May  3  2022 postgres_run
drwxr-xr-x  2  103 106 4.0K May  3  2022 redis_data

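Just to note, my situation is slightly different.
The rebuild got stuck at "database system is ready to accept connections", as others have seen. I had to reboot the VM and run ./launcher start app to bring the forum up. However, when Discourse came back, it was still on the previous version, 3.3.0.beta4-dev.

I can't do the Ubuntu upgrade today, but I'll let everyone know as soon as I have been able to do it and the rebuild succeeds.

Today I upgraded our development instance to Ubuntu 20.04, and the rebuild/upgrade to Discourse 3.4.0.beta2-dev succeeded. However, this is also the host where the rebuild worked without problems yesterday on Ubuntu 18.04.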
