Infinite loop during Postgres upgrade

Hi! Each time I run ./launcher rebuild app I get the following message (even if I run it again):

UPGRADE OF POSTGRES COMPLETE
Old 9.5 database is stored at /shared/postgres_data_old
To complete the upgrade, rebuild again using:
./launcher rebuild app

What information do you need to help me?

PS: The first time I ran it, the command stopped because I needed more space.

Temporarily, I used the templates/postgres.9.5.template.yml template to avoid downtime. The app is running again, but I’d like to finish the update.

When I do a cleanup I get the following message: Old PostgreSQL backup data cluster detected taking up 1.2G detected. Would you like to remove it? (Y/n):

Sounds like a question for @tgxworld, but my guess is that the upgrade failed for lack of space and the script somehow didn’t catch the failure.

How much space do you have? And how big is your database?

Can you provide me with the full log generated during the rebuild? Thank you!

I've had the same problem for months and can't upgrade my installation.

It loops forever and never succeeds. One error that shows up during rebuild app is:
mv: cannot move '/shared/postgres_data' to '/shared/postgres_data_old': Device or resource busy

Inside the Docker container, my current PostgreSQL version is:

/usr/lib/postgresql/10/bin/postgres --version
postgres (PostgreSQL) 10.14 (Debian 10.14-1.pgdg100+1)

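Here is the error I get:

FAILED

--------------------

Pups::ExecError: /root/upgrade_postgres failed with return #<Process::Status: pid 47 exit 1>

Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'

exec failed with the params "/root/upgrade_postgres"

0c74c9de4d4315b63c0ef9055631f38c0cf4b3dd0be6500fd83ca0a5b13e0d9d

** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.

./discourse-doctor may help diagnose the problem.

It looks like the problem occurs when the /pups/lib/pups/exec_command.rb script calls spawn.

I can still start the app, though: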
./launcher start app

Here are the files in my /shared directory:

total 48
drwxr-xr-x 12 root      root     4096 Jan 12 13:12 .
drwxr-xr-x 57 root      root     4096 Dec 15 09:09 ..
drwxr-xr-x  3 discourse www-data 4096 Aug 20  2019 backups
drwxr-xr-x  4 root      root     4096 Aug 19  2019 log
drwxr-xr-x  2 postgres  postgres 4096 Aug 19  2019 postgres_backup
drwx------ 20 postgres  postgres 4096 Jan 12 13:14 postgres_data
drwx------ 19 postgres  postgres 4096 Jan 12 13:12 postgres_data_new
drwxrwxr-x  5 postgres  postgres 4096 Jan 12 13:14 postgres_run
drwxr-xr-x  2 redis     redis    4096 Jan 12 13:07 redis_data
drwxr-xr-x  4 root      root     4096 Aug 19  2019 state
drwxr-xr-x  4 discourse www-data 4096 Jan 12 13:14 tmp
drwxr-xr-x  4 discourse www-data 4096 Sep  8  2019 uploads

What should I do?

Is your shared directory some kind of network mount?

No, but I finally got it working!

The two mv commands in templates/postgres.template.yml don't check whether they succeeded. The script always prints UPGRADE OF POSTGRES COMPLETE, which is not true when a move fails.
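For anyone hitting the same thing, here is a minimal sketch of the kind of check that appears to be missing around the two mv commands. The function name swap_data_dirs, its argument order, and the rollback step are assumptions made for illustration; this is not the code from the template.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of discourse_docker): swap the data
# directories and report success only if both moves actually succeeded.
swap_data_dirs() {
  local data="$1" new="$2" old="$3"

  # Refuse to continue if a previous backup is still in the way.
  if [ -e "$old" ]; then
    echo "Refusing to overwrite existing $old" >&2
    return 1
  fi

  if ! mv "$data" "$old"; then
    echo "Could not move $data to $old; upgrade NOT complete" >&2
    return 1
  fi

  if ! mv "$new" "$data"; then
    echo "Could not move $new to $data; restoring the original" >&2
    mv "$old" "$data"   # best-effort rollback
    return 1
  fi

  echo "UPGRADE OF POSTGRES COMPLETE"
}
```

In the upgrade script this could be called as `swap_data_dirs /shared/postgres_data /shared/postgres_data_new /shared/postgres_data_old || exit 1`, so a failed move stops the rebuild instead of printing the success banner.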

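I removed those commands, then entered the Docker container and moved the directories by hand, and the next rebuild app worked normally, just like before! I'm so happy :smiley:
Thanks.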

I ran into this problem while upgrading from PostgreSQL 13 to 15.

In our deployment, the /shared/postgres_data directory is mounted on a faster NVMe storage device, so moving it fails with a "Device or resource busy" error.
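A directory that is itself a mount point cannot be renamed from inside the container, which would explain the error. A quick way to check for this situation, sketched here on the assumption that the mountpoint utility from util-linux is available in the container (the helper name is_mount_point is made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given directory is a mount point.
# Relies on `mountpoint -q`, which exits 0 for mount points and non-zero otherwise.
is_mount_point() {
  mountpoint -q "$1"
}
```

For example, `is_mount_point /shared/postgres_data && echo "mounted separately"` run inside the container would show whether the data directory is affected.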

We worked around this by patching the postgres template as follows:

diff --git a/templates/postgres.template.yml b/templates/postgres.template.yml
index c24bfe6..03813c4 100644
--- a/templates/postgres.template.yml
+++ b/templates/postgres.template.yml
@@ -139,8 +139,10 @@ run:
            exit 1
          fi
 
-         mv /shared/postgres_data /shared/postgres_data_old
-         mv /shared/postgres_data_new /shared/postgres_data
+         mkdir /shared/postgres_data_old
+         mv /shared/postgres_data/* /shared/postgres_data_old
+         mv /shared/postgres_data_new/* /shared/postgres_data
+         rmdir /shared/postgres_data_new

Rather than trying to operate on the directories themselves, it moves their contents.
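One caveat when moving contents with a `*` glob: by default the shell does not expand it to dotfiles, so hidden entries would be left behind if any exist. A minimal sketch of a variant that moves every entry and reports failure, assuming GNU find and GNU mv (for the -t option); the name move_contents is hypothetical and not part of the patch:

```shell
#!/usr/bin/env bash
# Hypothetical helper: move everything inside $1 (including dotfiles)
# into $2, leaving the source directory itself in place.
move_contents() {
  local src="$1" dst="$2"

  mkdir -p "$dst" || return 1

  # -mindepth 1 skips the source directory itself; -maxdepth 1 keeps
  # find from descending, so each top-level entry is moved as a whole.
  # find exits non-zero if any mv invocation fails.
  find "$src" -mindepth 1 -maxdepth 1 -exec mv -t "$dst" -- {} + || return 1
}
```

Because the source directory is never renamed or removed, this should work even when it is a mount point, for example `move_contents /shared/postgres_data /shared/postgres_data_old || exit 1`.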

Please consider integrating this change to make database upgrades more reliable.

I think this may be related: FIX: improve postgres upgrade reliability by jcharaoui · Pull Request #989 · discourse/discourse_docker · GitHub

The PR has been merged :+1: