Infinite loop during Postgres upgrade

Hi! Every time I run ./launcher rebuild app, I get the following message (even when I run it again):

UPGRADE OF POSTGRES COMPLETE
Old 9.5 database is stored at /shared/postgres_data_old
To complete the upgrade, rebuild again using:
./launcher rebuild app

What information do you need to help me?

PS: The first time I ran it, the command stopped because there wasn't enough disk space.

In the meantime, I switched to the templates/postgres.9.5.template.yml template to avoid downtime. The app is running again, but I'd like to finish the upgrade.

When I run a cleanup, I get the following message:

Old PostgreSQL backup data cluster detected taking up 1.2G detected. Would you like to remove it? (Y/n):

This sounds like a question for @tgxworld, but my guess is that the upgrade failed due to lack of space and the script somehow didn't detect the failure.

How much free disk space do you have? And how big is your database?

Can you provide me with the full log generated during the rebuild? Thank you!

I've had the same problem for several months and can't upgrade my installation.

It keeps looping and never succeeds. One error that shows up during rebuild app is:
mv: cannot move '/shared/postgres_data' to '/shared/postgres_data_old': Device or resource busy

Inside the Docker container, my current PostgreSQL version is:

/usr/lib/postgresql/10/bin/postgres --version
postgres (PostgreSQL) 10.14 (Debian 10.14-1.pgdg100+1)

This is the error I get:

FAILED
--------------------
Pups::ExecError: /root/upgrade_postgres failed with return #<Process::Status: pid 47 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "/root/upgrade_postgres"
0c74c9de4d4315b63c0ef9055631f38c0cf4b3dd0be6500fd83ca0a5b13e0d9d
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

It looks like the problem occurs when /pups/lib/pups/exec_command.rb runs spawn.

However, I can still start the app with:
./launcher start app

Here are the contents of my /shared directory:

total 48
drwxr-xr-x 12 root      root     4096 Jan 12 13:12 .
drwxr-xr-x 57 root      root     4096 Dec 15 09:09 ..
drwxr-xr-x  3 discourse www-data 4096 Aug 20  2019 backups
drwxr-xr-x  4 root      root     4096 Aug 19  2019 log
drwxr-xr-x  2 postgres  postgres 4096 Aug 19  2019 postgres_backup
drwx------ 20 postgres  postgres 4096 Jan 12 13:14 postgres_data
drwx------ 19 postgres  postgres 4096 Jan 12 13:12 postgres_data_new
drwxrwxr-x  5 postgres  postgres 4096 Jan 12 13:14 postgres_run
drwxr-xr-x  2 redis     redis    4096 Jan 12 13:07 redis_data
drwxr-xr-x  4 root      root     4096 Aug 19  2019 state
drwxr-xr-x  4 discourse www-data 4096 Jan 12 13:14 tmp
drwxr-xr-x  4 discourse www-data 4096 Sep  8  2019 uploads

What should I do?

Is your shared directory some kind of network mount?

No, but I finally got it working!

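The two mv commands in templates/postgres.template.yml don't check whether they succeeded. The script always prints UPGRADE OF POSTGRES COMPLETE, even when a move has failed and the upgrade therefore isn't actually complete.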

I removed those commands, entered the Docker container, and moved the directories by hand. After that, rebuild app works normally again! I'm so happy :smiley:
Thanks.
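A minimal sketch of what checking those two moves could look like, written as a standalone bash function. The name swap_data_dirs and passing the shared directory as an argument are hypothetical, not taken from templates/postgres.template.yml; the sketch only illustrates returning a non-zero status instead of printing UPGRADE OF POSTGRES COMPLETE after a failed mv. It also refuses to run when postgres_data_old already exists, because mv into an existing directory nests the source inside it rather than renaming it.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; swap_data_dirs is a hypothetical helper, not the
# code in templates/postgres.template.yml.

# Rename $1/postgres_data -> postgres_data_old and postgres_data_new ->
# postgres_data, returning 1 instead of reporting success if anything fails.
swap_data_dirs() {
  local shared="$1"

  # mv into an existing directory nests the source inside it instead of
  # renaming, so refuse to continue if an old backup is still present.
  if [ -e "$shared/postgres_data_old" ]; then
    echo "ERROR: $shared/postgres_data_old already exists; remove it first" >&2
    return 1
  fi

  # Keep the current cluster as the backup; stop here if the rename fails
  # (for example "Device or resource busy" when postgres_data is a mount point).
  if ! mv "$shared/postgres_data" "$shared/postgres_data_old"; then
    echo "ERROR: could not move $shared/postgres_data; upgrade NOT complete" >&2
    return 1
  fi

  # Put the upgraded cluster in place; if that fails, move the old one back.
  if ! mv "$shared/postgres_data_new" "$shared/postgres_data"; then
    echo "ERROR: could not move $shared/postgres_data_new; restoring old cluster" >&2
    mv "$shared/postgres_data_old" "$shared/postgres_data"
    return 1
  fi

  echo "UPGRADE OF POSTGRES COMPLETE"
}
```

Called as swap_data_dirs /shared, it returns 1 and tries to leave the original cluster in /shared/postgres_data whenever a move fails, so whatever runs it sees the failure instead of a false success message.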

I ran into this problem while upgrading from PostgreSQL 13 to 15.

In our deployment, /shared/postgres_data is a mount point on a faster NVMe storage device, so moving it fails with "Device or resource busy".

We worked around it by patching the postgres template as follows:

diff --git a/templates/postgres.template.yml b/templates/postgres.template.yml
index c24bfe6..03813c4 100644
--- a/templates/postgres.template.yml
+++ b/templates/postgres.template.yml
@@ -139,8 +139,10 @@ run:
            exit 1
          fi
 
-         mv /shared/postgres_data /shared/postgres_data_old
-         mv /shared/postgres_data_new /shared/postgres_data
+         mkdir /shared/postgres_data_old
+         mv /shared/postgres_data/* /shared/postgres_data_old
+         mv /shared/postgres_data_new/* /shared/postgres_data
+         rmdir /shared/postgres_data_new

Rather than moving the directory itself, this moves its contents.

Please consider merging this change to make database upgrades more reliable.
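The content-moving approach from the patch above can be sketched as a pair of bash functions that also handle two edge cases of mv dir/*: the default glob skips hidden entries, and an empty directory leaves a literal * that mv then fails on. This is an illustrative sketch, not the code merged into discourse_docker; move_contents and swap_data_contents are hypothetical names, and the up-front checks are an added assumption about what a safe swap should verify.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; move_contents and swap_data_contents are
# hypothetical helpers, not the code merged into discourse_docker.

# Move everything inside $1 into $2 (created if missing), leaving $1 in place.
# The ( ... ) body runs in a subshell, so the shopt changes stay local.
move_contents() (
  src="$1"
  dst="$2"
  mkdir -p "$dst" || exit 1
  # dotglob: include hidden entries; nullglob: an empty directory expands to
  # nothing instead of a literal "$src/*".
  shopt -s dotglob nullglob
  entries=("$src"/*)
  if [ "${#entries[@]}" -gt 0 ]; then
    mv -- "${entries[@]}" "$dst"/ || exit 1
  fi
)

# Swap old and new clusters by moving contents, so it also works when
# $1/postgres_data is a mount point that cannot itself be renamed.
swap_data_contents() (
  shared="$1"
  # Check everything up front so nothing is moved if the layout is unexpected.
  for d in postgres_data postgres_data_new; do
    if [ ! -d "$shared/$d" ]; then
      echo "ERROR: $shared/$d is missing" >&2
      exit 1
    fi
  done
  if [ -e "$shared/postgres_data_old" ]; then
    echo "ERROR: $shared/postgres_data_old already exists; remove it first" >&2
    exit 1
  fi
  move_contents "$shared/postgres_data" "$shared/postgres_data_old" || exit 1
  move_contents "$shared/postgres_data_new" "$shared/postgres_data" || exit 1
  rmdir "$shared/postgres_data_new" || exit 1
  echo "UPGRADE OF POSTGRES COMPLETE"
)
```

When postgres_data is a separate mount, the moved files cross a filesystem boundary, so mv copies and then deletes them instead of renaming. That needs enough free space on the destination and can take a while for a large database.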

I think this may be related: FIX: improve postgres upgrade reliability by jcharaoui · Pull Request #989 · discourse/discourse_docker · GitHub

The PR has been merged :+1:

This topic was automatically closed after 6 days. New replies are no longer allowed.