PostgreSQL 10 到 PostgreSQL 13 的更新失败

继续讨论 PostgreSQL 13 更新

我目前使用多站点配置,包含 2 个容器(数据容器中仅启用了 postgres.10redis 模板)。当前 PostgreSQL 版本为 10,我希望升级到 13。在重建 data 容器时,我遇到了以下错误:

fixing permissions on existing directory /shared/postgres_data_new ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok


Success. You can now start the database server using:

    /usr/lib/postgresql/13/bin/pg_ctl -D /shared/postgres_data_new -l logfile start

Get:1 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Hit:2 http://deb.debian.org/debian buster InRelease
Get:3 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:4 http://apt.postgresql.org/pub/repos/apt buster-pgdg InRelease [104 kB]
Get:5 http://security.debian.org/debian-security buster/updates/main amd64 Packages [291 kB]
Get:6 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 Packages [231 kB]
Hit:7 https://deb.nodesource.com/node_15.x buster InRelease
Fetched 743 kB in 1s (944 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  postgresql-client-10
Suggested packages:
  postgresql-doc-10
The following NEW packages will be installed:
  postgresql-10 postgresql-client-10
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 6,441 kB of archives.
After this operation, 30.6 MB of additional disk space will be used.
Get:1 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-client-10 amd64 10.17-1.pgdg100+1 [1,439 kB]
Get:2 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-10 amd64 10.17-1.pgdg100+1 [5,002 kB]
Fetched 6,441 kB in 0s (34.9 MB/s)
Selecting previously unselected package postgresql-client-10.
(Reading database ... 43021 files and directories currently installed.)
Preparing to unpack .../postgresql-client-10_10.17-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-client-10 (10.17-1.pgdg100+1) ...
Selecting previously unselected package postgresql-10.
Preparing to unpack .../postgresql-10_10.17-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-10 (10.17-1.pgdg100+1) ...
Setting up postgresql-client-10 (10.17-1.pgdg100+1) ...
update-alternatives: warning: forcing reinstallation of alternative /usr/share/postgresql/13/man/man1/psql.1.gz because link group psql.1.gz is broken
Setting up postgresql-10 (10.17-1.pgdg100+1) ...
Creating new PostgreSQL cluster 10/main ...
/usr/lib/postgresql/10/bin/initdb -D /var/lib/postgresql/10/main --auth-local peer --auth-host md5
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/10/main ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Etc/UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctlcluster 10 main start

Ver Cluster Port Status Owner    Data directory              Log file
10  main    5433 down   postgres /var/lib/postgresql/10/main /var/log/postgresql/postgresql-10-main.log
update-alternatives: warning: forcing reinstallation of alternative /usr/share/postgresql/13/man/man1/postmaster.1.gz because link group postmaster.1.gz is broken
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for postgresql-common (226.pgdg100+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
Stopping PostgreSQL 10 database server: main.
Stopping PostgreSQL 13 database server: main.
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok

The source cluster was not shut down cleanly.
Failure, exiting
-------------------------------------------------------------------------------------
UPGRADE OF POSTGRES FAILED

Please visit https://meta.discourse.org/t/postgresql-13-update/172563 for support.

You can run ./launcher start app to restart your app in the meanwhile




FAILED
--------------------
Pups::ExecError: /root/upgrade_postgres failed with return #<Process::Status: pid 49 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "/root/upgrade_postgres"
1f253827e5700e1861c4e586213aaffa8994e452e43b9336301dcd02072e00f4
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

如果我将模板重新设置为 postgres.10,一切都能正常运行。
您知道该如何升级到 PostgreSQL 13 吗?

附注:升级到 PostgreSQL 12 也失败了。

2 个赞

我注意到需要先停止数据容器:

这是我停止两个容器后运行 tail -f shared/data/log/var-log/postgres/current 得到的输出:

2021-06-06 16:38:37.411 UTC [113] HINT:  The server must be started by the user that owns the data directory.
2021-06-06 16:38:38.424 GMT [114] LOG:  skipping missing configuration file "/shared/postgres_data/postgresql.auto.conf"
2021-06-06 16:38:38.424 UTC [114] FATAL:  data directory "/shared/postgres_data" has wrong ownership
2021-06-06 16:38:38.424 UTC [114] HINT:  The server must be started by the user that owns the data directory.
2021-06-06 16:38:39.439 GMT [115] LOG:  skipping missing configuration file "/shared/postgres_data/postgresql.auto.conf"
2021-06-06 16:38:39.439 UTC [115] FATAL:  data directory "/shared/postgres_data" has wrong ownership
2021-06-06 16:38:39.439 UTC [115] HINT:  The server must be started by the user that owns the data directory.
2021-06-06 16:38:40.461 GMT [116] LOG:  skipping missing configuration file "/shared/postgres_data/postgresql.auto.conf"
2021-06-06 16:38:40.461 UTC [116] FATAL:  data directory "/shared/postgres_data" has wrong ownership
2021-06-06 16:38:40.461 UTC [116] HINT:  The server must be started by the user that owns the data directory.

这很可能就是问题的根源,我该如何修复?

3 个赞

出于好奇,你的 postgres_data 目录的拥有者是谁?

2 个赞

在此处

root@forum:/var/discourse/shared/data# ls -al
total 32
drwxr-xr-x  8 root root   4096 Jun  6 16:56 .
drwxr-xr-x  4 root root   4096 Mar  3  2019 ..
drwxr-xr-x  3 root root   4096 Jul  8  2018 log
drwxr-xr-x  2 _apt netdev 4096 Jul  7  2018 postgres_backup
drwx------ 20 _apt netdev 4096 Jun  6 17:29 postgres_data
drwx------ 19 _apt netdev 4096 Jun  6 16:56 postgres_data_new
drwxrwsr-x  5 _apt netdev 4096 Jun  6 17:29 postgres_run
drwxr-xr-x  2 lxd  lxd    4096 Jun  6 17:34 redis_data

这样可以吗?

1 个赞

对于我的配置(2 个容器,PostgreSQL 13),postgres_* 所属的组是 render 而非 netdev,而 postgres_run 组的权限位应为 x,不应设置 setgid。不过,我不太清楚这具体意味着什么:sweat_smile

附:redis_data 的所属用户是 uuidd,所属组是 syslog …

2 个赞

谢谢,我也不知道 :slight_smile:
希望有人能帮忙!

1 个赞

能否请您使用 ls -lan 再次运行相同的命令?这将有助于确定所有权(-n 选项会将用户名转换为 UID,将组名转换为 GID)。以下是我拥有双容器设置的站点示例:

[root@/var/discourse/shared/data]$ ls -lan
total 28
drwxr-xr-x.  7   0   0 4096 Jun  7 22:15 .
drwxr-xr-x.  5   0   0 4096 Jun  7 22:28 ..
drwxr-xr-x.  3   0   0 4096 Jun  7 22:15 log
drwxr-xr-x.  2 105 109 4096 Jun  7 22:15 postgres_backup
drwx------. 19 105 109 4096 Jun  7 22:27 postgres_data
drwxrwxr-x.  3 105 109 4096 Jun  7 22:27 postgres_run
drwxr-xr-x.  2 106 110 4096 Jun  8 00:17 redis_data
1 个赞

好的!内容如下:

root@forum:/var/discourse/shared/data# ls -lan
total 32
drwxr-xr-x  8   0   0 4096 Jun  6 16:56 .
drwxr-xr-x  4   0   0 4096 Mar  3  2019 ..
drwxr-xr-x  3   0   0 4096 Jul  8  2018 log
drwxr-xr-x  2 105 109 4096 Jul  7  2018 postgres_backup
drwx------ 20 105 109 4096 Jun  6 17:29 postgres_data
drwx------ 19 105 109 4096 Jun  6 16:56 postgres_data_new
drwxrwsr-x  5 105 109 4096 Jun  6 17:29 postgres_run
drwxr-xr-x  2 106 110 4096 Jun  9 14:10 redis_data

你好,

我们遇到了相同的情况,但使用的是独立部署。shared/standalone/log/var-log/ 目录下没有任何文件显示,然而进入容器后,我们看到的结果是一样的:

root@pulp-discourse-iptools:/var/www/discourse# tail -n 3 /var/log/postgres/current
2021-10-13 18:33:04.027 GMT [917] LOG:  skipping missing configuration file "/shared/postgres_data/postgresql.auto.conf"
2021-10-13 18:33:04.028 UTC [917] FATAL:  data directory "/shared/postgres_data" has wrong ownership
2021-10-13 18:33:04.028 UTC [917] HINT:  The server must be started by the user that owns the data directory.
root@pulp-discourse-iptools:/var/www/discourse# ls -lan /shared
total 8
drwxr-xr-x 12    0   0  178 Oct 13 18:01 .
drwxr-xr-x 56    0   0  167 May 10  2020 ..
drwxr-xr-x  3 1000  33   21 Apr  3  2019 backups
drwxr-xr-x  4    0   0   34 Apr  3  2019 log
drwxr-xr-x  2  106 110    6 Apr  3  2019 postgres_backup
drwx------ 20  105 109 4096 Oct 13 17:57 postgres_data
drwx------ 19  105 109 4096 Oct 13 18:02 postgres_data_new
drwxrwsr-x  4  105 109   60 Oct 13 17:56 postgres_run
drwxr-xr-x  2  108 111   41 Oct 13 18:26 redis_data
drwxr-xr-x  4    0   0   44 Apr  3  2019 state
drwxr-xr-x  4 1000  33   37 Oct 13 18:26 tmp
drwxr-xr-x  4 1000  33   38 Apr  3  2019 uploads

不过,这看起来仍然很奇怪:

root@pulp-discourse-iptools:/var/www/discourse# ls -alF /shared | grep postgres_
drwxr-xr-x  2 postgres    postgres    6 Apr  3  2019 postgres_backup/
drwx------ 20 Debian-exim ssh      4096 Oct 13 18:53 postgres_data/
drwx------ 19 Debian-exim ssh      4096 Oct 13 18:02 postgres_data_new/
drwxrwsr-x  4 Debian-exim ssh        60 Oct 13 17:56 postgres_run/

为了稳妥起见,我们将 app.yml 中的配置切换回 \"templates/postgres.10.template.yml\",目前的目标仅仅是重新启动旧实例。

或许值得说明的是,我们在切换到 main 分支后,已经处于 https://github.com/discourse/discourse_docker.git 的最新提交版本。不过,由于重建未成功,这可能在启动先前镜像时没有影响,对吗?

如果任何人有解决此问题的建议,我们将不胜感激。否则,我们可能会再次回来提供进一步的报告。

此致,
Andreas。

再次您好,

遵循 @noezDE 的建议(非常感谢!):https://meta.discourse.org/t/update-failed-postgresql/153713/8,我们已能够从该状况中恢复。

现在,网站已重新运行,我们将再次尝试升级。

此致,
Andreas.


恢复步骤

首先,通过执行以下命令取得进展:

chown -R postgres /shared/postgres_data
sv restart postgres

随后,日志显示:

2021-10-13 18:52:12.437 UTC [3357] FATAL: 无法创建锁文件 "/var/run/postgresql/.s.PGSQL.5432.lock":权限被拒绝

因此,我们彻底执行了以下操作:

chown -R postgres:postgres /shared/postgres_*
sv restart postgres

搞定。

5 个赞

再次尝试使用 "templates/postgres.template.yml",整个过程非常顺利,从 PostgreSQL 10 到 PostgreSQL 13 的升级成功完成。再次感谢整个 Discourse 团队在这款软件上的出色工作。

5 个赞