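Upgrade from 2.7.0.beta1 to 2.7.0.beta3 failed

I tried to upgrade from 2.7.0.beta1 to 2.7.0.beta3 using the one-click browser upgrade.

First it updated Docker, which appeared to succeed. Then, following the instructions, I ran these commands on the server: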

    cd /var/discourse
    git pull
    ./launcher rebuild app

When that finished, it told me I needed to rebuild again. I did, but late in the process I got the following errors:

I, [2021-02-01T04:03:23.848858 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/13/main
I, [2021-02-01T04:03:23.850125 #1]  INFO -- : > sleep 5
I, [2021-02-01T04:03:28.854186 #1]  INFO -- :
I, [2021-02-01T04:03:28.854378 #1]  INFO -- : > su postgres -c 'createdb discourse' || true
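createdb: error: could not connect to database template1: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?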
I, [2021-02-01T04:03:28.940422 #1]  INFO -- :
I, [2021-02-01T04:03:28.940926 #1]  INFO -- : > su postgres -c 'psql discourse -c "create user discourse;"' || true
psql: error: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2021-02-01T04:03:29.005802 #1]  INFO -- :
I, [2021-02-01T04:03:29.006192 #1]  INFO -- : > su postgres -c 'psql discourse -c "grant all privileges on database discourse to discourse;"' || true
psql: error: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2021-02-01T04:03:29.055155 #1]  INFO -- :
I, [2021-02-01T04:03:29.055530 #1]  INFO -- : > su postgres -c 'psql discourse -c "alter schema public owner to discourse;"'
psql: error: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2021-02-01T04:03:29.102737 #1]  INFO -- :
I, [2021-02-01T04:03:29.103136 #1]  INFO -- : Terminating async processes
I, [2021-02-01T04:03:29.103280 #1]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/13/main pid: 52


FAILED
--------------------
Pups::ExecError: su postgres -c 'psql discourse -c "alter schema public owner to discourse;"' failed with return #<Process::Status: pid 78 exit 2>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "su postgres -c 'psql $db_name -c \\\"alter schema public owner to $db_user;\\\"'"
74718f22e5eb9e1ceb21ac2a2fe613d13aee282a353cf60b91258ba2b2323397
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

The release notes mentioned PostgreSQL and disk space issues; maybe that is why it failed? When I ran discourse-doctor, the output included:

---------- OS Disk Space ----------
Filesystem                 Size  Used Avail Use% Mounted on
/dev/disk/by-label/DOROOT   30G   20G  8.5G  70% /

What should I do now?
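
One way to judge whether disk space is a plausible cause is to compare the size of the current data directory with the free space on the same volume. The sketch below assumes a standard install with the data under /var/discourse/shared/standalone/postgres_data, and assumes as a rule of thumb that an upgrade wants about twice the data directory's size free. That factor and the helper names (has_room, dir_size_kb, free_kb) are illustrative and not part of Discourse.

```shell
# Sketch: is there room for a second copy of the database during an upgrade?
# The default factor of 2 is an assumption, not a documented requirement.

# has_room USED_KB AVAIL_KB [FACTOR]: succeeds when AVAIL_KB >= USED_KB * FACTOR.
has_room() {
  [ "$2" -ge $(( $1 * ${3:-2} )) ]
}

# dir_size_kb DIR: size of DIR in KiB, as reported by du.
dir_size_kb() {
  du -sk "$1" | cut -f1
}

# free_kb DIR: free KiB on the filesystem that holds DIR (POSIX df layout).
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example, run on the host:
#   data=/var/discourse/shared/standalone/postgres_data
#   if has_room "$(dir_size_kb "$data")" "$(free_kb "$data")"; then
#     echo "free space looks sufficient"
#   else
#     echo "free space may be too low for an upgrade"
#   fi
```

The functions only read sizes, so running them is safe on a live system.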

Can you paste more lines of the rebuild output? We need some of the earlier lines to be able to troubleshoot.

It looks like the culprit may be:

    I, [2021-02-01T22:13:56.638190 #1]  INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate'
2021-02-01 22:14:05.011 UTC [4123] discourse@discourse ERROR:  duplicate key value violates unique constraint "index_users_on_username"
2021-02-01 22:14:05.011 UTC [4123] discourse@discourse DETAIL:  Key (username)=(Pxxx_Gxxxxxxxx) already exists.
2021-02-01 22:14:05.011 UTC [4123] discourse@discourse STATEMENT:  UPDATE users
        SET locale = 'en'
        WHERE locale = 'en_US'

rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

PG::UniqueViolation: ERROR:  duplicate key value violates unique constraint "index_users_on_username"
DETAIL:  Key (username)=(Pxxx_Gxxxxxxxx) already exists.

If so, how do you suggest I fix it? The full output is attached.

failed_discourse_upgrade_2021_01_31.txt (90.4 KB)
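
To see which usernames collide, a query along these lines could be run once PostgreSQL is reachable. This is only a sketch: the function name is made up for illustration, and turning off index scans is a precaution I am assuming is wanted here, so that a possibly corrupt unique index cannot hide duplicate rows from the query.

```shell
# Sketch: print a query that lists usernames stored more than once.
# The SET lines force a sequential scan for the current session only.
duplicate_usernames_sql() {
  cat <<'SQL'
SET enable_indexscan = off;
SET enable_indexonlyscan = off;
SET enable_bitmapscan = off;
SELECT username, count(*) AS copies
FROM users
GROUP BY username
HAVING count(*) > 1
ORDER BY copies DESC;
SQL
}

# Inside the container (./launcher enter app), with PostgreSQL running:
#   duplicate_usernames_sql | su postgres -c 'psql discourse'
```

The query only reads data; deciding which of the duplicate rows to rename or delete is a separate step.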

There have been several recent topics about fixing duplicates. For example:

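Thanks. I'm having trouble running psql.

I restarted the container:

./launcher start app

Then ran:

./launcher enter app

Then:

su postgres -c 'psql discourse'

But the output is:

psql: error: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?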

Try running sudo -u postgres psql discourse

Unfortunately, the error message is the same.

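Did the first pass of the launcher rebuild run successfully?

If so, your database has already been upgraded to PostgreSQL 13, while the PostgreSQL binaries in your old app image still expect a version 10 or 12 database, depending on which version you upgraded from.

cd /var/discourse/shared/standalone
ls -alh

Do you have both a postgres_data and a postgres_data_old directory?

Make sure the app is stopped, then move or rename the postgres_data directory (to be safe, don't delete it yet), and then run:

mv postgres_data_old postgres_data

Then try ./launcher start app again.

Hope this helps!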

Gunnar
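
The directory swap described above could be sketched like this. Every PostgreSQL data directory contains a PG_VERSION file with its major version, which helps confirm which directory is which before moving anything. The function names and the postgres_data_upgraded name are placeholders chosen for this example.

```shell
# pg_version_of DIR: print the major version recorded in DIR/PG_VERSION,
# or "unknown" when the file is missing.
pg_version_of() {
  if [ -f "$1/PG_VERSION" ]; then
    cat "$1/PG_VERSION"
  else
    echo unknown
  fi
}

# restore_old_data BASE: set the current postgres_data aside under a new
# name and put postgres_data_old back in its place. Nothing is deleted.
restore_old_data() {
  if [ ! -d "$1/postgres_data_old" ]; then
    echo "no postgres_data_old in $1" >&2
    return 1
  fi
  if [ -e "$1/postgres_data_upgraded" ]; then
    echo "$1/postgres_data_upgraded already exists" >&2
    return 1
  fi
  mv "$1/postgres_data" "$1/postgres_data_upgraded" &&
    mv "$1/postgres_data_old" "$1/postgres_data"
}

# Example, run on the host with the app stopped:
#   cd /var/discourse && ./launcher stop app
#   base=/var/discourse/shared/standalone
#   pg_version_of "$base/postgres_data"       # expected to show the new version
#   pg_version_of "$base/postgres_data_old"   # expected to show the old version
#   restore_old_data "$base"
#   ./launcher start app
```

Renaming instead of deleting keeps both copies around until the site is confirmed working.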

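I followed those steps and successfully renamed the duplicate users. But when I tried to reindex, I ran into this problem

@sam suggested in that topic: "Yes, delete the duplicate rows," but it isn't clear to me how to do that in the context of the warnings and errors I get when reindexing. The problem has now become: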

REINDEX SCHEMA CONCURRENTLY public;
WARNING:  cannot reindex invalid index "public.index_incoming_referers_on_path_and_incoming_domain_id_ccnew" concurrently, skipping
WARNING:  cannot reindex invalid index "public.incoming_referers_pkey_ccnew1" concurrently, skipping
WARNING:  cannot reindex invalid index "public.index_incoming_referers_on_path_and_incoming_domain_id_ccnew1" concurrently, skipping
WARNING:  cannot reindex invalid index "public.index_incoming_referers_on_path_and_incoming_domain_id_cc_ccnew" concurrently, skipping
WARNING:  cannot reindex invalid index "public.index_incoming_referers_on_path_and_incoming_domain_id_c_ccnew1" concurrently, skipping
WARNING:  cannot reindex invalid index "public.incoming_referers_pkey_ccnew2" concurrently, skipping
WARNING:  cannot reindex invalid index "public.index_incoming_referers_on_path_and_incoming_domain_id_ccnew2" concurrently, skipping
WARNING:  cannot reindex invalid index "public.incoming_referers_pkey_ccnew_ccnew" concurrently, skipping
WARNING:  cannot reindex invalid index "pg_toast.pg_toast_19250_index_ccnew1" concurrently, skipping
WARNING:  cannot reindex invalid index "pg_toast.pg_toast_19250_index_ccnew2" concurrently, skipping
WARNING:  cannot reindex invalid index "pg_toast.pg_toast_19250_index_ccnew_ccnew" concurrently, skipping
ERROR:  could not create unique index "index_incoming_referers_on_path_and_incoming_domain_id_ccnew3"
DETAIL:  Key (path, incoming_domain_id)=(/votes/, 1165) is duplicated.

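These are temporary indexes created during reindexing. Each time the reindex aborts because of a duplicate, at least one temporary index is left behind. You can recognize them by their names, which end in ccnew, ccnew1, ccnew2, and so on.

You can remove them by entering psql and running DROP INDEX.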

sudo ./launcher enter app
su postgres -c 'psql discourse'

DROP INDEX "<index name>_ccnew";
DROP INDEX "<index name>_ccnew1";

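And so on. Be sure to back up the database first, make sure no reindex is currently running, and only drop indexes whose names end in _ccnew, optionally followed by a number.

See this post for more information: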
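
Rather than typing each statement by hand, the DROP INDEX lines can be generated from a list of index names. The sketch below is a plain text filter with a made-up function name. It assumes the names arrive one per line, for example from a query on pg_index for rows where indisvalid is false, and it leaves pg_toast indexes out because those belong to the system.

```shell
# drop_statements: read index names on stdin (one per line, optionally
# schema-qualified) and print a DROP INDEX statement for each name that
# ends in _ccnew or _ccnew<number>. Names in the pg_toast schema are skipped.
drop_statements() {
  grep -E '_ccnew[0-9]*$' | grep -v '^pg_toast\.' | sed 's/.*/DROP INDEX &;/'
}

# Example, inside the container with PostgreSQL running:
#   su postgres -c "psql discourse -At -c 'SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid'" \
#     | drop_statements
# Review the printed statements before pasting any of them into psql.
```

Printing the statements instead of executing them leaves a chance to check each name first.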

Thanks again, Gunnar. I was able to drop some of the ccnew indexes but not all:

discourse=# DROP INDEX public.index_incoming_referers_on_path_and_incoming_domain_id_ccnew;
DROP INDEX
discourse=# DROP INDEX public.incoming_referers_pkey_ccnew1;
DROP INDEX
discourse=# DROP INDEX public.index_incoming_referers_on_path_and_incoming_domain_id_ccnew1;
DROP INDEX
discourse=# DROP INDEX public.index_incoming_referers_on_path_and_incoming_domain_id_cc_ccnew;
ERROR:  index "index_incoming_referers_on_path_and_incoming_domain_id_cc_ccnew" does not exist
discourse=# DROP INDEX public.index_incoming_referers_on_path_and_incoming_domain_id_c_ccnew1;
ERROR:  index "index_incoming_referers_on_path_and_incoming_domain_id_c_ccnew1" does not exist
discourse=# DROP INDEX public.incoming_referers_pkey_ccnew2;
DROP INDEX
discourse=# DROP INDEX public.incoming_referers_pkey_ccnew_ccnew;
ERROR:  index "incoming_referers_pkey_ccnew_ccnew" does not exist
discourse=# DROP INDEX pg_toast.pg_toast_19250_index_ccnew1;
ERROR:  permission denied: "pg_toast_19250_index_ccnew1" is a system catalog
discourse=# DROP INDEX pg_toast.pg_toast_19250_index_ccnew2;
ERROR:  permission denied: "pg_toast_19250_index_ccnew2" is a system catalog
discourse=# DROP INDEX pg_toast.pg_toast_19250_index_ccnew_ccnew;
ERROR:  permission denied: "pg_toast_19250_index_ccnew_ccnew" is a system catalog

In any case, I seem to have successfully reindexed afterwards:

discourse=# REINDEX SCHEMA CONCURRENTLY public;
REINDEX

So then I went to complete the upgrade, but it failed:

root@forum:/var/discourse# ./launcher rebuild app
Ensuring launcher is up to date
Fetching origin
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/discourse/discourse_docker
 * [new branch]      fix-prune-time -> origin/fix-prune-time
Launcher is up-to-date
Stopping old container
+ /usr/bin/docker stop -t 60 app
app
cd /pups && git pull && /pups/bin/pups --stdin
Already up to date.
I, [2021-02-15T00:34:30.967636 #1]  INFO -- : Loading --stdin
I, [2021-02-15T00:34:30.973572 #1]  INFO -- : > locale-gen $LANG && update-locale
I, [2021-02-15T00:34:31.024271 #1]  INFO -- : Generating locales (this might take a while)...
Generation complete.

I, [2021-02-15T00:34:31.024803 #1]  INFO -- : > mkdir -p /shared/postgres_run
I, [2021-02-15T00:34:31.029795 #1]  INFO -- :
I, [2021-02-15T00:34:31.030826 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2021-02-15T00:34:31.033498 #1]  INFO -- :
I, [2021-02-15T00:34:31.033875 #1]  INFO -- : > chmod 775 /shared/postgres_run
I, [2021-02-15T00:34:31.036104 #1]  INFO -- :
I, [2021-02-15T00:34:31.036435 #1]  INFO -- : > rm -fr /var/run/postgresql
I, [2021-02-15T00:34:31.038583 #1]  INFO -- :
I, [2021-02-15T00:34:31.038915 #1]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2021-02-15T00:34:31.041198 #1]  INFO -- :
I, [2021-02-15T00:34:31.041511 #1]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2021/02/15 00:34:31 socat[27] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): No such file or directory
I, [2021-02-15T00:34:31.055279 #1]  INFO -- :
I, [2021-02-15T00:34:31.055620 #1]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2021-02-15T00:34:31.058156 #1]  INFO -- :
I, [2021-02-15T00:34:31.058442 #1]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2021-02-15T00:34:31.060461 #1]  INFO -- :
I, [2021-02-15T00:34:31.060758 #1]  INFO -- : > mkdir -p /shared/postgres_run/13-main.pg_stat_tmp
I, [2021-02-15T00:34:31.062949 #1]  INFO -- :
I, [2021-02-15T00:34:31.063384 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run/13-main.pg_stat_tmp
I, [2021-02-15T00:34:31.065117 #1]  INFO -- :
I, [2021-02-15T00:34:31.069700 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2021-02-15T00:34:31.073080 #1]  INFO -- : File > /etc/service/postgres/log/run  chmod: +x  chown:
I, [2021-02-15T00:34:31.076629 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2021-02-15T00:34:31.079978 #1]  INFO -- : File > /root/upgrade_postgres  chmod: +x  chown:
I, [2021-02-15T00:34:31.080365 #1]  INFO -- : > chown -R root /var/lib/postgresql/13/main
I, [2021-02-15T00:34:31.456272 #1]  INFO -- :
I, [2021-02-15T00:34:31.456523 #1]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o postgres -g postgres /shared/postgres_data && sudo -E -u postgres /usr/lib/postgresql/13/bin/initdb -D /shared/postgres_data || exit 0
I, [2021-02-15T00:34:31.458416 #1]  INFO -- :
I, [2021-02-15T00:34:31.458635 #1]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2021-02-15T00:34:31.489118 #1]  INFO -- :
I, [2021-02-15T00:34:31.489681 #1]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2021-02-15T00:34:31.491900 #1]  INFO -- :
I, [2021-02-15T00:34:31.492294 #1]  INFO -- : > /root/upgrade_postgres
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
debconf: delaying package configuration, since apt-utils is not installed
I, [2021-02-15T00:34:44.948743 #1]  INFO -- : Upgrading PostgreSQL from version 12 to 13
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /shared/postgres_data_new ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok


Success. You can now start the database server using:

    /usr/lib/postgresql/13/bin/pg_ctl -D /shared/postgres_data_new -l logfile start

Get:1 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:2 http://deb.debian.org/debian buster InRelease [122 kB]
Get:3 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:4 http://apt.postgresql.org/pub/repos/apt buster-pgdg InRelease [104 kB]
Get:5 http://security.debian.org/debian-security buster/updates/main amd64 Packages [267 kB]
Get:6 http://deb.debian.org/debian buster/main amd64 Packages [7,907 kB]
Get:7 http://deb.debian.org/debian buster-updates/main amd64 Packages.diff/Index [5,656 B]
Get:8 http://deb.debian.org/debian buster-updates/main amd64 Packages 2020-12-24-1401.30.pdiff [286 B]
Get:9 http://deb.debian.org/debian buster-updates/main amd64 Packages 2021-01-29-2000.47.pdiff [408 B]
Get:10 http://deb.debian.org/debian buster-updates/main amd64 Packages 2021-02-07-1359.56.pdiff [2,302 B]
Get:10 http://deb.debian.org/debian buster-updates/main amd64 Packages 2021-02-07-1359.56.pdiff [2,302 B]
Get:11 https://deb.nodesource.com/node_10.x buster InRelease [4,584 B]
Get:12 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 Packages [216 kB]
Get:13 https://deb.nodesource.com/node_10.x buster/main amd64 Packages [768 B]
Fetched 8,746 kB in 2s (4,421 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  postgresql-client-12
Suggested packages:
  postgresql-doc-12
The following NEW packages will be installed:
  postgresql-12 postgresql-client-12
0 upgraded, 2 newly installed, 0 to remove and 28 not upgraded.
Need to get 16.1 MB of archives.
After this operation, 54.1 MB of additional disk space will be used.
Get:1 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-client-12 amd64 12.6-1.pgdg100+1 [1,424 kB]
Get:2 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-12 amd64 12.6-1.pgdg100+1 [14.7 MB]
Fetched 16.1 MB in 1s (12.8 MB/s)
Selecting previously unselected package postgresql-client-12.
(Reading database ... 43899 files and directories currently installed.)
Preparing to unpack .../postgresql-client-12_12.6-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-client-12 (12.6-1.pgdg100+1) ...
Selecting previously unselected package postgresql-12.
Preparing to unpack .../postgresql-12_12.6-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-12 (12.6-1.pgdg100+1) ...
Setting up postgresql-client-12 (12.6-1.pgdg100+1) ...
update-alternatives: warning: forcing reinstallation of alternative /usr/share/postgresql/13/man/man1/psql.1.gz because link group psql.1.gz is broken
Setting up postgresql-12 (12.6-1.pgdg100+1) ...
Creating new PostgreSQL cluster 12/main ...
/usr/lib/postgresql/12/bin/initdb -D /var/lib/postgresql/12/main --auth-local peer --auth-host md5
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/12/main ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctlcluster 12 main start

Ver Cluster Port Status Owner    Data directory              Log file
12  main    5433 down   postgres /var/lib/postgresql/12/main /var/log/postgresql/postgresql-12-main.log
update-alternatives: warning: forcing reinstallation of alternative /usr/share/postgresql/13/man/man1/postmaster.1.gz because link group postmaster.1.gz is broken
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for postgresql-common (223.pgdg100+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
Stopping PostgreSQL 12 database server: main.
Stopping PostgreSQL 13 database server: main.
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Creating dump of global objects                             ok
Creating dump of database schemas
  discourse

*failure*

Consult the last few lines of "pg_upgrade_dump_16566.log" for
the probable cause of the failure.
Failure, exiting
-------------------------------------------------------------------------------------
UPGRADE OF POSTGRES FAILED

Please visit https://meta.discourse.org/t/postgresql-13-update/172563 for support.

You can run ./launcher start app to restart your app in the meanwhile




FAILED
--------------------
Pups::ExecError: /root/upgrade_postgres failed with return #<Process::Status: pid 46 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "/root/upgrade_postgres"
1b91e47c88940d6c697c346fa8db3d4ab39bbc83f1340dc6f734ca0f9abe6eeb
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

I don’t know why the rebuild failed, except that it appears to have something to do with Postgres. I don’t know where the log file “pg_upgrade_dump_16566.log” is supposed to be.

Ideas?
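
One way to look for that log is to search the shared volume for it. This is a sketch with a hypothetical helper name, and it assumes the file was written somewhere under /var/discourse/shared/standalone. If pg_upgrade wrote it elsewhere inside the temporary bootstrap container, it will not show up here.

```shell
# find_upgrade_logs DIR: list pg_upgrade log files anywhere below DIR.
find_upgrade_logs() {
  find "$1" -type f -name 'pg_upgrade_*.log' 2>/dev/null
}

# Example, run on the host as root:
#   find_upgrade_logs /var/discourse/shared/standalone
#   tail -n 20 <path printed above>
```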

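I think this means the upgrade did not succeed. I ran into something similar yesterday.

What I did was move the backup data back to postgres_data, switch to the pg10 (for you it might be pg12?) template, rebuild, then change the template back and rebuild twice to complete the upgrade.

That's only a rough outline, since I was working from my phone. The PostgreSQL 13 update topic should contain all the information you need.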

So let me make sure I understand. I need to:

  1. Rename postgres_data to something else, then rename the backup I made to postgres_data.
  2. Rename postgres.template.yml to something else, then rename postgres.12.template.yml to postgres.template.yml.
  3. Run ./launcher rebuild app.
  4. Restore postgres_data and postgres.template.yml.
  5. Run ./launcher rebuild app.
  6. Run ./launcher rebuild app.

Is that right?

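Hey, Roger. It should be very similar, but I don't fully know your situation, so I can't guarantee anything.

Oh, wait.

No. In app.yml, you reference the postgres 12 template instead of the regular postgres template. You edit your own app.yml; you don't rename any files. The upgrade guide explains this fairly clearly, I think.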
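
As a sketch of that edit, the helper below prints a copy of app.yml with the template line swapped and leaves the file itself untouched. It assumes your app.yml lists the template as templates/postgres.template.yml and that templates/postgres.12.template.yml exists in your discourse_docker checkout. The function name is made up for this example.

```shell
# use_pg12_template FILE: print FILE with the stock PostgreSQL template
# reference replaced by the version-pinned one. FILE is not modified.
use_pg12_template() {
  sed 's#templates/postgres\.template\.yml#templates/postgres.12.template.yml#' "$1"
}

# Example, run on the host:
#   cd /var/discourse
#   use_pg12_template containers/app.yml > /tmp/app.yml.pg12
#   diff containers/app.yml /tmp/app.yml.pg12   # review the one-line change
# Apply the change to containers/app.yml by hand or with a copy, rebuild,
# and switch the line back later when you retry the PostgreSQL 13 upgrade.
```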

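On the site I just fixed, the postgres_data directory was empty, and then (I think) my script ran docker prune and deleted that container. I think if I had just restarted it, it would have worked.

If you just want it fixed and have a budget, see https://www.literatecomputing.com/automatic-rebuilds-when-they-are-needed/.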