备份失败,由于 PG/SQL 错误

继续讨论 备份到 S3 的命令?

在尝试通过 ./launcher enter 进行备份之后,我发现备份停止工作的原因似乎如下。

pg_dump: 转储表 "topic_links" 的内容失败:PQgetResult() 失败。
pg_dump: 服务器错误消息:ERROR: 无效的内存分配请求大小 18446744073709551613
pg_dump: 执行的命令为:COPY public.topic_links (id, topic_id, post_id, user_id, url, domain, internal, link_topic_id, created_at, updated_at, reflection, clicks, link_post_id, title, crawled_at, quote, extension) TO stdout;
EXCEPTION: pg_dump 失败
/var/www/discourse/lib/backup_restore/backuper.rb:152:in `dump_public_schema'
/var/www/discourse/lib/backup_restore/backuper.rb:36:in `run'
script/discourse:80:in `backup'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor/command.rb:27:in `run'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor/invocation.rb:127:in `invoke_command'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor.rb:392:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor/base.rb:485:in `start'
script/discourse:284:in `<top (required)>'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:63:in `load'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:63:in `kernel_load'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:28:in `run'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:476:in `exec'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:30:in `dispatch'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/base.rb:476:in `start'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:24:in `start'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/exe/bundle:46:in `block in <top (required)>'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/friendly_errors.rb:123:in `with_friendly_errors'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/exe/bundle:34:in `<top (required)>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'
正在删除旧备份...
正在清理...
移除 '.tar' 残留文件...
标记备份为已完成...
刷新磁盘统计信息...
通知 'system' 备份结束...
完成!
[FAILED]

这尤其令人沮丧,因为据我所知,旧备份也无法使用。尝试恢复这些备份时,我遇到以下错误。

[2020-12-12 01:53:25] COPY 750 [2020-12-12 01:53:30] ERROR: 关系 "topic_users" 中列 "user_id" 的 null 值违反了非空约束 [2020-12-12 01:53:30] DETAIL: 失败的行包含 (null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null)。[2020-12-12 01:53:30] CONTEXT: COPY topic_users,第 623983 行:"\\N \\N \\N \\N \\N \\N \\N \\N \\N \\N \\N \\N \\N \\N \\N \\N" [2020-12-12 01:53:30] EXCEPTION: psql 失败:CONTEXT: COPY topic_users,第 623983 行:"\\N \\N \\N \\N \\N \\N \\N \\N \\N \\N \\N \\N \\N \\N \\N \\N" [2020-12-12 01:53:30] /var/www/discourse/lib/backup_restore/database_restorer.rb:87:in `restore_dump' /var/www/discourse/lib/backup_restore/database_restorer.rb:26:in `restore' /var/www/discourse/lib/backup_restore/restorer.rb:51:in `run' /var/www/discourse/script/spawn_backup_restore.rb:23:in `restore' /var/www/discourse/script/spawn_backup_restore.rb:36:in `block in <main>' /var/www/discourse/script/spawn_backup_restore.rb:4:in `fork' /var/www/discourse/script/spawn_backup_restore.rb:4:in `<main>' [2020-12-12 01:53:30] 尝试回滚... [2020-12-12 01:53:30] 正在回滚... [2020-12-12 01:53:30] 正在清理... [2020-12-12 01:53:30] 从 discourse_functions 架构中删除函数... [2020-12-12 01:53:30] 移除临时目录 '/var/www/discourse/tmp/restores/default/2020-12-12-014753'... [2020-12-12 01:53:30] 恢复 sidekiq... [2020-12-12 01:53:30] 标记恢复为已完成...

请问是否有我可以执行的维护操作,使其恢复到能够进行备份或转移的状态?

这看起来是个问题。

这是标准安装吗?这是哪个版本的 PostgreSQL?

你说这台服务器之前出现过一些问题,所以你正尝试将其迁移出去?

当前安装版本为:2.7.0.beta1

该服务器历史较为复杂(已运行约五年),最初是标准自托管安装,随后迁移至 Discourse 托管,之后又迁回。我们一直尽力保持其配置尽可能标准。

我相当确定是最新版本。

该服务器曾出现间歇性中断或卡顿等问题,影响了性能,可能也对数据库造成了影响。上次我进行数据库清理时,大幅减少了 PostgreSQL 文件中的数据量。

目前我能访问的备份文件大小为 1.5 GB,太大,无法用我电脑上现有的任何软件进行编辑。

此外,我还意识到其他一些问题,例如将图片迁移到 S3 失败等,但此前备份等方面并无问题。

我想我会将整个 /var/discourse 目录复制到新服务器,以避开原服务器的问题,然后再逐步排查修复。

您可能遇到了索引损坏的情况,但我现在在手机上,无法完全理解这些错误信息。

Just tried that - took some fussing about.

New system does not like it, pretty much just outright rejects the database.

Launcher is up-to-date
Stopping old container
+ /usr/bin/docker stop -t 60 app
app
cd /pups && git pull && /pups/bin/pups --stdin
Already up to date.
I, [2020-12-13T09:23:39.291334 #1]  INFO -- : Loading --stdin
I, [2020-12-13T09:23:39.296303 #1]  INFO -- : > DEBIAN_FRONTEND=noninteractive apt-get purge -y postgresql-13 postgresql-client-13 postgresql-contrib-13
I, [2020-12-13T09:23:41.511661 #1]  INFO -- : Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  libllvm7 pgdg-keyring postgresql-client-common postgresql-common ssl-cert
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  postgresql-13* postgresql-client-13*
0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.
After this operation, 54.3 MB disk space will be freed.
(Reading database ... 43863 files and directories currently installed.)
Removing postgresql-13 (13.1-1.pgdg100+1) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of stop.
Removing postgresql-client-13 (13.1-1.pgdg100+1) ...
Processing triggers for postgresql-common (223.pgdg100+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
(Reading database ... 42050 files and directories currently installed.)
Purging configuration files for postgresql-13 (13.1-1.pgdg100+1) ...
Dropping cluster main...

I, [2020-12-13T09:23:41.511861 #1]  INFO -- : > apt-get update && apt-get install -y postgresql-10 postgresql-client-10 postgresql-contrib-10
debconf: delaying package configuration, since apt-utils is not installed
I, [2020-12-13T09:23:51.192217 #1]  INFO -- : Hit:1 http://deb.debian.org/debian buster InRelease
Get:2 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:3 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:4 http://apt.postgresql.org/pub/repos/apt buster-pgdg InRelease [104 kB]
Hit:5 https://deb.nodesource.com/node_10.x buster InRelease
Get:6 http://security.debian.org/debian-security buster/updates/main amd64 Packages [254 kB]
Get:7 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 Packages [216 kB]
Fetched 690 kB in 1s (525 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following package was automatically installed and is no longer required:
  libllvm7
Use 'apt autoremove' to remove it.
Suggested packages:
  postgresql-doc-10
The following NEW packages will be installed:
  postgresql-10 postgresql-client-10
0 upgraded, 2 newly installed, 0 to remove and 5 not upgraded.
Need to get 6,402 kB of archives.
After this operation, 30.6 MB of additional disk space will be used.
Get:1 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-client-10 amd64 10.15-1.pgdg100+1 [1,436 kB]
Get:2 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-10 amd64 10.15-1.pgdg100+1 [4,966 kB]
Fetched 6,402 kB in 2s (2,809 kB/s)
Selecting previously unselected package postgresql-client-10.
(Reading database ... 42050 files and directories currently installed.)
Preparing to unpack .../postgresql-client-10_10.15-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-client-10 (10.15-1.pgdg100+1) ...
Selecting previously unselected package postgresql-10.
Preparing to unpack .../postgresql-10_10.15-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-10 (10.15-1.pgdg100+1) ...
Setting up postgresql-client-10 (10.15-1.pgdg100+1) ...
update-alternatives: using /usr/share/postgresql/10/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-10 (10.15-1.pgdg100+1) ...
Creating new PostgreSQL cluster 10/main ...
/usr/lib/postgresql/10/bin/initdb -D /var/lib/postgresql/10/main --auth-local peer --auth-host md5
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/10/main ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Etc/UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctlcluster 10 main start

Ver Cluster Port Status Owner    Data directory              Log file
10  main    5432 down   postgres /var/lib/postgresql/10/main /var/log/postgresql/postgresql-10-main.log
update-alternatives: using /usr/share/postgresql/10/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for postgresql-common (223.pgdg100+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:

I, [2020-12-13T09:23:51.192964 #1]  INFO -- : > mkdir -p /shared/postgres_run
I, [2020-12-13T09:23:51.195917 #1]  INFO -- :
I, [2020-12-13T09:23:51.196235 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2020-12-13T09:23:51.198835 #1]  INFO -- :
I, [2020-12-13T09:23:51.199139 #1]  INFO -- : > chmod 775 /shared/postgres_run
I, [2020-12-13T09:23:51.201681 #1]  INFO -- :
I, [2020-12-13T09:23:51.202025 #1]  INFO -- : > rm -fr /var/run/postgresql
I, [2020-12-13T09:23:51.204199 #1]  INFO -- :
I, [2020-12-13T09:23:51.204549 #1]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2020-12-13T09:23:51.207718 #1]  INFO -- :
I, [2020-12-13T09:23:51.208017 #1]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2020/12/13 09:23:51 socat[1567] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): No such file or directory
I, [2020-12-13T09:23:51.217014 #1]  INFO -- :
I, [2020-12-13T09:23:51.217294 #1]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2020-12-13T09:23:51.220400 #1]  INFO -- :
I, [2020-12-13T09:23:51.220682 #1]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2020-12-13T09:23:51.223488 #1]  INFO -- :
I, [2020-12-13T09:23:51.223691 #1]  INFO -- : > mkdir -p /shared/postgres_run/10-main.pg_stat_tmp
I, [2020-12-13T09:23:51.225967 #1]  INFO -- :
I, [2020-12-13T09:23:51.226198 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run/10-main.pg_stat_tmp
I, [2020-12-13T09:23:51.228306 #1]  INFO -- :
I, [2020-12-13T09:23:51.233016 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2020-12-13T09:23:51.237345 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2020-12-13T09:23:51.237662 #1]  INFO -- : > chown -R root /var/lib/postgresql/10/main
I, [2020-12-13T09:23:51.244979 #1]  INFO -- :
I, [2020-12-13T09:23:51.245164 #1]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o postgres -g postgres /shared/postgres_data && sudo -E -u postgres /usr/lib/postgresql/10/bin/initdb -D /shared/postgres_data || exit 0
I, [2020-12-13T09:23:51.246982 #1]  INFO -- :
I, [2020-12-13T09:23:51.247152 #1]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2020-12-13T09:23:51.314470 #1]  INFO -- :
I, [2020-12-13T09:23:51.314888 #1]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2020-12-13T09:23:51.318075 #1]  INFO -- :
I, [2020-12-13T09:23:51.318499 #1]  INFO -- : Replacing data_directory = '/var/lib/postgresql/10/main' with data_directory = '/shared/postgres_data' in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.319171 #1]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addresses = '*' in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.319652 #1]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchronous_commit = $db_synchronous_commit in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.320131 #1]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffers = $db_shared_buffers in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.320672 #1]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work_mem in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.321143 #1]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with default_text_search_config = '$db_default_text_search_config' in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.321608 #1]  INFO -- : > install -d -m 0755 -o postgres -g postgres /shared/postgres_backup
I, [2020-12-13T09:23:51.324709 #1]  INFO -- :
I, [2020-12-13T09:23:51.325108 #1]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoint_segments = $db_checkpoint_segments in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.325597 #1]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_collector = $db_logging_collector in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.326097 #1]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with log_min_duration_statement = $db_log_min_duration_statement in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.326619 #1]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) with local replication postgres  peer in /etc/postgresql/10/main/pg_hba.conf
I, [2020-12-13T09:23:51.327039 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all all 0.0.0.0/0 md5 in /etc/postgresql/10/main/pg_hba.conf
I, [2020-12-13T09:23:51.327456 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main
I, [2020-12-13T09:23:51.329156 #1]  INFO -- : > sleep 5
2020-12-13 09:23:51.347 UTC [1583] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-12-13 09:23:51.347 UTC [1583] LOG:  listening on IPv6 address "::", port 5432
2020-12-13 09:23:51.349 UTC [1583] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-12-13 09:23:51.363 UTC [1583] FATAL:  database files are incompatible with server
2020-12-13 09:23:51.363 UTC [1583] DETAIL:  The database cluster was initialized with PG_CONTROL_VERSION 1300, but the server was compiled with PG_CONTROL_VERSION 1002.
2020-12-13 09:23:51.363 UTC [1583] HINT:  It looks like you need to initdb.
2020-12-13 09:23:51.365 UTC [1583] LOG:  database system is shut down
I, [2020-12-13T09:23:56.331811 #1]  INFO -- :
I, [2020-12-13T09:23:56.332043 #1]  INFO -- : > su postgres -c 'createdb discourse' || true
createdb: could not connect to database template1: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.394383 #1]  INFO -- :
I, [2020-12-13T09:23:56.394680 #1]  INFO -- : > su postgres -c 'psql discourse -c "create user discourse;"' || true
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.454155 #1]  INFO -- :
I, [2020-12-13T09:23:56.454333 #1]  INFO -- : > su postgres -c 'psql discourse -c "grant all privileges on database discourse to discourse;"' || true
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.508933 #1]  INFO -- :
I, [2020-12-13T09:23:56.509118 #1]  INFO -- : > su postgres -c 'psql discourse -c "alter schema public owner to discourse;"'
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.560843 #1]  INFO -- :
I, [2020-12-13T09:23:56.561176 #1]  INFO -- : Terminating async processes


FAILED
--------------------
Pups::ExecError: su postgres -c 'psql discourse -c "alter schema public owner to discourse;"' failed with return #<Process::Status: pid 1609 exit 2>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "su postgres -c 'psql $db_name -c \"alter schema public owner to $db_user;\"'"
da620ae9048b2cda99c7a0d24e38c9dfafba5d61fac8c64c2da2362a19338a76
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

This bit seems a concern to me:

Using Discourse Doctor on both doesn’t seem to have any solutions.

它正在尝试升级 PostgreSQL,这很大程度上是预期的。

也许它会重试,并按照 PostgreSQL 13 更新 中的描述切换到 pg 10 模板。

阻止升级的这行代码仍然在 YML 文件中。

我可以进入应用并访问旧服务器上的 SQL,但在新服务器上尝试时返回:

psql: 错误:无法连接到服务器:没有那个文件或目录
        服务器是否在本地运行,并正在 Unix 域套接字 "/var/run/postgresql/.s.PGSQL.5432" 上接受连接?

上面这一行似乎是根本问题,@wincenworks

我无法告诉你具体该怎么做,但如果我是你(@wincenworks),我会从头安装 Discourse;并且在构建容器之前,将你的模板设置为使用 PG10。

之后,当你让新实例正常运行起来后,就可以在容器内的命令行(而不是通过 UI)尝试从当前的 PG10 备份恢复你的 Discourse 实例。

希望这能帮到你。

正尝试创建一个运行 PG10 的新实例,但在注册过程的最后阶段一直超时。

我当然很想这么做,但是:

  1. 系统不允许我创建新的备份
  2. 之前的备份无法恢复

因此,我正尝试通过 scp 将它们传输过去。

是的,正如我所理解的那样,在当前配置下它无法恢复。

你试过在完全全新安装后使用该备份进行恢复吗?正如我之前提到的。

首先,从头开始安装 Discourse,确保它在 PG10 安装模板下完全正常运行。

然后,使用你的最新备份进行恢复(通过命令行,而不是 UI)。