Backup failed due to PG/SQL errors

Continuing the discussion from Backup to S3 command?:

After attempting to back up via ./launcher enter, I have found what seems to be the reason the backups have stopped working.

pg_dump: Dumping the contents of table "topic_links" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  invalid memory alloc request size 18446744073709551613
pg_dump: The command was: COPY public.topic_links (id, topic_id, post_id, user_id, url, domain, internal, link_topic_id, created_at, updated_at, reflection, clicks, link_post_id, title, crawled_at, quote, extension) TO stdout;
EXCEPTION: pg_dump failed
/var/www/discourse/lib/backup_restore/backuper.rb:152:in `dump_public_schema'
/var/www/discourse/lib/backup_restore/backuper.rb:36:in `run'
script/discourse:80:in `backup'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor/command.rb:27:in `run'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor/invocation.rb:127:in `invoke_command'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor.rb:392:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor/base.rb:485:in `start'
script/discourse:284:in `<top (required)>'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:63:in `load'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:63:in `kernel_load'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:28:in `run'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:476:in `exec'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:30:in `dispatch'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/base.rb:476:in `start'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:24:in `start'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/exe/bundle:46:in `block in <top (required)>'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/friendly_errors.rb:123:in `with_friendly_errors'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/exe/bundle:34:in `<top (required)>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'
Deleting old backups...
Cleaning stuff up...
Removing '.tar' leftovers...
Marking backup as finished...
Refreshing disk stats...
Notifying 'system' of the end of the backup...
Finished!
[FAILED]

This is particularly frustrating because apparently I cannot use the old backups either. When I attempt to restore them I get this error:

[2020-12-12 01:53:25] COPY 750
[2020-12-12 01:53:30] ERROR: null value in column "user_id" of relation "topic_users" violates not-null constraint
[2020-12-12 01:53:30] DETAIL: Failing row contains (null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null).
[2020-12-12 01:53:30] CONTEXT: COPY topic_users, line 623983: "\N \N \N \N \N \N \N \N \N \N \N \N \N \N \N \N"
[2020-12-12 01:53:30] EXCEPTION: psql failed: CONTEXT: COPY topic_users, line 623983: "\N \N \N \N \N \N \N \N \N \N \N \N \N \N \N \N"
[2020-12-12 01:53:30] /var/www/discourse/lib/backup_restore/database_restorer.rb:87:in `restore_dump'
/var/www/discourse/lib/backup_restore/database_restorer.rb:26:in `restore'
/var/www/discourse/lib/backup_restore/restorer.rb:51:in `run'
/var/www/discourse/script/spawn_backup_restore.rb:23:in `restore'
/var/www/discourse/script/spawn_backup_restore.rb:36:in `block in <main>'
/var/www/discourse/script/spawn_backup_restore.rb:4:in `fork'
/var/www/discourse/script/spawn_backup_restore.rb:4:in `<main>'
[2020-12-12 01:53:30] Trying to rollback...
[2020-12-12 01:53:30] Rolling back...
[2020-12-12 01:53:30] Cleaning stuff up...
[2020-12-12 01:53:30] Dropping functions from the discourse_functions schema...
[2020-12-12 01:53:30] Removing tmp '/var/www/discourse/tmp/restores/default/2020-12-12-014753' directory...
[2020-12-12 01:53:30] Unpausing sidekiq...
[2020-12-12 01:53:30] Marking restore as finished...

I don’t suppose there’s any maintenance I could do to get it back to a state where it can be backed up or transferred?

That looks like a problem.

Is this a standard install? What version of postgres is this?

You said that there was some problem with this server and that’s why you’re trying to move from it?

Current install is: 2.7.0.beta1

The server has a complicated history (it’s about five years old) and was a standard install self-hosted, then transferred to Discourse hosting, then transferred back - we’ve tried to keep it as standard as possible.

I am pretty sure it’s the current version.

The server has had intermittent outages, stalls, etc., which have been affecting performance and may have been impacting the database. Last time I did a database clean-up it shaved a lot of data out of the postgres file.

The backup file I currently have access to is 1.5 GB, which is too large to edit with any software I have on my PC.
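A file that size does not need to be opened in an editor; it can be streamed line by line. Below is a minimal sketch, not a tested recovery procedure, that copies a gzipped plain-SQL dump while skipping COPY data rows in which every field is `\N`, like the row the restore choked on. It assumes the dump is gzipped plain SQL with tab-separated COPY rows; the function names and file names are hypothetical. Note that dropping such rows only removes the symptom: an all-NULL row in `topic_users` suggests the source table was already damaged when the backup was taken.

```python
import gzip
import re

# A COPY data row in which every field is NULL: two or more "\N" markers
# separated by tabs. A lone "\N" is left alone, since that can be a
# legitimate row in a single-column table.
ALL_NULL_ROW = re.compile(r"\\N(?:\t\\N)+")


def is_all_null_row(line: str) -> bool:
    """Return True if the line consists only of tab-separated NULL markers."""
    return ALL_NULL_ROW.fullmatch(line.rstrip("\n")) is not None


def filter_dump(src_path: str, dst_path: str) -> int:
    """Stream src_path to dst_path, skipping all-NULL rows.

    Both files are gzip-compressed text. Returns the number of rows dropped.
    """
    dropped = 0
    with gzip.open(src_path, "rt", encoding="utf-8", newline="\n") as src, \
            gzip.open(dst_path, "wt", encoding="utf-8", newline="\n") as dst:
        for line in src:
            if is_all_null_row(line):
                dropped += 1
            else:
                dst.write(line)
    return dropped
```

Usage would look like `filter_dump("dump.sql.gz", "dump.filtered.sql.gz")` (both file names are placeholders). Keep the original file untouched, and check how many rows were dropped before trusting the result.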

There are other issues I am aware of, such as a failed attempt to migrate images to S3, but previously there were no issues with backups.

I think I would copy the entire /var/discourse to a new server to get away from whatever is wrong with that server and then try to get things straight.

You may have a corrupt index, but I’m on my phone and can’t quite make sense of the errors.
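On the corruption theory: the size in the pg_dump error, 18446744073709551613, is exactly 2^64 - 3, which is what -3 looks like when read as an unsigned 64-bit integer. A negative length where a field size should be is more consistent with a damaged row in `topic_links` than with the server genuinely running out of memory, though that is an inference and not something the log proves. A quick check of the arithmetic (the helper name is made up for illustration):

```python
# Size reported by pg_dump in "invalid memory alloc request size ...".
REPORTED_SIZE = 18446744073709551613


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    return value - 2**64 if value >= 2**63 else value


print(as_signed_64(REPORTED_SIZE))  # prints -3
```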

Just tried that - took some fussing about.

New system does not like it, pretty much just outright rejects the database.

Launcher is up-to-date
Stopping old container
+ /usr/bin/docker stop -t 60 app
app
cd /pups && git pull && /pups/bin/pups --stdin
Already up to date.
I, [2020-12-13T09:23:39.291334 #1]  INFO -- : Loading --stdin
I, [2020-12-13T09:23:39.296303 #1]  INFO -- : > DEBIAN_FRONTEND=noninteractive apt-get purge -y postgresql-13 postgresql-client-13 postgresql-contrib-13
I, [2020-12-13T09:23:41.511661 #1]  INFO -- : Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  libllvm7 pgdg-keyring postgresql-client-common postgresql-common ssl-cert
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  postgresql-13* postgresql-client-13*
0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.
After this operation, 54.3 MB disk space will be freed.
(Reading database ... 43863 files and directories currently installed.)
Removing postgresql-13 (13.1-1.pgdg100+1) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of stop.
Removing postgresql-client-13 (13.1-1.pgdg100+1) ...
Processing triggers for postgresql-common (223.pgdg100+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
(Reading database ... 42050 files and directories currently installed.)
Purging configuration files for postgresql-13 (13.1-1.pgdg100+1) ...
Dropping cluster main...

I, [2020-12-13T09:23:41.511861 #1]  INFO -- : > apt-get update && apt-get install -y postgresql-10 postgresql-client-10 postgresql-contrib-10
debconf: delaying package configuration, since apt-utils is not installed
I, [2020-12-13T09:23:51.192217 #1]  INFO -- : Hit:1 http://deb.debian.org/debian buster InRelease
Get:2 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:3 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:4 http://apt.postgresql.org/pub/repos/apt buster-pgdg InRelease [104 kB]
Hit:5 https://deb.nodesource.com/node_10.x buster InRelease
Get:6 http://security.debian.org/debian-security buster/updates/main amd64 Packages [254 kB]
Get:7 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 Packages [216 kB]
Fetched 690 kB in 1s (525 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following package was automatically installed and is no longer required:
  libllvm7
Use 'apt autoremove' to remove it.
Suggested packages:
  postgresql-doc-10
The following NEW packages will be installed:
  postgresql-10 postgresql-client-10
0 upgraded, 2 newly installed, 0 to remove and 5 not upgraded.
Need to get 6,402 kB of archives.
After this operation, 30.6 MB of additional disk space will be used.
Get:1 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-client-10 amd64 10.15-1.pgdg100+1 [1,436 kB]
Get:2 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-10 amd64 10.15-1.pgdg100+1 [4,966 kB]
Fetched 6,402 kB in 2s (2,809 kB/s)
Selecting previously unselected package postgresql-client-10.
(Reading database ... 42050 files and directories currently installed.)
Preparing to unpack .../postgresql-client-10_10.15-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-client-10 (10.15-1.pgdg100+1) ...
Selecting previously unselected package postgresql-10.
Preparing to unpack .../postgresql-10_10.15-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-10 (10.15-1.pgdg100+1) ...
Setting up postgresql-client-10 (10.15-1.pgdg100+1) ...
update-alternatives: using /usr/share/postgresql/10/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-10 (10.15-1.pgdg100+1) ...
Creating new PostgreSQL cluster 10/main ...
/usr/lib/postgresql/10/bin/initdb -D /var/lib/postgresql/10/main --auth-local peer --auth-host md5
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/10/main ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Etc/UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctlcluster 10 main start

Ver Cluster Port Status Owner    Data directory              Log file
10  main    5432 down   postgres /var/lib/postgresql/10/main /var/log/postgresql/postgresql-10-main.log
update-alternatives: using /usr/share/postgresql/10/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for postgresql-common (223.pgdg100+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:

I, [2020-12-13T09:23:51.192964 #1]  INFO -- : > mkdir -p /shared/postgres_run
I, [2020-12-13T09:23:51.195917 #1]  INFO -- :
I, [2020-12-13T09:23:51.196235 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2020-12-13T09:23:51.198835 #1]  INFO -- :
I, [2020-12-13T09:23:51.199139 #1]  INFO -- : > chmod 775 /shared/postgres_run
I, [2020-12-13T09:23:51.201681 #1]  INFO -- :
I, [2020-12-13T09:23:51.202025 #1]  INFO -- : > rm -fr /var/run/postgresql
I, [2020-12-13T09:23:51.204199 #1]  INFO -- :
I, [2020-12-13T09:23:51.204549 #1]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2020-12-13T09:23:51.207718 #1]  INFO -- :
I, [2020-12-13T09:23:51.208017 #1]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2020/12/13 09:23:51 socat[1567] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): No such file or directory
I, [2020-12-13T09:23:51.217014 #1]  INFO -- :
I, [2020-12-13T09:23:51.217294 #1]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2020-12-13T09:23:51.220400 #1]  INFO -- :
I, [2020-12-13T09:23:51.220682 #1]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2020-12-13T09:23:51.223488 #1]  INFO -- :
I, [2020-12-13T09:23:51.223691 #1]  INFO -- : > mkdir -p /shared/postgres_run/10-main.pg_stat_tmp
I, [2020-12-13T09:23:51.225967 #1]  INFO -- :
I, [2020-12-13T09:23:51.226198 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run/10-main.pg_stat_tmp
I, [2020-12-13T09:23:51.228306 #1]  INFO -- :
I, [2020-12-13T09:23:51.233016 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2020-12-13T09:23:51.237345 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2020-12-13T09:23:51.237662 #1]  INFO -- : > chown -R root /var/lib/postgresql/10/main
I, [2020-12-13T09:23:51.244979 #1]  INFO -- :
I, [2020-12-13T09:23:51.245164 #1]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o postgres -g postgres /shared/postgres_data && sudo -E -u postgres /usr/lib/postgresql/10/bin/initdb -D /shared/postgres_data || exit 0
I, [2020-12-13T09:23:51.246982 #1]  INFO -- :
I, [2020-12-13T09:23:51.247152 #1]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2020-12-13T09:23:51.314470 #1]  INFO -- :
I, [2020-12-13T09:23:51.314888 #1]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2020-12-13T09:23:51.318075 #1]  INFO -- :
I, [2020-12-13T09:23:51.318499 #1]  INFO -- : Replacing data_directory = '/var/lib/postgresql/10/main' with data_directory = '/shared/postgres_data' in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.319171 #1]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addresses = '*' in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.319652 #1]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchronous_commit = $db_synchronous_commit in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.320131 #1]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffers = $db_shared_buffers in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.320672 #1]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work_mem in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.321143 #1]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with default_text_search_config = '$db_default_text_search_config' in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.321608 #1]  INFO -- : > install -d -m 0755 -o postgres -g postgres /shared/postgres_backup
I, [2020-12-13T09:23:51.324709 #1]  INFO -- :
I, [2020-12-13T09:23:51.325108 #1]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoint_segments = $db_checkpoint_segments in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.325597 #1]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_collector = $db_logging_collector in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.326097 #1]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with log_min_duration_statement = $db_log_min_duration_statement in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.326619 #1]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) with local replication postgres  peer in /etc/postgresql/10/main/pg_hba.conf
I, [2020-12-13T09:23:51.327039 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all all 0.0.0.0/0 md5 in /etc/postgresql/10/main/pg_hba.conf
I, [2020-12-13T09:23:51.327456 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main
I, [2020-12-13T09:23:51.329156 #1]  INFO -- : > sleep 5
2020-12-13 09:23:51.347 UTC [1583] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-12-13 09:23:51.347 UTC [1583] LOG:  listening on IPv6 address "::", port 5432
2020-12-13 09:23:51.349 UTC [1583] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-12-13 09:23:51.363 UTC [1583] FATAL:  database files are incompatible with server
2020-12-13 09:23:51.363 UTC [1583] DETAIL:  The database cluster was initialized with PG_CONTROL_VERSION 1300, but the server was compiled with PG_CONTROL_VERSION 1002.
2020-12-13 09:23:51.363 UTC [1583] HINT:  It looks like you need to initdb.
2020-12-13 09:23:51.365 UTC [1583] LOG:  database system is shut down
I, [2020-12-13T09:23:56.331811 #1]  INFO -- :
I, [2020-12-13T09:23:56.332043 #1]  INFO -- : > su postgres -c 'createdb discourse' || true
createdb: could not connect to database template1: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.394383 #1]  INFO -- :
I, [2020-12-13T09:23:56.394680 #1]  INFO -- : > su postgres -c 'psql discourse -c "create user discourse;"' || true
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.454155 #1]  INFO -- :
I, [2020-12-13T09:23:56.454333 #1]  INFO -- : > su postgres -c 'psql discourse -c "grant all privileges on database discourse to discourse;"' || true
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.508933 #1]  INFO -- :
I, [2020-12-13T09:23:56.509118 #1]  INFO -- : > su postgres -c 'psql discourse -c "alter schema public owner to discourse;"'
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.560843 #1]  INFO -- :
I, [2020-12-13T09:23:56.561176 #1]  INFO -- : Terminating async processes


FAILED
--------------------
Pups::ExecError: su postgres -c 'psql discourse -c "alter schema public owner to discourse;"' failed with return #<Process::Status: pid 1609 exit 2>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "su postgres -c 'psql $db_name -c \"alter schema public owner to $db_user;\"'"
da620ae9048b2cda99c7a0d24e38c9dfafba5d61fac8c64c2da2362a19338a76
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

This bit seems a concern to me:

Using Discourse Doctor on both doesn’t seem to have any solutions.

It’s trying to upgrade postgres. So much of that is expected.

Maybe try it over again and change to the PG 10 template, as described in PostgreSQL 13 update.

The line to prevent upgrade is still in the YML file.
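The FATAL lines in the bootstrap log say the data directory was initialized with PG_CONTROL_VERSION 1300 while the server binary expects 1002. Those values correspond to PostgreSQL 13 and PostgreSQL 10, so the copied data looks like it was already on 13 while the template installs 10. One way to confirm this is to read the `PG_VERSION` file that PostgreSQL writes into every data directory. A minimal sketch; the host path assumes a standard standalone install and the function names are hypothetical:

```python
from pathlib import Path

# Assumed host-side location of the data directory for a standard
# standalone install; adjust if your layout differs.
DEFAULT_DATA_DIR = Path("/var/discourse/shared/standalone/postgres_data")


def data_dir_major_version(data_dir: Path) -> str:
    """Return the major version recorded in the data directory's PG_VERSION file."""
    return (data_dir / "PG_VERSION").read_text().strip()


def matches_template(data_dir: Path, template_major: str) -> bool:
    """True if the data directory was created by the given major version."""
    return data_dir_major_version(data_dir) == template_major


if __name__ == "__main__":
    if (DEFAULT_DATA_DIR / "PG_VERSION").exists():
        print("data directory version:", data_dir_major_version(DEFAULT_DATA_DIR))
        print("usable with a PG 10 template:", matches_template(DEFAULT_DATA_DIR, "10"))
    else:
        print("no PG_VERSION file found under", DEFAULT_DATA_DIR)
```

If this reports 13, a PostgreSQL 10 server will refuse to start on that directory regardless of what the YML file says.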

I can enter the app and access the SQL on the old server. When I try on the new one, it returns:

psql: error: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

The line above seems to be the root problem @wincenworks

I cannot tell you what to do, but if I were you, @wincenworks, I would install Discourse from scratch, and before building the container I would set your template(s) to use PG10.

Once that new instance is up and running, you can try to restore your Discourse instance from your current PG10 backup, from the command line (not the UI) inside your container.

HTH.

Trying to get a new instance with PG10 running right now, but it keeps timing out at the final stage of the registration process.

I would absolutely love to do that but:

  1. It won’t let me make a new backup
  2. Previous backups don’t restore

Hence the trying to scp them across.

Yes, it will not restore on your current configuration, as I understood you.

Have you tried using that backup to restore after a full fresh install, as I mentioned?

First, install Discourse from scratch and make sure it is 100% up and running with a PG10 install template.

Then, take your latest backup and restore that backup (from the command line, not the UI).