Backup failing due to PG/SQL errors

Continuing the discussion from Command for backups to S3?:

In addition to attempting a backup via ./launcher enter, I've found what appears to be the reason backups have stopped working.

pg_dump: Dumping the contents of table "topic_links" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: invalid memory alloc request size 18446744073709551613
pg_dump: The command was: COPY public.topic_links (id, topic_id, post_id, user_id, url, domain, internal, link_topic_id, created_at, updated_at, reflection, clicks, link_post_id, title, crawled_at, quote, extension) TO stdout;
EXCEPTION: pg_dump failed
/var/www/discourse/lib/backup_restore/backuper.rb:152:in `dump_public_schema'
/var/www/discourse/lib/backup_restore/backuper.rb:36:in `run'
script/discourse:80:in `backup'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor/command.rb:27:in `run'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor/invocation.rb:127:in `invoke_command'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor.rb:392:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor/base.rb:485:in `start'
script/discourse:284:in `<top (required)>'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:63:in `load'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:63:in `kernel_load'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:28:in `run'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:476:in `exec'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:30:in `dispatch'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/base.rb:476:in `start'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:24:in `start'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/exe/bundle:46:in `block in <top (required)>'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/friendly_errors.rb:123:in `with_friendly_errors'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/exe/bundle:34:in `<top (required)>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'
Removing old backups...
Cleaning stuff up...
Removing '.tar' leftovers...
Marking backup as finished...
Refreshing disk stats...
Notifying 'system' of the end of the backup...
Finished!
[FAILED]
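A note on the number in that pg_dump error: 18446744073709551613 is 2^64 - 3, so it is what -3 looks like when a signed 64-bit size is printed as unsigned. That reading suggests a damaged length field in a stored row rather than a genuine request for that much memory, although the log alone does not prove it. A minimal Python sketch of the arithmetic (the helper name `as_signed_64` is made up for illustration):

```python
def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    # Values with the top bit set represent negative numbers.
    return value - 2**64 if value >= 2**63 else value


requested = 18446744073709551613  # size reported in the pg_dump error
print(as_signed_64(requested))    # -3
```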

This is particularly frustrating because apparently I can't use the old backups either; when I try to restore them, I get this error.

[2020-12-12 01:53:25] COPY 750
[2020-12-12 01:53:30] ERROR: null value in column "user_id" of relation "topic_users" violates not-null constraint
[2020-12-12 01:53:30] DETAIL: Failing row contains (null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null).
[2020-12-12 01:53:30] CONTEXT: COPY topic_users, line 623983: "\N \N \N \N \N \N \N \N \N \N \N \N \N \N \N \N"
[2020-12-12 01:53:30] EXCEPTION: psql failed: CONTEXT: COPY topic_users, line 623983: "\N \N \N \N \N \N \N \N \N \N \N \N \N \N \N \N"
[2020-12-12 01:53:30] /var/www/discourse/lib/backup_restore/database_restorer.rb:87:in `restore_dump'
/var/www/discourse/lib/backup_restore/database_restorer.rb:26:in `restore'
/var/www/discourse/lib/backup_restore/restorer.rb:51:in `run'
/var/www/discourse/script/spawn_backup_restore.rb:23:in `restore'
/var/www/discourse/script/spawn_backup_restore.rb:36:in `block in <main>'
/var/www/discourse/script/spawn_backup_restore.rb:4:in `fork'
/var/www/discourse/script/spawn_backup_restore.rb:4:in `<main>'
[2020-12-12 01:53:30] Trying to roll back...
[2020-12-12 01:53:30] Rolling back...
[2020-12-12 01:53:30] Cleaning stuff up...
[2020-12-12 01:53:30] Dropping functions from the discourse_functions schema...
[2020-12-12 01:53:30] Removing tmp '/var/www/discourse/tmp/restores/default/2020-12-12-014753' directory...
[2020-12-12 01:53:30] Unpausing sidekiq...
[2020-12-12 01:53:30] Marking restore as finished...

Is there any maintenance I can do to get it back into a state where it can be backed up or transferred?
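One approach sometimes used for damage like this, sketched here as an assumption and not something tried in this thread, is to narrow down which rows of "topic_links" cannot be read by probing id ranges and splitting any range that fails. In the sketch below, `range_ok` is a hypothetical callback; in practice it might run a query over `id BETWEEN lo AND hi` and report whether it completed, but here it is faked so the logic can be run on its own.

```python
from typing import Callable, List


def find_bad_ids(lo: int, hi: int, range_ok: Callable[[int, int], bool]) -> List[int]:
    """Return the ids in [lo, hi] whose single-row probe fails.

    A range that reads cleanly is skipped in one probe; a failing range is
    halved until the failing ids are isolated.
    """
    if lo > hi or range_ok(lo, hi):
        return []
    if lo == hi:
        return [lo]
    mid = (lo + hi) // 2
    return find_bad_ids(lo, mid, range_ok) + find_bad_ids(mid + 1, hi, range_ok)


# Fake probe for illustration: pretend rows 7 and 42 are unreadable.
damaged = {7, 42}
fake_probe = lambda lo, hi: not any(lo <= i <= hi for i in damaged)
print(find_bad_ids(1, 100, fake_probe))  # [7, 42]
```

Whatever ids such a probe reports would still need to be inspected by hand before deciding to delete or repair anything.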

That looks like a problem.

Is this a standard install? What version of PostgreSQL is it?

You said there was some problem with this server, and that's why you're trying to migrate off it.

The current install is: 2.7.0.beta1

The server has a complicated history (it's about five years old). It started as a standard self-hosted install, was then transferred to Discourse Hosting, and was later moved back. We've tried to keep it as standard as possible.

I'm fairly sure it's the current version.

The server has had intermittent outages, crashes, etc., which have affected performance and may have impacted the database. The last time I ran a database cleanup, it removed a large amount of data from the PostgreSQL file.

Currently, the backup file I have access to is 1.5 GB, so it has been too large to edit with any software I currently have on my PC.
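A file that size does not need to be opened in an editor; it can be processed as a stream. The following is a minimal sketch, assuming the SQL dump inside the backup is a gzipped plain-text pg_dump file whose COPY blocks use tab-separated fields, `\N` for NULL, and `\.` as the end marker. It drops rows that are entirely NULL, like the one the restore log complains about. The function names and file names are hypothetical, and removing rows this way may hide other damage, so it should only ever be run on a copy.

```python
import gzip
import io

NULL = rb"\N"         # how COPY text format spells NULL
END_OF_COPY = rb"\."  # marks the end of a COPY data block


def is_all_null_row(line: bytes) -> bool:
    """True if every tab-separated field on the line is NULL."""
    fields = line.rstrip(b"\r\n").split(b"\t")
    return len(fields) > 1 and all(field == NULL for field in fields)


def filter_dump(src, dst) -> int:
    """Copy src to dst line by line, skipping all-NULL rows inside COPY blocks.

    Returns the number of rows dropped. Works on bytes so that any badly
    encoded text in the dump passes through untouched.
    """
    in_copy = False
    dropped = 0
    for line in src:
        if in_copy:
            if line.rstrip(b"\r\n") == END_OF_COPY:
                in_copy = False
            elif is_all_null_row(line):
                dropped += 1
                continue
        elif line.startswith(b"COPY ") and line.rstrip().endswith(b"FROM stdin;"):
            in_copy = True
        dst.write(line)
    return dropped


def filter_gzip_dump(src_path: str, dst_path: str) -> int:
    """Stream a gzipped dump through filter_dump without loading it in memory."""
    with gzip.open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        return filter_dump(src, dst)


# Small in-memory demonstration of the filter.
sample = io.BytesIO(
    b"COPY public.topic_users (user_id, topic_id) FROM stdin;\n"
    b"1\t10\n"
    b"\\N\t\\N\n"
    b"\\.\n"
)
cleaned = io.BytesIO()
print(filter_dump(sample, cleaned))  # 1
```

On a real file the call would look like `filter_gzip_dump("dump.sql.gz", "dump.cleaned.sql.gz")`, with both names being placeholders.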

There are other problems I'm aware of, such as the failure when trying to migrate images to S3, but previously there were no problems with backups.

I think I would copy all of /var/discourse to a new server to get away from whatever is wrong with that server, and then try to sort things out.

You might have a corrupt index, but I'm on my phone and can't quite make sense of the errors.

Just tried that - took some fussing about.

New system does not like it, pretty much just outright rejects the database.

Launcher is up-to-date
Stopping old container
+ /usr/bin/docker stop -t 60 app
app
cd /pups && git pull && /pups/bin/pups --stdin
Already up to date.
I, [2020-12-13T09:23:39.291334 #1]  INFO -- : Loading --stdin
I, [2020-12-13T09:23:39.296303 #1]  INFO -- : > DEBIAN_FRONTEND=noninteractive apt-get purge -y postgresql-13 postgresql-client-13 postgresql-contrib-13
I, [2020-12-13T09:23:41.511661 #1]  INFO -- : Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  libllvm7 pgdg-keyring postgresql-client-common postgresql-common ssl-cert
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  postgresql-13* postgresql-client-13*
0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.
After this operation, 54.3 MB disk space will be freed.
(Reading database ... 43863 files and directories currently installed.)
Removing postgresql-13 (13.1-1.pgdg100+1) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of stop.
Removing postgresql-client-13 (13.1-1.pgdg100+1) ...
Processing triggers for postgresql-common (223.pgdg100+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
(Reading database ... 42050 files and directories currently installed.)
Purging configuration files for postgresql-13 (13.1-1.pgdg100+1) ...
Dropping cluster main...

I, [2020-12-13T09:23:41.511861 #1]  INFO -- : > apt-get update && apt-get install -y postgresql-10 postgresql-client-10 postgresql-contrib-10
debconf: delaying package configuration, since apt-utils is not installed
I, [2020-12-13T09:23:51.192217 #1]  INFO -- : Hit:1 http://deb.debian.org/debian buster InRelease
Get:2 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:3 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:4 http://apt.postgresql.org/pub/repos/apt buster-pgdg InRelease [104 kB]
Hit:5 https://deb.nodesource.com/node_10.x buster InRelease
Get:6 http://security.debian.org/debian-security buster/updates/main amd64 Packages [254 kB]
Get:7 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 Packages [216 kB]
Fetched 690 kB in 1s (525 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following package was automatically installed and is no longer required:
  libllvm7
Use 'apt autoremove' to remove it.
Suggested packages:
  postgresql-doc-10
The following NEW packages will be installed:
  postgresql-10 postgresql-client-10
0 upgraded, 2 newly installed, 0 to remove and 5 not upgraded.
Need to get 6,402 kB of archives.
After this operation, 30.6 MB of additional disk space will be used.
Get:1 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-client-10 amd64 10.15-1.pgdg100+1 [1,436 kB]
Get:2 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-10 amd64 10.15-1.pgdg100+1 [4,966 kB]
Fetched 6,402 kB in 2s (2,809 kB/s)
Selecting previously unselected package postgresql-client-10.
(Reading database ... 42050 files and directories currently installed.)
Preparing to unpack .../postgresql-client-10_10.15-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-client-10 (10.15-1.pgdg100+1) ...
Selecting previously unselected package postgresql-10.
Preparing to unpack .../postgresql-10_10.15-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-10 (10.15-1.pgdg100+1) ...
Setting up postgresql-client-10 (10.15-1.pgdg100+1) ...
update-alternatives: using /usr/share/postgresql/10/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-10 (10.15-1.pgdg100+1) ...
Creating new PostgreSQL cluster 10/main ...
/usr/lib/postgresql/10/bin/initdb -D /var/lib/postgresql/10/main --auth-local peer --auth-host md5
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/10/main ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Etc/UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctlcluster 10 main start

Ver Cluster Port Status Owner    Data directory              Log file
10  main    5432 down   postgres /var/lib/postgresql/10/main /var/log/postgresql/postgresql-10-main.log
update-alternatives: using /usr/share/postgresql/10/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for postgresql-common (223.pgdg100+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:

I, [2020-12-13T09:23:51.192964 #1]  INFO -- : > mkdir -p /shared/postgres_run
I, [2020-12-13T09:23:51.195917 #1]  INFO -- :
I, [2020-12-13T09:23:51.196235 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2020-12-13T09:23:51.198835 #1]  INFO -- :
I, [2020-12-13T09:23:51.199139 #1]  INFO -- : > chmod 775 /shared/postgres_run
I, [2020-12-13T09:23:51.201681 #1]  INFO -- :
I, [2020-12-13T09:23:51.202025 #1]  INFO -- : > rm -fr /var/run/postgresql
I, [2020-12-13T09:23:51.204199 #1]  INFO -- :
I, [2020-12-13T09:23:51.204549 #1]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2020-12-13T09:23:51.207718 #1]  INFO -- :
I, [2020-12-13T09:23:51.208017 #1]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2020/12/13 09:23:51 socat[1567] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): No such file or directory
I, [2020-12-13T09:23:51.217014 #1]  INFO -- :
I, [2020-12-13T09:23:51.217294 #1]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2020-12-13T09:23:51.220400 #1]  INFO -- :
I, [2020-12-13T09:23:51.220682 #1]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2020-12-13T09:23:51.223488 #1]  INFO -- :
I, [2020-12-13T09:23:51.223691 #1]  INFO -- : > mkdir -p /shared/postgres_run/10-main.pg_stat_tmp
I, [2020-12-13T09:23:51.225967 #1]  INFO -- :
I, [2020-12-13T09:23:51.226198 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run/10-main.pg_stat_tmp
I, [2020-12-13T09:23:51.228306 #1]  INFO -- :
I, [2020-12-13T09:23:51.233016 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2020-12-13T09:23:51.237345 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2020-12-13T09:23:51.237662 #1]  INFO -- : > chown -R root /var/lib/postgresql/10/main
I, [2020-12-13T09:23:51.244979 #1]  INFO -- :
I, [2020-12-13T09:23:51.245164 #1]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o postgres -g postgres /shared/postgres_data && sudo -E -u postgres /usr/lib/postgresql/10/bin/initdb -D /shared/postgres_data || exit 0
I, [2020-12-13T09:23:51.246982 #1]  INFO -- :
I, [2020-12-13T09:23:51.247152 #1]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2020-12-13T09:23:51.314470 #1]  INFO -- :
I, [2020-12-13T09:23:51.314888 #1]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2020-12-13T09:23:51.318075 #1]  INFO -- :
I, [2020-12-13T09:23:51.318499 #1]  INFO -- : Replacing data_directory = '/var/lib/postgresql/10/main' with data_directory = '/shared/postgres_data' in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.319171 #1]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addresses = '*' in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.319652 #1]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchronous_commit = $db_synchronous_commit in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.320131 #1]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffers = $db_shared_buffers in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.320672 #1]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work_mem in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.321143 #1]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with default_text_search_config = '$db_default_text_search_config' in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.321608 #1]  INFO -- : > install -d -m 0755 -o postgres -g postgres /shared/postgres_backup
I, [2020-12-13T09:23:51.324709 #1]  INFO -- :
I, [2020-12-13T09:23:51.325108 #1]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoint_segments = $db_checkpoint_segments in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.325597 #1]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_collector = $db_logging_collector in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.326097 #1]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with log_min_duration_statement = $db_log_min_duration_statement in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.326619 #1]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) with local replication postgres  peer in /etc/postgresql/10/main/pg_hba.conf
I, [2020-12-13T09:23:51.327039 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all all 0.0.0.0/0 md5 in /etc/postgresql/10/main/pg_hba.conf
I, [2020-12-13T09:23:51.327456 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main
I, [2020-12-13T09:23:51.329156 #1]  INFO -- : > sleep 5
2020-12-13 09:23:51.347 UTC [1583] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-12-13 09:23:51.347 UTC [1583] LOG:  listening on IPv6 address "::", port 5432
2020-12-13 09:23:51.349 UTC [1583] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-12-13 09:23:51.363 UTC [1583] FATAL:  database files are incompatible with server
2020-12-13 09:23:51.363 UTC [1583] DETAIL:  The database cluster was initialized with PG_CONTROL_VERSION 1300, but the server was compiled with PG_CONTROL_VERSION 1002.
2020-12-13 09:23:51.363 UTC [1583] HINT:  It looks like you need to initdb.
2020-12-13 09:23:51.365 UTC [1583] LOG:  database system is shut down
I, [2020-12-13T09:23:56.331811 #1]  INFO -- :
I, [2020-12-13T09:23:56.332043 #1]  INFO -- : > su postgres -c 'createdb discourse' || true
createdb: could not connect to database template1: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.394383 #1]  INFO -- :
I, [2020-12-13T09:23:56.394680 #1]  INFO -- : > su postgres -c 'psql discourse -c "create user discourse;"' || true
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.454155 #1]  INFO -- :
I, [2020-12-13T09:23:56.454333 #1]  INFO -- : > su postgres -c 'psql discourse -c "grant all privileges on database discourse to discourse;"' || true
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.508933 #1]  INFO -- :
I, [2020-12-13T09:23:56.509118 #1]  INFO -- : > su postgres -c 'psql discourse -c "alter schema public owner to discourse;"'
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.560843 #1]  INFO -- :
I, [2020-12-13T09:23:56.561176 #1]  INFO -- : Terminating async processes


FAILED
--------------------
Pups::ExecError: su postgres -c 'psql discourse -c "alter schema public owner to discourse;"' failed with return #<Process::Status: pid 1609 exit 2>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "su postgres -c 'psql $db_name -c \"alter schema public owner to $db_user;\"'"
da620ae9048b2cda99c7a0d24e38c9dfafba5d61fac8c64c2da2362a19338a76
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

This bit seems a concern to me:

Using Discourse Doctor on both doesn’t seem to have any solutions.
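For context on the FATAL line in the log above: the data directory was initialized with PG_CONTROL_VERSION 1300, while the server that was started expects 1002, which lines up with the same log purging postgresql-13 and installing postgresql-10. A PostgreSQL data directory also records its major version in a plain-text file named PG_VERSION, so the mismatch can be confirmed before rebuilding. The sketch below is illustrative only; the function names are made up, and `/shared/postgres_data` is the in-container path taken from the log.

```python
from pathlib import Path


def data_dir_major_version(data_dir: str) -> str:
    """Read the major version recorded in a PostgreSQL data directory."""
    return (Path(data_dir) / "PG_VERSION").read_text().strip()


def describe_compatibility(data_dir: str, server_major: str) -> str:
    """Say whether the data directory matches the server's major version."""
    found = data_dir_major_version(data_dir)
    if found == server_major:
        return f"OK: data directory and server are both PostgreSQL {found}"
    return (
        f"MISMATCH: data directory is PostgreSQL {found}, "
        f"server is PostgreSQL {server_major}"
    )


if __name__ == "__main__":
    data_dir = "/shared/postgres_data"  # path as shown in the bootstrap log
    if (Path(data_dir) / "PG_VERSION").exists():
        print(describe_compatibility(data_dir, "10"))
    else:
        print(f"No PG_VERSION file found under {data_dir}")
```

A server of one major version cannot start on a data directory written by another; moving between major versions requires an upgrade step or a dump and restore.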

It's trying to upgrade PostgreSQL. Much of that is expected.

Maybe try it again and switch to the pg 10 template as described in PostgreSQL 13 update

The line to prevent the upgrade is still in the YML file.

I can enter the app and get into SQL on the old server, but when I try it on the new one, it returns:

psql: error: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

The line above seems to be the root problem, @wincenworks.

I can't tell you what to do, but if I were you, @wincenworks, I would install Discourse from scratch, and before building the container, I would configure your template(s) to use PG10.

Once you have that new instance running, you can try restoring your Discourse instance from your current PG10 backup, from the command line (not from the UI) inside your container.

Hope this helps.

I'm trying to get a new instance running with PG10 right now, but it keeps timing out at the final stage of the registration process.

I'd love to do that, but:

  1. It won't let me create a new backup
  2. The previous backups won't restore

That's why I'm trying to transfer them via scp.

Yes, it will not restore on your current setup, as I understood you.

Have you tried using that backup to restore after a complete, fresh install, as I mentioned?

First, install Discourse from scratch and make sure it is 100% operational with a PostgreSQL 10 install template.

Then, take your latest backup and restore it (from the command line, not from the UI).