Backup failing due to PG/SQL errors

Continuing the discussion from Backup command for S3?:

In addition to trying to run the backup via ./launcher enter, I found what appears to be the reason the backups stopped working.

pg_dump: Dumping the contents of table "topic_links" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: invalid memory alloc request size 18446744073709551613
pg_dump: The command was: COPY public.topic_links (id, topic_id, post_id, user_id, url, domain, internal, link_topic_id, created_at, updated_at, reflection, clicks, link_post_id, title, crawled_at, quote, extension) TO stdout;
EXCEPTION: pg_dump failed
/var/www/discourse/lib/backup_restore/backuper.rb:152:in `dump_public_schema'
/var/www/discourse/lib/backup_restore/backuper.rb:36:in `run'
script/discourse:80:in `backup'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor/command.rb:27:in `run'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor/invocation.rb:127:in `invoke_command'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor.rb:392:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.0.1/lib/thor/base.rb:485:in `start'
script/discourse:284:in `<top (required)>'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:63:in `load'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:63:in `kernel_load'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:28:in `run'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:476:in `exec'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:30:in `dispatch'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/base.rb:476:in `start'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:24:in `start'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/exe/bundle:46:in `block in <top (required)>'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/friendly_errors.rb:123:in `with_friendly_errors'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/exe/bundle:34:in `<top (required)>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'
Deleting old backups...
Cleaning stuff up...
Removing '.tar' leftovers...
Marking backup as finished...
Refreshing disk stats...
Notifying 'system' of the end of the backup...
Finished!
[FAILED]
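
A side note on the number in that pg_dump error: 18446744073709551613 sits just below 2^64, which is what a small negative length looks like when it is read as an unsigned 64-bit value. The sketch below only does that arithmetic; the helper name `as_signed_64` is made up for illustration and is not part of PostgreSQL or Discourse.

```python
def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed (two's complement) one."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    # Values with the top bit set represent negative numbers.
    return value - 2**64 if value >= 2**63 else value


requested = 18446744073709551613  # size reported in the pg_dump error above
print(as_signed_64(requested))  # -3
```

A negative length is not something a healthy row should produce, so this is consistent with (though not proof of) a damaged row or page in topic_links.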

This is particularly frustrating, since apparently I can't use the old backups either. When I try to restore them, I get this error.

[2020-12-12 01:53:25] COPY 750
[2020-12-12 01:53:30] ERROR: null value in column "user_id" of relation "topic_users" violates not-null constraint
[2020-12-12 01:53:30] DETAIL: Failing row contains (null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null).
[2020-12-12 01:53:30] CONTEXT: COPY topic_users, line 623983: "\N \N \N \N \N \N \N \N \N \N \N \N \N \N \N \N"
[2020-12-12 01:53:30] EXCEPTION: psql failed: CONTEXT: COPY topic_users, line 623983: "\N \N \N \N \N \N \N \N \N \N \N \N \N \N \N \N"
[2020-12-12 01:53:30] /var/www/discourse/lib/backup_restore/database_restorer.rb:87:in `restore_dump'
/var/www/discourse/lib/backup_restore/database_restorer.rb:26:in `restore'
/var/www/discourse/lib/backup_restore/restorer.rb:51:in `run'
/var/www/discourse/script/spawn_backup_restore.rb:23:in `restore'
/var/www/discourse/script/spawn_backup_restore.rb:36:in `block in <main>'
/var/www/discourse/script/spawn_backup_restore.rb:4:in `fork'
/var/www/discourse/script/spawn_backup_restore.rb:4:in `<main>'
[2020-12-12 01:53:30] Trying to rollback...
[2020-12-12 01:53:30] Rolling back...
[2020-12-12 01:53:30] Cleaning stuff up...
[2020-12-12 01:53:30] Dropping functions from the discourse_functions schema...
[2020-12-12 01:53:30] Removing tmp '/var/www/discourse/tmp/restores/default/2020-12-12-014753' directory...
[2020-12-12 01:53:30] Unpausing sidekiq...
[2020-12-12 01:53:30] Marking restore as finished...

Is there any maintenance I can do to get back to the point where a backup/transfer is possible?

That looks like a problem.

Is this a standard install? Which version of PostgreSQL is this?

You mentioned there was some problem with this server, and that's why you're trying to migrate off it?

The current install is: 2.7.0.beta1

The server has a complicated history (it's about five years old): it started as a standard self-hosted install, was then moved to Discourse hosting, and then moved back. We've tried to keep it as standard as possible.

I'm pretty sure it's the current version.

The server has had intermittent outages, crashes, etc., which affected performance and may have impacted the database. The last time I did a database cleanup, I removed a lot of data from the PostgreSQL file.

Currently the backup file I have access to is 1.5 GB, so it has been too large to edit with any software I currently have on my PC.
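
A file that size does not need to be opened in an editor to find the bad rows; it can be scanned one line at a time. The sketch below is only an illustration under some assumptions: that the SQL dump has already been extracted from the backup archive, that it is a plain-text pg_dump file (optionally gzipped) using `COPY ... FROM stdin;` blocks with tab-separated fields, and that the broken rows look like the all-`\N` row in the restore error. The function name `find_all_null_rows` is hypothetical.

```python
import gzip
from typing import Iterator, Tuple

NULL = r"\N"  # how COPY's text format writes a NULL field


def find_all_null_rows(path: str) -> Iterator[Tuple[int, str]]:
    """Yield (file_line_number, table) for COPY data rows whose fields are all \\N."""
    opener = gzip.open if path.endswith(".gz") else open
    table = None  # table whose COPY block we are currently inside, if any
    with opener(path, "rt", encoding="utf-8", errors="replace") as dump:
        for number, line in enumerate(dump, start=1):
            line = line.rstrip("\n")
            if table is None:
                # A COPY block starts with e.g. "COPY public.t (a, b) FROM stdin;"
                if line.startswith("COPY ") and line.endswith("FROM stdin;"):
                    table = line.split()[1]
            elif line == "\\.":
                # A lone backslash-dot terminates the COPY data.
                table = None
            else:
                fields = line.split("\t")
                # Ignore single-column tables, where a lone NULL can be legitimate.
                if len(fields) > 1 and all(field == NULL for field in fields):
                    yield number, table
```

Calling `list(find_all_null_rows("dump.sql"))` (with `dump.sql` standing in for wherever the extracted dump lives) would list the suspect rows. The line numbers it reports are positions in the file, not the per-table line number psql prints in its CONTEXT message. Knowing where the rows are does not by itself make the dump safe to restore; it only shows how widespread the damage is.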

There are other issues I'm aware of, such as failures when trying to migrate images to S3, etc., but previously there were no problems with backups.

I think I would copy the whole /var/discourse to a new server to get away from whatever problem that server has, and then try to sort things out.

You may have a corrupt index, but I'm on my phone and can't really make sense of the errors.

Just tried that - took some fussing about.

New system does not like it, pretty much just outright rejects the database.

Launcher is up-to-date
Stopping old container
+ /usr/bin/docker stop -t 60 app
app
cd /pups && git pull && /pups/bin/pups --stdin
Already up to date.
I, [2020-12-13T09:23:39.291334 #1]  INFO -- : Loading --stdin
I, [2020-12-13T09:23:39.296303 #1]  INFO -- : > DEBIAN_FRONTEND=noninteractive apt-get purge -y postgresql-13 postgresql-client-13 postgresql-contrib-13
I, [2020-12-13T09:23:41.511661 #1]  INFO -- : Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  libllvm7 pgdg-keyring postgresql-client-common postgresql-common ssl-cert
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  postgresql-13* postgresql-client-13*
0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.
After this operation, 54.3 MB disk space will be freed.
(Reading database ... 43863 files and directories currently installed.)
Removing postgresql-13 (13.1-1.pgdg100+1) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of stop.
Removing postgresql-client-13 (13.1-1.pgdg100+1) ...
Processing triggers for postgresql-common (223.pgdg100+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
(Reading database ... 42050 files and directories currently installed.)
Purging configuration files for postgresql-13 (13.1-1.pgdg100+1) ...
Dropping cluster main...

I, [2020-12-13T09:23:41.511861 #1]  INFO -- : > apt-get update && apt-get install -y postgresql-10 postgresql-client-10 postgresql-contrib-10
debconf: delaying package configuration, since apt-utils is not installed
I, [2020-12-13T09:23:51.192217 #1]  INFO -- : Hit:1 http://deb.debian.org/debian buster InRelease
Get:2 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:3 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:4 http://apt.postgresql.org/pub/repos/apt buster-pgdg InRelease [104 kB]
Hit:5 https://deb.nodesource.com/node_10.x buster InRelease
Get:6 http://security.debian.org/debian-security buster/updates/main amd64 Packages [254 kB]
Get:7 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 Packages [216 kB]
Fetched 690 kB in 1s (525 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following package was automatically installed and is no longer required:
  libllvm7
Use 'apt autoremove' to remove it.
Suggested packages:
  postgresql-doc-10
The following NEW packages will be installed:
  postgresql-10 postgresql-client-10
0 upgraded, 2 newly installed, 0 to remove and 5 not upgraded.
Need to get 6,402 kB of archives.
After this operation, 30.6 MB of additional disk space will be used.
Get:1 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-client-10 amd64 10.15-1.pgdg100+1 [1,436 kB]
Get:2 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-10 amd64 10.15-1.pgdg100+1 [4,966 kB]
Fetched 6,402 kB in 2s (2,809 kB/s)
Selecting previously unselected package postgresql-client-10.
(Reading database ... 42050 files and directories currently installed.)
Preparing to unpack .../postgresql-client-10_10.15-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-client-10 (10.15-1.pgdg100+1) ...
Selecting previously unselected package postgresql-10.
Preparing to unpack .../postgresql-10_10.15-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-10 (10.15-1.pgdg100+1) ...
Setting up postgresql-client-10 (10.15-1.pgdg100+1) ...
update-alternatives: using /usr/share/postgresql/10/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-10 (10.15-1.pgdg100+1) ...
Creating new PostgreSQL cluster 10/main ...
/usr/lib/postgresql/10/bin/initdb -D /var/lib/postgresql/10/main --auth-local peer --auth-host md5
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/10/main ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Etc/UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctlcluster 10 main start

Ver Cluster Port Status Owner    Data directory              Log file
10  main    5432 down   postgres /var/lib/postgresql/10/main /var/log/postgresql/postgresql-10-main.log
update-alternatives: using /usr/share/postgresql/10/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for postgresql-common (223.pgdg100+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:

I, [2020-12-13T09:23:51.192964 #1]  INFO -- : > mkdir -p /shared/postgres_run
I, [2020-12-13T09:23:51.195917 #1]  INFO -- :
I, [2020-12-13T09:23:51.196235 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2020-12-13T09:23:51.198835 #1]  INFO -- :
I, [2020-12-13T09:23:51.199139 #1]  INFO -- : > chmod 775 /shared/postgres_run
I, [2020-12-13T09:23:51.201681 #1]  INFO -- :
I, [2020-12-13T09:23:51.202025 #1]  INFO -- : > rm -fr /var/run/postgresql
I, [2020-12-13T09:23:51.204199 #1]  INFO -- :
I, [2020-12-13T09:23:51.204549 #1]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2020-12-13T09:23:51.207718 #1]  INFO -- :
I, [2020-12-13T09:23:51.208017 #1]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2020/12/13 09:23:51 socat[1567] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): No such file or directory
I, [2020-12-13T09:23:51.217014 #1]  INFO -- :
I, [2020-12-13T09:23:51.217294 #1]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2020-12-13T09:23:51.220400 #1]  INFO -- :
I, [2020-12-13T09:23:51.220682 #1]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2020-12-13T09:23:51.223488 #1]  INFO -- :
I, [2020-12-13T09:23:51.223691 #1]  INFO -- : > mkdir -p /shared/postgres_run/10-main.pg_stat_tmp
I, [2020-12-13T09:23:51.225967 #1]  INFO -- :
I, [2020-12-13T09:23:51.226198 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run/10-main.pg_stat_tmp
I, [2020-12-13T09:23:51.228306 #1]  INFO -- :
I, [2020-12-13T09:23:51.233016 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2020-12-13T09:23:51.237345 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2020-12-13T09:23:51.237662 #1]  INFO -- : > chown -R root /var/lib/postgresql/10/main
I, [2020-12-13T09:23:51.244979 #1]  INFO -- :
I, [2020-12-13T09:23:51.245164 #1]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o postgres -g postgres /shared/postgres_data && sudo -E -u postgres /usr/lib/postgresql/10/bin/initdb -D /shared/postgres_data || exit 0
I, [2020-12-13T09:23:51.246982 #1]  INFO -- :
I, [2020-12-13T09:23:51.247152 #1]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2020-12-13T09:23:51.314470 #1]  INFO -- :
I, [2020-12-13T09:23:51.314888 #1]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2020-12-13T09:23:51.318075 #1]  INFO -- :
I, [2020-12-13T09:23:51.318499 #1]  INFO -- : Replacing data_directory = '/var/lib/postgresql/10/main' with data_directory = '/shared/postgres_data' in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.319171 #1]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addresses = '*' in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.319652 #1]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchronous_commit = $db_synchronous_commit in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.320131 #1]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffers = $db_shared_buffers in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.320672 #1]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work_mem in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.321143 #1]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with default_text_search_config = '$db_default_text_search_config' in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.321608 #1]  INFO -- : > install -d -m 0755 -o postgres -g postgres /shared/postgres_backup
I, [2020-12-13T09:23:51.324709 #1]  INFO -- :
I, [2020-12-13T09:23:51.325108 #1]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoint_segments = $db_checkpoint_segments in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.325597 #1]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_collector = $db_logging_collector in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.326097 #1]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with log_min_duration_statement = $db_log_min_duration_statement in /etc/postgresql/10/main/postgresql.conf
I, [2020-12-13T09:23:51.326619 #1]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) with local replication postgres  peer in /etc/postgresql/10/main/pg_hba.conf
I, [2020-12-13T09:23:51.327039 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all all 0.0.0.0/0 md5 in /etc/postgresql/10/main/pg_hba.conf
I, [2020-12-13T09:23:51.327456 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main
I, [2020-12-13T09:23:51.329156 #1]  INFO -- : > sleep 5
2020-12-13 09:23:51.347 UTC [1583] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-12-13 09:23:51.347 UTC [1583] LOG:  listening on IPv6 address "::", port 5432
2020-12-13 09:23:51.349 UTC [1583] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-12-13 09:23:51.363 UTC [1583] FATAL:  database files are incompatible with server
2020-12-13 09:23:51.363 UTC [1583] DETAIL:  The database cluster was initialized with PG_CONTROL_VERSION 1300, but the server was compiled with PG_CONTROL_VERSION 1002.
2020-12-13 09:23:51.363 UTC [1583] HINT:  It looks like you need to initdb.
2020-12-13 09:23:51.365 UTC [1583] LOG:  database system is shut down
I, [2020-12-13T09:23:56.331811 #1]  INFO -- :
I, [2020-12-13T09:23:56.332043 #1]  INFO -- : > su postgres -c 'createdb discourse' || true
createdb: could not connect to database template1: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.394383 #1]  INFO -- :
I, [2020-12-13T09:23:56.394680 #1]  INFO -- : > su postgres -c 'psql discourse -c "create user discourse;"' || true
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.454155 #1]  INFO -- :
I, [2020-12-13T09:23:56.454333 #1]  INFO -- : > su postgres -c 'psql discourse -c "grant all privileges on database discourse to discourse;"' || true
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.508933 #1]  INFO -- :
I, [2020-12-13T09:23:56.509118 #1]  INFO -- : > su postgres -c 'psql discourse -c "alter schema public owner to discourse;"'
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2020-12-13T09:23:56.560843 #1]  INFO -- :
I, [2020-12-13T09:23:56.561176 #1]  INFO -- : Terminating async processes


FAILED
--------------------
Pups::ExecError: su postgres -c 'psql discourse -c "alter schema public owner to discourse;"' failed with return #<Process::Status: pid 1609 exit 2>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "su postgres -c 'psql $db_name -c \"alter schema public owner to $db_user;\"'"
da620ae9048b2cda99c7a0d24e38c9dfafba5d61fac8c64c2da2362a19338a76
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

This bit seems a concern to me:

Using Discourse Doctor on both doesn’t seem to have any solutions.

It's trying to upgrade PostgreSQL. Much of that is expected.

Maybe try again and switch to the PG 10 template, as described in PostgreSQL 13 update.

The line that prevents the upgrade is still in the YML file.
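
For anyone reading the bootstrap log later, the FATAL/DETAIL pair about PG_CONTROL_VERSION is the part that says which PostgreSQL created the data directory and which one tried to open it. Here is a minimal sketch that decodes that line. The two-entry mapping only covers the values that appear in this log (1300 for the removed postgresql-13, 1002 for the installed postgresql-10); other releases use other numbers, and the names `CONTROL_VERSION_TO_PG` and `explain_mismatch` are made up for this example.

```python
import re
from typing import Optional, Tuple

# Only the control-file versions seen in this thread's log.
CONTROL_VERSION_TO_PG = {1002: "10", 1300: "13"}

PATTERN = re.compile(
    r"initialized with PG_CONTROL_VERSION (\d+), "
    r"but the server was compiled with PG_CONTROL_VERSION (\d+)"
)


def explain_mismatch(log_line: str) -> Optional[Tuple[str, str]]:
    """Return (data_dir_pg, server_pg) if the line reports a control-version mismatch."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    data, server = (int(group) for group in match.groups())
    return (
        CONTROL_VERSION_TO_PG.get(data, f"unknown ({data})"),
        CONTROL_VERSION_TO_PG.get(server, f"unknown ({server})"),
    )


line = (
    "DETAIL:  The database cluster was initialized with PG_CONTROL_VERSION 1300, "
    "but the server was compiled with PG_CONTROL_VERSION 1002."
)
print(explain_mismatch(line))  # ('13', '10')
```

In other words, the copied data directory was written by PostgreSQL 13 while the container built from the PG 10 template runs PostgreSQL 10, and an older server cannot open a newer cluster's files.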

I was able to enter the app and access SQL on the old server, but when I try on the new one it returns:

psql: error: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

The line above seems to be the root problem, @wincenworks.

I can't tell you what to do, but if I were you, @wincenworks, I would install Discourse from scratch and, before building the container, configure your template(s) to use PG10.

Once that new instance is up and running, you can try to restore your Discourse instance from your current PG10 backup, from the command line (not the UI) inside your container.

Hope this helps.

I'm trying to get a new instance with PG10 running now, but it times out at the final step of the registration process.

I'd love to do that, but:

  1. It won't let me create a new backup
  2. Previous backups don't restore

That's why I'm trying to transfer them via scp.

Yes, it will not restore on your current setup, as I understand it.

Have you tried using that backup to restore after a complete clean install, as I mentioned?

First, install Discourse from scratch, making sure it is 100% operational with a PG10 install template.

Then take your most recent backup and restore it (from the command line, not the UI).