Tentando recuperar uma instalação

Instalei o Discourse em uma máquina CentOS seguindo o guia aqui.
Caso ajude, também estamos usando o NGINX Proxy Manager.

A instalação funcionou por algumas semanas, até que tivemos que reiniciar a máquina. Então, falhou ao iniciar. Aqui está a saída do rebuild:

Ensuring launcher is up to date
Launcher is up-to-date
2.0.20230409-0052: Pulling from discourse/base
Digest: sha256:dd75ceb9322f79629f8b0bf78cfb0f79ad6bb366b7ead3e1cd32dcb8712ec46f
Status: Image is up to date for discourse/base:2.0.20230409-0052
docker.io/discourse/base:2.0.20230409-0052
/usr/local/lib/ruby/gems/3.2.0/gems/pups-1.1.1/lib/pups.rb
/usr/local/bin/pups --stdin
I, [2023-04-13T09:55:28.002076 #1]  INFO -- : Reading from stdin
I, [2023-04-13T09:55:28.005509 #1]  INFO -- : > locale-gen $LANG && update-locale
I, [2023-04-13T09:55:28.031953 #1]  INFO -- : Generating locales (this might take a while)...
Generation complete.

I, [2023-04-13T09:55:28.032080 #1]  INFO -- : > mkdir -p /shared/postgres_run
I, [2023-04-13T09:55:28.034322 #1]  INFO -- :
I, [2023-04-13T09:55:28.034470 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2023-04-13T09:55:28.036190 #1]  INFO -- :
I, [2023-04-13T09:55:28.036297 #1]  INFO -- : > chmod 775 /shared/postgres_run
I, [2023-04-13T09:55:28.037819 #1]  INFO -- :
I, [2023-04-13T09:55:28.037918 #1]  INFO -- : > rm -fr /var/run/postgresql
I, [2023-04-13T09:55:28.039663 #1]  INFO -- :
I, [2023-04-13T09:55:28.039770 #1]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2023-04-13T09:55:28.041327 #1]  INFO -- :
I, [2023-04-13T09:55:28.041457 #1]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGS
QL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2023/04/13 09:55:28 socat[19] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): No such file
or directory
I, [2023-04-13T09:55:28.045222 #1]  INFO -- :
I, [2023-04-13T09:55:28.045329 #1]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2023-04-13T09:55:28.047498 #1]  INFO -- :
I, [2023-04-13T09:55:28.047590 #1]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2023-04-13T09:55:28.049691 #1]  INFO -- :
I, [2023-04-13T09:55:28.049815 #1]  INFO -- : > mkdir -p /shared/postgres_run/13-main.pg_stat_tmp
I, [2023-04-13T09:55:28.051658 #1]  INFO -- :
I, [2023-04-13T09:55:28.051800 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run/13-main.pg_
stat_tmp
I, [2023-04-13T09:55:28.053477 #1]  INFO -- :
I, [2023-04-13T09:55:28.057301 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2023-04-13T09:55:28.060969 #1]  INFO -- : File > /etc/service/postgres/log/run  chmod: +x  chown:
I, [2023-04-13T09:55:28.064663 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2023-04-13T09:55:28.068319 #1]  INFO -- : File > /root/upgrade_postgres  chmod: +x  chown:
I, [2023-04-13T09:55:28.068510 #1]  INFO -- : > chown -R root /var/lib/postgresql/13/main
I, [2023-04-13T09:55:37.064277 #1]  INFO -- :
I, [2023-04-13T09:55:37.064575 #1]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o
I, [2023-04-13T09:55:37.066210 #1]  INFO -- :
I, [2023-04-13T09:55:37.066255 #1]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2023-04-13T09:55:37.076706 #1]  INFO -- :
I, [2023-04-13T09:55:37.076786 #1]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2023-04-13T09:55:37.078711 #1]  INFO -- :
I, [2023-04-13T09:55:37.078822 #1]  INFO -- : > /root/upgrade_postgres
I, [2023-04-13T09:55:37.082200 #1]  INFO -- :
I, [2023-04-13T09:55:37.082297 #1]  INFO -- : > rm /root/upgrade_postgres
I, [2023-04-13T09:55:37.083919 #1]  INFO -- :
I, [2023-04-13T09:55:37.084109 #1]  INFO -- : Replacing data_directory = '/var/lib/postgresql/13/main' w
I, [2023-04-13T09:55:37.084484 #1]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addr
I, [2023-04-13T09:55:37.085049 #1]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchrono
I, [2023-04-13T09:55:37.085423 #1]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffer
I, [2023-04-13T09:55:37.085857 #1]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work
I, [2023-04-13T09:55:37.086303 #1]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with d
I, [2023-04-13T09:55:37.086614 #1]  INFO -- : > install -d -m 0755 -o postgres -g postgres /shared/postg
I, [2023-04-13T09:55:37.089011 #1]  INFO -- :
I, [2023-04-13T09:55:37.089245 #1]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoi
I, [2023-04-13T09:55:37.089516 #1]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_co
I, [2023-04-13T09:55:37.089887 #1]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with l
I, [2023-04-13T09:55:37.090322 #1]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) wi
I, [2023-04-13T09:55:37.090509 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all al
I, [2023-04-13T09:55:37.090946 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*::1\/128.*$) with host a
I, [2023-04-13T09:55:37.091279 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u pos
I, [2023-04-13T09:55:37.092562 #1]  INFO -- : > sleep 5
2023-04-13 09:55:37.239 UTC [42] LOG:  starting PostgreSQL 13.10 (Debian 13.10-1.pgdg110+1) on x86_64-pc
2023-04-13 09:55:37.240 UTC [42] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-04-13 09:55:37.240 UTC [42] LOG:  listening on IPv6 address "::", port 5432
2023-04-13 09:55:37.287 UTC [42] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-13 09:55:37.338 UTC [45] LOG:  database system was interrupted; last known up at 2023-04-13 08:3
2023-04-13 09:55:37.795 UTC [45] LOG:  invalid resource manager ID in primary checkpoint record
2023-04-13 09:55:37.795 UTC [45] PANIC:  could not locate a valid checkpoint record
2023-04-13 09:55:38.256 UTC [42] LOG:  startup process (PID 45) was terminated by signal 6: Aborted
2023-04-13 09:55:38.256 UTC [42] LOG:  aborting startup due to startup process failure
2023-04-13 09:55:38.283 UTC [42] LOG:  database system is shut down
I, [2023-04-13T09:55:42.094474 #1]  INFO -- :
I, [2023-04-13T09:55:42.094602 #1]  INFO -- : > su postgres -c 'createdb discourse' || true
createdb: error: could not connect to database template1: connection to server on socket "/var/run/postg
        Is the server running locally and accepting connections on that socket?
I, [2023-04-13T09:55:42.130700 #1]  INFO -- :
I, [2023-04-13T09:55:42.130828 #1]  INFO -- : > su postgres -c 'psql discourse -c "create user discourse
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?
I, [2023-04-13T09:55:42.166116 #1]  INFO -- :
I, [2023-04-13T09:55:42.166242 #1]  INFO -- : > su postgres -c 'psql discourse -c "grant all privileges on database discourse to discourse;"' || true
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?
I, [2023-04-13T09:55:42.201461 #1]  INFO -- :
I, [2023-04-13T09:55:42.201617 #1]  INFO -- : > su postgres -c 'psql discourse -c "alter schema public owner to discourse;"'
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?
I, [2023-04-13T09:55:42.236768 #1]  INFO -- :
I, [2023-04-13T09:55:42.236996 #1]  INFO -- : > Terminating async processes

FAILED
--------------------
Pups::ExecError: su postgres -c 'psql discourse -c "alter schema public owner to discourse;"' failed with return #<Process::Status: pid 55 exit 2>
Location of failure: /usr/local/lib/ruby/gems/3.2.0/gems/pups-1.1.1/lib/pups/exec_command.rb:117:in `spawn'
exec failed with the params "su postgres -c 'psql $db_name -c \\\"alter schema public owner to $db_user;\\\"'\"
bootstrap failed with exit code 2
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
cd04fbb38f1ef61e418d680b969c2c056439e3b1dbfe64608524a8d3361cd91c

O discourse-doctor falha em encontrar o aplicativo docker em execução:

==================== SERIOUS PROBLEM!!!! ====================
app not running!
Attempting to rebuild

E tenta reconstruir, então basicamente termina com o mesmo log acima.

Obrigado desde já!

1 curtida

PANIC: could not locate a valid checkpoint record

Acho que seu banco de dados está corrompido. Na minha opinião, o melhor seria restaurar a partir de um backup. Acho que o Discourse pode fazer backups por padrão? Procure em discourse/shared/app/backups ou algo assim.


Se você não tiver um backup do Discourse, isso se torna mais um tópico de suporte do PostgreSQL do que do Discourse. Posso tentar ajudar, mas não sou um especialista.

Talvez você tenha que usar isto:

1 curtida

Parece que sim. Espero que você consiga encontrar uma solução, @Norike … enquanto isso, acho isso bastante preocupante - acredito que um reinício iniciado por software deveria deixar o banco de dados em um estado limpo. Foi um reinício forçado, ou seja, um ciclo de energia?

Gostaria de saber se faz diferença em qual sistema de arquivos seu banco de dados está armazenado. O que df -T informa?

Obrigado!

De fato, encontrei alguns backups na pasta /var/discourse/shared/standalone/backups/default, o que parece promissor.
No entanto, de acordo com o guia Restaurar um Backup da Linha de Comando, o contêiner precisa estar em execução para importar o banco de dados, mas o comando ./launcher enter app falha:

x86_64 arch detected.
WARNING: containers/app.yml file is world-readable. You can secure this file by running: chmod o-rwx containers/app.yml
Error response from daemon: No such container: app

Acho que deveria começar com uma instalação limpa. Existe um guia para limpar e começar de novo? O seguinte estaria correto?

  • Copiar os backups da pasta /var/discourse/shared/standalone/backups/default para um local diferente fora da pasta /var/discourse
  • Remover a pasta /var/discourse
  • Instalar o Discourse do zero seguindo o guia Instalar na Nuvem para ter uma nova instância em execução.
  • Restaurar o backup com a interface gráfica ou com o guia da linha de comando acima

Foi uma reinicialização de software após a aplicação de algumas atualizações.

Isto:

Filesystem          Type     1K-blocks     Used Available Use% Mounted on
devtmpfs            devtmpfs      4096        0      4096   0% /dev
tmpfs               tmpfs      3869940        0   3869940   0% /dev/shm
tmpfs               tmpfs      1547980    17600   1530380   2% /run
/dev/mapper/cs-root xfs       73364480 16750976  56613504  23% /
/dev/sda2           xfs        1038336   395812    642524  39% /boot
/dev/mapper/cs-home xfs      893122476 10838052 882284424   2% /home
/dev/sda1           vfat        613160     7644    605516   2% /boot/efi
overlay             overlay   73364480 16750976  56613504  23% /var/lib/docker/overlay2/a9c2622d4167f08ff4697a4c49febf55a7e460e087c41c05e6c1bbd321b13f62/merged
overlay             overlay   73364480 16750976  56613504  23% /var/lib/docker/overlay2/765273939fa75096c994588ba9c9bac6b5a9b909a60a962169b83f7c8a213b7f/merged
tmpfs               tmpfs       773988        4    773984   1% /run/user/1000

Obrigado - não estou familiarizado com XFS, mas uma pesquisa rápida sugere que ele deve ser seguro de usar. Apenas possivelmente xfs_info / será útil e informativo.

ok, então copie todo este diretório para algum lugar para guardá-lo:

/var/discourse/shared

dessa forma, você pode tentar restaurar a partir de backups ou até mesmo tentar reparar o banco de dados corrompido, se precisar.

então exclua seus contêineres docker, faça um docker image prune -a, exclua /var/discourse e instale o discourse novamente. em seguida, copie o arquivo de backup mais recente para o local e tente restaurar.

1 curtida

Isso restaurou. Obrigado!

2 curtidas

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.