Complete forum crash (urgent test)

My forum crashed during the software update and has now completely disappeared. I get the following error messages…

"Oops
The software powering this discussion forum encountered an unexpected problem. We apologize for the inconvenience.

Des

I upgraded too, and my container won't build and then crashes. Very unfortunate.

Don't try to restart it with a rebuild while it's stuck; the last time I did that, it brought down the whole system…

Same behavior here. I was seeing errors with the workers, so I restarted it… it crashed… I tried to rebuild. Now it crashes perpetually.

/var/discourse# ./launcher rebuild app
x86_64 arch detected.
WARNING: containers/app.yml file is world-readable. You can secure this file by running: chmod o-rwx containers/app.yml
Ensuring launcher is up to date
Fetching origin
Launcher is up-to-date
2.0.20240825-0027: Pulling from discourse/base
Digest: sha256:6de68cb49198b5281f79ed9401b3fe818c854d220dcf0238549fe2f2adb19146
Status: Image is up to date for discourse/base:2.0.20240825-0027
/usr/local/lib/ruby/gems/3.3.0/gems/pups-1.2.1/lib/pups.rb
/usr/local/bin/pups --stdin
I, [2024-08-27T21:43:42.091270 #1]  INFO -- : Reading from stdin
I, [2024-08-27T21:43:42.110405 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2024-08-27T21:43:42.117678 #1]  INFO -- : File > /etc/service/postgres/log/run  chmod: +x  chown:
I, [2024-08-27T21:43:42.125472 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2024-08-27T21:43:42.132700 #1]  INFO -- : File > /root/install_postgres  chmod: +x  chown:
I, [2024-08-27T21:43:42.139622 #1]  INFO -- : File > /root/upgrade_postgres  chmod: +x  chown:
I, [2024-08-27T21:43:42.140454 #1]  INFO -- : Replacing data_directory = '/var/lib/postgresql/13/main' with data_directory = '/shared/postgres_data' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.141762 #1]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addresses = '*' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.142675 #1]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchronous_commit = $db_synchronous_commit in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.143534 #1]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffers = $db_shared_buffers in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.144382 #1]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work_mem in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.144912 #1]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with default_text_search_config = '$db_default_text_search_config' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.145541 #1]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoint_segments = $db_checkpoint_segments in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.146355 #1]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_collector = $db_logging_collector in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.146979 #1]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with log_min_duration_statement = $db_log_min_duration_statement in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.147851 #1]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) with local replication postgres  peer in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-27T21:43:42.148557 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all all 0.0.0.0/0 md5 in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-27T21:43:42.149423 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*::1\\/128.*$) with host all all ::/0 md5 in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-27T21:43:42.149931 #1]  INFO -- : > if [ -f /root/install_postgres ]; then
  /root/install_postgres && rm -f /root/install_postgres
elif [ -e /shared/postgres_run/.s.PGSQL.5432 ]; then
  socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
fi

2024/08/27 21:43:42 socat[28] E connect(, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): Connection refused
I, [2024-08-27T21:43:42.217004 #1]  INFO -- : Generating locales (this might take a while)...
Generation complete.

I, [2024-08-27T21:43:42.217327 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/13/main
I, [2024-08-27T21:43:42.220344 #1]  INFO -- : Terminating async processes
2024-08-27 21:43:42.300 UTC [30] LOG:  starting PostgreSQL 13.16 (Debian 13.16-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-08-27 21:43:42.300 UTC [30] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-27 21:43:42.300 UTC [30] LOG:  listening on IPv6 address "::", port 5432
2024-08-27 21:43:42.303 UTC [30] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-27 21:43:42.310 UTC [31] LOG:  database system was interrupted; last known up at 2024-08-27 21:41:14 UTC
2024-08-27 21:43:42.503 UTC [31] LOG:  database system was not properly shut down; automatic recovery in progress
2024-08-27 21:43:42.507 UTC [31] LOG:  redo starts at 38C/55C02EA0
2024-08-27 21:43:42.507 UTC [31] LOG:  invalid record length at 38C/55C02ED8: wanted 24, got 0
2024-08-27 21:43:42.507 UTC [31] LOG:  redo done at 38C/55C02EA0
2024-08-27 21:43:42.540 UTC [30] LOG:  database system is ready to accept connections

It hangs there indefinitely… as far as I can tell, it never assigns ports to the container, never starts the Rails app, nothing.

If you run

./launcher start app

does it come back up?

No… ./launcher rebuild app creates a zombie container, which produces the output above. This is what the container looks like. It starts building from the Discourse base image, but then it hangs, as described above. It never registers as the Discourse app.

/var/discourse# docker ps -a
CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS              PORTS               NAMES
02ae320b72a0        discourse/base:2.0.20240825-0027   "/bin/bash -c '/usr/..."   7 minutes ago       Up 7 minutes                            sleepy_driscoll

When I run ./launcher start app, it fails because it tries to start a new app while PSQL is still running on port 5432 inside the zombie container. If I delete the zombie container (and/or the images), a new container is created and hangs with the same logs as in my previous post.
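To spot the leftover bootstrap container before retrying, a minimal sketch follows. It assumes the default `docker ps -a` table layout shown above and that the real app container runs from `local_discourse/app`, so only containers started directly from a `discourse/base` tag are reported. The function name `find_zombie_containers` is hypothetical. As noted in the post above, removing this container does not by itself cure the hang; it only frees the container holding port 5432.

```shell
#!/usr/bin/env bash
# find_zombie_containers (hypothetical helper): read `docker ps -a` output in
# the default table layout on stdin and print the ID of every container whose
# IMAGE column is a discourse/base tag, i.e. a bootstrap container left behind
# by a hung rebuild. Assumption: the app container uses local_discourse/app,
# so it never matches.
find_zombie_containers() {
  # Skip the header row; column 1 is CONTAINER ID, column 2 is IMAGE.
  awk 'NR > 1 && $2 ~ /^discourse\/base:/ { print $1 }'
}

# Example (removal is destructive, so review the IDs first):
#   docker ps -a | find_zombie_containers
#   docker ps -a | find_zombie_containers | xargs -r docker rm -f
```

Matching on the IMAGE column rather than the container name is deliberate: the bootstrap container gets a random name (here `sleepy_driscoll`), while its image tag is predictable.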

Very stressful and unfortunate. I don't know how we ended up here. I disabled all the plugins in my app.yaml and tried rebuilding.

I think these are the logs most relevant to my forum's situation:

2024-08-27 21:43:42.300 UTC [30] LOG:  starting PostgreSQL 13.16 (Debian 13.16-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-08-27 21:43:42.300 UTC [30] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-27 21:43:42.300 UTC [30] LOG:  listening on IPv6 address "::", port 5432
2024-08-27 21:43:42.303 UTC [30] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-27 21:43:42.310 UTC [31] LOG:  database system was interrupted; last known up at 2024-08-27 21:41:14 UTC
2024-08-27 21:43:42.503 UTC [31] LOG:  database system was not properly shut down; automatic recovery in progress
2024-08-27 21:43:42.507 UTC [31] LOG:  redo starts at 38C/55C02EA0
2024-08-27 21:43:42.507 UTC [31] LOG:  invalid record length at 38C/55C02ED8: wanted 24, got 0
2024-08-27 21:43:42.507 UTC [31] LOG:  redo done at 38C/55C02EA0
2024-08-27 21:43:42.540 UTC [30] LOG:  database system is ready to accept connections

It stays stuck here forever… it never compiles the assets, never starts the Rails app, never starts Redis, etc.


Oh, okay… so this is happening to at least a handful of people :frowning:

More Info:

/var/discourse# ./launcher start app
x86_64 arch detected.
WARNING: containers/app.yml file is world-readable. You can secure this file by running: chmod o-rwx containers/app.yml

+ /usr/bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=8 -e UNICORN_SIDEKIQS=1 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e LETSENCRYPT_DIR=/shared/letsencrypt -e DISCOURSE_FORCE_HTTPS=true -e DISCOURSE_HOSTNAME=redacted.com -e DISCOURSE_DEVELOPER_EMAILS=redacted -e DISCOURSE_SMTP_ADDRESS=smtp.redacted.com -e DISCOURSE_SMTP_PORT=587 -e DISCOURSE_SMTP_USER_NAME=postmaster@redacted -e DISCOURSE_SMTP_PASSWORD=redacted -e DISCOURSE_SMTP_ENABLE_START_TLS=true -e LETSENCRYPT_ACCOUNT_EMAIL=redacted -h discourse-beta-ubuntu-app -e DOCKER_HOST_IP=172.17.0.1 --name app -t -p 80:80 -p 443:443 -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:52:ee:ee:62:b2 local_discourse/app /sbin/boot
Unable to find image 'local_discourse/app:latest' locally
/usr/bin/docker: Error response from daemon: pull access denied for local_discourse/app, repository does not exist or may require 'docker login'.
See '/usr/bin/docker run --help'.

When I try to start the app :point_up_2:, it complains that the image local_discourse/app isn't there, which is correct:

/var/discourse# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
discourse/base      2.0.20240825-0027   9dc96b6115cb        2 days ago          3.38GB

But trying to pull and build the image doesn't work, because the database hangs.
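The missing-image diagnosis above can be checked before calling `./launcher start app`. This is a minimal sketch that assumes the default `docker images` table layout shown above; `has_app_image` is a hypothetical helper name, not part of the launcher.

```shell
#!/usr/bin/env bash
# has_app_image (hypothetical helper): read `docker images` output in the
# default table layout on stdin; exit 0 only if a local_discourse/app image
# is listed, exit 1 otherwise.
has_app_image() {
  # Skip the header row; column 1 is REPOSITORY.
  awk 'NR > 1 && $1 == "local_discourse/app" { found = 1 } END { exit !found }'
}

# Example:
#   if docker images | has_app_image; then
#     ./launcher start app
#   else
#     echo "local_discourse/app is missing; the rebuild never produced it" >&2
#   fi
```

A more direct check is `docker image inspect local_discourse/app`, which exits non-zero when the image does not exist; the text-parsing version above just mirrors the listing posted in this thread.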


See

for the solution.