Forum Full Crash (Test Pressing)

My forum crashed when I was updating the software and has now totally disapeared. I get the following error messages…

" Oops

The software powering this discussion forum encountered an unexpected problem. We apologize for the inconvenience.

Detailed information about the error was logged, and an automatic notification generated. We’ll take a look at it.

No further action is necessary. However, if the error condition persists, you can provide additional detail, including steps to reproduce the error, by posting a discussion topic in the site’s feedback category."

The forum I run lives at > https://forum.testpressing.org/

Any help would be appreciated.

I also just upgraded and my container is not building and then crashing. It is very unfortunate.

Don’t restart it rebuilding when it gets stuck as it crashed the whole thing when I did it…

same behavior here. I was seeing errors with workers, restarted it…crashed…tried to rebuild. perpetual crashing now

/var/discourse# ./launcher rebuild app
x86_64 arch detected.
WARNING: containers/app.yml file is world-readable. You can secure this file by running: chmod o-rwx containers/app.yml
Ensuring launcher is up to date
Fetching origin
Launcher is up-to-date
2.0.20240825-0027: Pulling from discourse/base
Digest: sha256:6de68cb49198b5281f79ed9401b3fe818c854d220dcf0238549fe2f2adb19146
Status: Image is up to date for discourse/base:2.0.20240825-0027
/usr/local/lib/ruby/gems/3.3.0/gems/pups-1.2.1/lib/pups.rb
/usr/local/bin/pups --stdin
I, [2024-08-27T21:43:42.091270 #1]  INFO -- : Reading from stdin
I, [2024-08-27T21:43:42.110405 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2024-08-27T21:43:42.117678 #1]  INFO -- : File > /etc/service/postgres/log/run  chmod: +x  chown:
I, [2024-08-27T21:43:42.125472 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2024-08-27T21:43:42.132700 #1]  INFO -- : File > /root/install_postgres  chmod: +x  chown:
I, [2024-08-27T21:43:42.139622 #1]  INFO -- : File > /root/upgrade_postgres  chmod: +x  chown:
I, [2024-08-27T21:43:42.140454 #1]  INFO -- : Replacing data_directory = '/var/lib/postgresql/13/main' with data_directory = '/shared/postgres_data' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.141762 #1]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addresses = '*' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.142675 #1]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchronous_commit = $db_synchronous_commit in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.143534 #1]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffers = $db_shared_buffers in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.144382 #1]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work_mem in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.144912 #1]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with default_text_search_config = '$db_default_text_search_config' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.145541 #1]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoint_segments = $db_checkpoint_segments in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.146355 #1]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_collector = $db_logging_collector in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.146979 #1]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with log_min_duration_statement = $db_log_min_duration_statement in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-27T21:43:42.147851 #1]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) with local replication postgres  peer in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-27T21:43:42.148557 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all all 0.0.0.0/0 md5 in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-27T21:43:42.149423 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*::1\/128.*$) with host all all ::/0 md5 in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-27T21:43:42.149931 #1]  INFO -- : > if [ -f /root/install_postgres ]; then
  /root/install_postgres && rm -f /root/install_postgres
elif [ -e /shared/postgres_run/.s.PGSQL.5432 ]; then
  socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
fi

2024/08/27 21:43:42 socat[28] E connect(, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): Connection refused
I, [2024-08-27T21:43:42.217004 #1]  INFO -- : Generating locales (this might take a while)...
Generation complete.

I, [2024-08-27T21:43:42.217327 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/13/main
I, [2024-08-27T21:43:42.220344 #1]  INFO -- : Terminating async processes
2024-08-27 21:43:42.300 UTC [30] LOG:  starting PostgreSQL 13.16 (Debian 13.16-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-08-27 21:43:42.300 UTC [30] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-27 21:43:42.300 UTC [30] LOG:  listening on IPv6 address "::", port 5432
2024-08-27 21:43:42.303 UTC [30] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-27 21:43:42.310 UTC [31] LOG:  database system was interrupted; last known up at 2024-08-27 21:41:14 UTC
2024-08-27 21:43:42.503 UTC [31] LOG:  database system was not properly shut down; automatic recovery in progress
2024-08-27 21:43:42.507 UTC [31] LOG:  redo starts at 38C/55C02EA0
2024-08-27 21:43:42.507 UTC [31] LOG:  invalid record length at 38C/55C02ED8: wanted 24, got 0
2024-08-27 21:43:42.507 UTC [31] LOG:  redo done at 38C/55C02EA0
2024-08-27 21:43:42.540 UTC [30] LOG:  database system is ready to accept connections

Just hangs there indefinitely…never assigns ports to the container or launches the rails app or anything as best as I can tell

If you do a

./launcher start app

will it come back up?

no…there is a zombie container that ./launcher rebuild app creates, which gives the output above. This is what the container looks like. It starts building from the discourse base image but then hangs, as mentioned above. It does not register as the discourse app.

/var/discourse# docker ps -a
CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS              PORTS               NAMES
02ae320b72a0        discourse/base:2.0.20240825-0027   "/bin/bash -c '/usr/…"   7 minutes ago       Up 7 minutes                            sleepy_driscoll

When I run ./launcher start app it errors because it tries to start a new app and PSQL is running on 5432 on the zombie container. If I delete the zombie container (and/or images), then it creates a new container and hangs with the logs the same way from my previous post

very stressful and unfortunate. I don’t know how we got to this point. I have disabled all plugins in my app.yaml and tried to rebuild

I think these logs are the most relevant for my forum’s situation

2024-08-27 21:43:42.300 UTC [30] LOG:  starting PostgreSQL 13.16 (Debian 13.16-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-08-27 21:43:42.300 UTC [30] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-27 21:43:42.300 UTC [30] LOG:  listening on IPv6 address "::", port 5432
2024-08-27 21:43:42.303 UTC [30] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-27 21:43:42.310 UTC [31] LOG:  database system was interrupted; last known up at 2024-08-27 21:41:14 UTC
2024-08-27 21:43:42.503 UTC [31] LOG:  database system was not properly shut down; automatic recovery in progress
2024-08-27 21:43:42.507 UTC [31] LOG:  redo starts at 38C/55C02EA0
2024-08-27 21:43:42.507 UTC [31] LOG:  invalid record length at 38C/55C02ED8: wanted 24, got 0
2024-08-27 21:43:42.507 UTC [31] LOG:  redo done at 38C/55C02EA0
2024-08-27 21:43:42.540 UTC [30] LOG:  database system is ready to accept connections

just hangs here forever…never compiles assets, never starts the rails app, never starts redis, etc.

1 Like

oh okay…so this is a thing for at least a handful of people :frowning:

More Info:

/var/discourse# ./launcher start app
x86_64 arch detected.
WARNING: containers/app.yml file is world-readable. You can secure this file by running: chmod o-rwx containers/app.yml

+ /usr/bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=8 -e UNICORN_SIDEKIQS=1 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e LETSENCRYPT_DIR=/shared/letsencrypt -e DISCOURSE_FORCE_HTTPS=true -e DISCOURSE_HOSTNAME=redacted.com -e DISCOURSE_DEVELOPER_EMAILS=redacted -e DISCOURSE_SMTP_ADDRESS=smtp.redacted.com -e DISCOURSE_SMTP_PORT=587 -e DISCOURSE_SMTP_USER_NAME=postmaster@redacted -e DISCOURSE_SMTP_PASSWORD=redacted -e DISCOURSE_SMTP_ENABLE_START_TLS=true -e LETSENCRYPT_ACCOUNT_EMAIL=redacted -h discourse-beta-ubuntu-app -e DOCKER_HOST_IP=172.17.0.1 --name app -t -p 80:80 -p 443:443 -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:52:ee:ee:62:b2 local_discourse/app /sbin/boot
Unable to find image 'local_discourse/app:latest' locally
/usr/bin/docker: Error response from daemon: pull access denied for local_discourse/app, repository does not exist or may require 'docker login'.
See '/usr/bin/docker run --help'.

When I try to start the app :point_up_2: …it complains the image local_discourse/app isnt there. Which is correct:

/var/discourse# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
discourse/base      2.0.20240825-0027   9dc96b6115cb        2 days ago          3.38GB

but trying to pull and build the image is not working, due to the db hanging

2 Likes

See

for the solution.