I’m running into a PostgreSQL startup issue when trying to rebuild my app, and I’m hoping to get some help.
Here’s the log; it has been stuck at this point for more than 30 minutes.
```
Status: Image is up to date for discourse/base:2.0.20240825-0027
docker.io/discourse/base:2.0.20240825-0027
/usr/local/lib/ruby/gems/3.3.0/gems/pups-1.2.1/lib/pups.rb
/usr/local/bin/pups --stdin
I, [2024-08-26T17:16:15.344712 #1] INFO -- : Reading from stdin
I, [2024-08-26T17:16:15.357924 #1] INFO -- : File > /etc/service/postgres/run chmod: +x chown:
I, [2024-08-26T17:16:15.362740 #1] INFO -- : File > /etc/service/postgres/log/run chmod: +x chown:
I, [2024-08-26T17:16:15.367767 #1] INFO -- : File > /etc/runit/3.d/99-postgres chmod: +x chown:
I, [2024-08-26T17:16:15.372845 #1] INFO -- : File > /root/install_postgres chmod: +x chown:
I, [2024-08-26T17:16:15.377501 #1] INFO -- : File > /root/upgrade_postgres chmod: +x chown:
I, [2024-08-26T17:16:15.377876 #1] INFO -- : Replacing data_directory = '/var/lib/postgresql/13/main' with data_directory = '/shared/postgres_data' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.378854 #1] INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addresses = '*' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.379386 #1] INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchronous_commit = $db_synchronous_commit in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.379835 #1] INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffers = $db_shared_buffers in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.380263 #1] INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work_mem in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.380761 #1] INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with default_text_search_config = '$db_default_text_search_config' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.381203 #1] INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoint_segments = $db_checkpoint_segments in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.381901 #1] INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_collector = $db_logging_collector in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.382352 #1] INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with log_min_duration_statement = $db_log_min_duration_statement in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.382802 #1] INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) with local replication postgres peer in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-26T17:16:15.383231 #1] INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all all 0.0.0.0/0 md5 in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-26T17:16:15.383604 #1] INFO -- : Replacing (?-mix:^host.*all.*all.*::1\/128.*$) with host all all ::/0 md5 in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-26T17:16:15.384079 #1] INFO -- : > if [ -f /root/install_postgres ]; then
/root/install_postgres && rm -f /root/install_postgres
elif [ -e /shared/postgres_run/.s.PGSQL.5432 ]; then
socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
fi
2024/08/26 17:16:15 socat[28] E connect(, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): Connection refused
I, [2024-08-26T17:16:15.452500 #1] INFO -- : Generating locales (this might take a while)...
Generation complete.
I, [2024-08-26T17:16:15.453058 #1] INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/13/main
I, [2024-08-26T17:16:15.455944 #1] INFO -- : Terminating async processes
2024-08-26 17:16:15.500 UTC [30] LOG: starting PostgreSQL 13.16 (Debian 13.16-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-08-26 17:16:15.501 UTC [30] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-08-26 17:16:15.501 UTC [30] LOG: listening on IPv6 address "::", port 5432
2024-08-26 17:16:15.507 UTC [30] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-26 17:16:15.516 UTC [31] LOG: database system was interrupted; last known up at 2024-08-26 17:10:28 UTC
2024-08-26 17:16:15.769 UTC [31] LOG: database system was not properly shut down; automatic recovery in progress
2024-08-26 17:16:15.774 UTC [31] LOG: redo starts at 18F/E62D1458
2024-08-26 17:16:15.774 UTC [31] LOG: invalid record length at 18F/E62D1490: wanted 24, got 0
2024-08-26 17:16:15.774 UTC [31] LOG: redo done at 18F/E62D1458
2024-08-26 17:16:15.809 UTC [30] LOG: database system is ready to accept connections
```
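One thing I noticed in the log above: the template’s socket check (the socat line) fails with Connection refused, which I assume is expected when postgres isn’t running yet. To reproduce that check from the host, something like this should work (just a sketch, assuming a standard standalone install and that socat is installed on the host):

```bash
# The shared postgres socket dir as seen from the host (standalone install)
ls -la /var/discourse/shared/standalone/postgres_run/

# Same connection test the template runs, pointed at the host-side path
socat /dev/null UNIX-CONNECT:/var/discourse/shared/standalone/postgres_run/.s.PGSQL.5432
```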
I have been running into the same thing trying to rebuild the last few days. I cannot get Discourse to run or rebuild without issue.
I tried using the start/stop capability and it didn’t seem to work. The VM itself has been rebooted a couple of times as well. It just keeps hanging at that line about the database being ready to accept connections.
Control-C didn’t work, and I tried many different things, including reverting back to old versions, but as soon as I tried the rebuild, it got stuck in exactly the same place.
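In case it helps anyone else debugging this, here’s roughly how I checked whether the rebuild was truly hung rather than still working (just a sketch; app is the default container name from the standard install, so adjust if yours differs):

```bash
# List containers; a rebuild normally runs a temporary bootstrap container
docker ps -a

# Follow the output of whichever container shows up (id/name from above)
docker logs --tail 50 -f <container-id>

# If the app container itself is running, check whether postgres is alive inside it
docker exec app ps aux | grep postgres
```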
I managed to make more progress. In /var/discourse/ on my server, I checked out commit b1108913820edd27f869634d0fc654639758889a. This commit is from a few days ago and does not include these three commits (1, 2, 3) from the discourse_docker history. I suspect one of those changes is the reason for the hanging postgres issue.
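For reference, the rough sequence I used to pin discourse_docker to that older commit and rebuild (just a sketch; the hash is the commit mentioned above and app is the default container name):

```bash
cd /var/discourse
git fetch origin
# Pin discourse_docker to the older commit that predates the suspect changes
git checkout b1108913820edd27f869634d0fc654639758889a
./launcher rebuild app
```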
Anyway, finally got the app back up. That was a horrible experience, lol.
Same issue here when upgrading from 3.3.0 to 3.3.1. The upgrade is stuck at the same log line (database system is ready to accept connections).
Rebooting helps, or just killing the upgrade process and running ./launcher start app. The new version 3.3.1 is then shown, but I’m not sure if this is a sane procedure.
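Roughly what I did, for anyone in the same spot (as said, not sure this is a sane procedure; app is the default container name):

```bash
cd /var/discourse
docker ps                  # note the id of the stuck bootstrap/upgrade container, if any
docker rm -f <container-id>
# Bring the existing app container back up without rebuilding
./launcher start app
```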
The way I resolved the issue was to start a new droplet, do a fresh install, and restore the backup; after that, the rebuild works fine.
I also have a backup of a working version (which is on a slightly older version), but as soon as I upgraded to the latest version via rebuild, I encountered the same issue, so I suspect something was introduced in a recent commit that only breaks the old → new version upgrade.
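For anyone following the same route, the restore step was roughly this (just a sketch; the backup filename is a placeholder, and the same restore can also be done from the admin UI):

```bash
cd /var/discourse
./launcher enter app
# Inside the container: allow restores, then restore the backup by filename
discourse enable_restore
discourse restore example-backup.tar.gz
```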
Just some further info to help investigate the issue.
Seeing this on one host running Ubuntu 18.04.6, but another host with the same version of Ubuntu was also updated today and its Discourse rebuild progressed normally.
Am going to upgrade Ubuntu on the affected host and see if this helps. Will keep everyone posted.
For those affected, can you run the command ls -lahn /var/discourse/shared/standalone/ | grep -E "postgres|redis" and let me know if the output differs from the one below?
drwxr-xr-x 2 101 104 4.0K Aug 29 01:33 postgres_backup
drwx------ 19 101 104 4.0K Aug 29 01:42 postgres_data
drwxrwxr-x 3 101 104 4.0K Aug 29 01:42 postgres_run
drwxr-xr-x 2 103 106 4.0K Aug 29 01:38 redis_data
drwxr-xr-x 2 101 104 4.0K Jun 15 2020 postgres_backup
drwx------ 19 101 104 4.0K May 3 2022 postgres_data
drwxrwsr-x 5 101 104 4.0K May 3 2022 postgres_run
drwxr-xr-x 2 103 106 4.0K May 3 2022 redis_data
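If an affected host’s output differs (ownership other than 101/104 on the postgres dirs, or a missing setgid bit on postgres_run), one possible way to realign it, purely a hedged sketch based on the listings above and assuming 101:104 and 103:106 map to the in-container postgres and redis users, would be:

```bash
cd /var/discourse/shared/standalone
# Match the ownership shown above (101:104 for the postgres dirs, 103:106 for redis)
chown -R 101:104 postgres_backup postgres_data postgres_run
chown -R 103:106 redis_data
# Restore the setgid bit on postgres_run seen in the second listing (drwxrwsr-x)
chmod 2775 postgres_run
```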
Just a note, something slightly different in my case.
The rebuild got stuck on “database system is ready to accept connections”, as others have seen. I had to reboot the VM and run ./launcher start app to bring the forums back up. However, when Discourse was back, the version remained at the previous 3.3.0.beta4-dev.
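For what it’s worth, here’s roughly how I checked which version the container was actually running (just a sketch; paths are the defaults for a standard docker install):

```bash
cd /var/discourse
./launcher enter app
# Inside the container: show which Discourse commit/version is actually checked out
cd /var/www/discourse
git log -1 --oneline
cat lib/version.rb
```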
Am unable to perform the Ubuntu upgrade today, but will keep everyone posted when I can, and whether the rebuild is then successful.
I did bump our dev instance to Ubuntu 20.04.6 today and the rebuild / upgrade to Discourse 3.4.0.beta2-dev was successful. However, this was the host that also rebuilt without problems on Ubuntu 18.04 yesterday.