PostgreSQL Stuck During Rebuilding

Hello everyone,

I’m running into a PostgreSQL startup issue when I try to rebuild my app, and I’m hoping to get some help.

Here’s the log. It has been stuck at this point for more than 30 minutes.

Status: Image is up to date for discourse/base:2.0.20240825-0027
docker.io/discourse/base:2.0.20240825-0027
/usr/local/lib/ruby/gems/3.3.0/gems/pups-1.2.1/lib/pups.rb
/usr/local/bin/pups --stdin
I, [2024-08-26T17:16:15.344712 #1]  INFO -- : Reading from stdin
I, [2024-08-26T17:16:15.357924 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2024-08-26T17:16:15.362740 #1]  INFO -- : File > /etc/service/postgres/log/run  chmod: +x  chown:
I, [2024-08-26T17:16:15.367767 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2024-08-26T17:16:15.372845 #1]  INFO -- : File > /root/install_postgres  chmod: +x  chown:
I, [2024-08-26T17:16:15.377501 #1]  INFO -- : File > /root/upgrade_postgres  chmod: +x  chown:
I, [2024-08-26T17:16:15.377876 #1]  INFO -- : Replacing data_directory = '/var/lib/postgresql/13/main' with data_directory = '/shared/postgres_data' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.378854 #1]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addresses = '*' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.379386 #1]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchronous_commit = $db_synchronous_commit in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.379835 #1]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffers = $db_shared_buffers in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.380263 #1]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work_mem in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.380761 #1]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with default_text_search_config = '$db_default_text_search_config' in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.381203 #1]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoint_segments = $db_checkpoint_segments in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.381901 #1]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_collector = $db_logging_collector in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.382352 #1]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with log_min_duration_statement = $db_log_min_duration_statement in /etc/postgresql/13/main/postgresql.conf
I, [2024-08-26T17:16:15.382802 #1]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) with local replication postgres  peer in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-26T17:16:15.383231 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all all 0.0.0.0/0 md5 in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-26T17:16:15.383604 #1]  INFO -- : Replacing (?-mix:^host.*all.*all.*::1\/128.*$) with host all all ::/0 md5 in /etc/postgresql/13/main/pg_hba.conf
I, [2024-08-26T17:16:15.384079 #1]  INFO -- : > if [ -f /root/install_postgres ]; then
  /root/install_postgres && rm -f /root/install_postgres
elif [ -e /shared/postgres_run/.s.PGSQL.5432 ]; then
  socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
fi

2024/08/26 17:16:15 socat[28] E connect(, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): Connection refused
I, [2024-08-26T17:16:15.452500 #1]  INFO -- : Generating locales (this might take a while)...
Generation complete.

I, [2024-08-26T17:16:15.453058 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/13/main
I, [2024-08-26T17:16:15.455944 #1]  INFO -- : Terminating async processes
2024-08-26 17:16:15.500 UTC [30] LOG:  starting PostgreSQL 13.16 (Debian 13.16-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-08-26 17:16:15.501 UTC [30] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-26 17:16:15.501 UTC [30] LOG:  listening on IPv6 address "::", port 5432
2024-08-26 17:16:15.507 UTC [30] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-26 17:16:15.516 UTC [31] LOG:  database system was interrupted; last known up at 2024-08-26 17:10:28 UTC
2024-08-26 17:16:15.769 UTC [31] LOG:  database system was not properly shut down; automatic recovery in progress
2024-08-26 17:16:15.774 UTC [31] LOG:  redo starts at 18F/E62D1458
2024-08-26 17:16:15.774 UTC [31] LOG:  invalid record length at 18F/E62D1490: wanted 24, got 0
2024-08-26 17:16:15.774 UTC [31] LOG:  redo done at 18F/E62D1458
2024-08-26 17:16:15.809 UTC [30] LOG:  database system is ready to accept connections

So it didn’t get shut down cleanly, it tried to recover, and it thinks it succeeded.

Maybe control-c out of it and see if you can ./launcher start app to get the old container running again.

If it works, then you can try again to ./launcher stop app and then rebuild.
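
The sequence suggested above can be sketched as two small shell functions. This is only a sketch: it assumes a standard install in /var/discourse and a container named app, and the function names and the DISCOURSE_DIR variable are made up for illustration. Check that the site actually loads before running the second step.

```shell
#!/bin/sh
# Sketch only. DISCOURSE_DIR and the function names are illustrative assumptions;
# "app" is the container name used in this thread.
DISCOURSE_DIR="${DISCOURSE_DIR:-/var/discourse}"

# Step 1: after interrupting the stuck rebuild, bring the old container back up.
start_old_container() (
  cd "$DISCOURSE_DIR" || exit 1
  ./launcher start app
)

# Step 2: only once the site is confirmed working, stop it and retry the rebuild.
retry_rebuild() (
  cd "$DISCOURSE_DIR" || exit 1
  ./launcher stop app && ./launcher rebuild app
)
```

Each function runs in a subshell, so calling it does not change the current directory of the shell you call it from.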

I have been running into the same thing trying to rebuild the last few days. I cannot get Discourse to run or rebuild without issue.

I tried using the start/stop capability and it didn’t seem to work. The VM itself has been rebooted a couple of times, as well. It just keeps hanging at that line about the database ready to accept connections.

Control-C didn’t work, and I tried many different things, including reverting to older versions, but as soon as I tried the rebuild, it got stuck in the exact same place.

How much RAM do you have? Is your network connection slow?

For my issue: plenty of RAM (8 GB), and the network is fine.

4GB RAM, I checked network, disk usage, cpu usage, ram usage, everything looks ok.

I managed to make more progress. In /var/discourse/ on my server, I checked out commit b1108913820edd27f869634d0fc654639758889a. This commit is from a few days ago and does not include these three commits (1, 2, 3) in the discourse_docker history. I suspect one of these changes is the reason for the hanging postgres issue.
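
For anyone wanting to try the same workaround, pinning the discourse_docker checkout to that commit might look roughly like this. It is a sketch under assumptions: the function name and the DISCOURSE_DIR variable are illustrative, and the post does not spell out what was run after the checkout.

```shell
#!/bin/sh
# Sketch only. DISCOURSE_DIR and the function name are illustrative assumptions.
DISCOURSE_DIR="${DISCOURSE_DIR:-/var/discourse}"

# Check out a specific discourse_docker commit (default: the one named in this post).
# This leaves the repository in a detached-HEAD state.
pin_discourse_docker() (
  commit="${1:-b1108913820edd27f869634d0fc654639758889a}"
  cd "$DISCOURSE_DIR" || exit 1
  git checkout "$commit"
)
```

Presumably the next step is a rebuild with ./launcher rebuild app from the same directory, but that part is an assumption, not something stated here.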

Anyways, finally got the app back up. That was a horrible experience lol

Same issue here when upgrading from 3.3.0 to 3.3.1. The upgrade is stuck at the same log line (database system is ready to accept connections).

Rebooting helps, or just killing the upgrade process and running ./launcher start app. The new version 3.3.1 is then shown, but I’m not sure this is a sane procedure.

So that’s four people with a problem, I think.

Are those of you who had trouble on ARM or on Intel?

That’s a great question.

I just did a fresh install on a new Digital Ocean VM and then ran a rebuild and it worked just fine.

I’m using Intel.

The way I resolved the issue was to start a new droplet, do a fresh install, and restore the backup. After that, the rebuild works fine.

I also have a backup of a working site (on a slightly older version), but as soon as I upgraded it to the latest version via rebuild, I hit the same issue. So I suspect something introduced in a recent commit breaks only the old → new version upgrade.

Darn.

Hmm. I’ll see if I’ve got a site that I don’t care if it goes down.

I guess you’ve got a standard single-container install. I’ll see if I can find one of those.

Just bumping this, as I have also seen this issue since the above commit. I have tried everything above to resolve it, too.

x86. My host OS is Ubuntu Bionic; perhaps that is of consequence. I’m not sure what OS the others are running.

It is over a year beyond EOL: “Ubuntu 18.04 EOL – keep your fleet of devices up and running” (Ubuntu).

It’s not too soon to spin up a new VM and move there.

Just some further info to help investigate the issue.

I’m seeing this on one host running Ubuntu 18.04.6, but another host with the same version of Ubuntu was also updated today, and its Discourse rebuild progressed normally.

I am going to upgrade Ubuntu on the affected host and see if this helps. I will keep everyone posted.

For those affected, can you run the command ls -lahn /var/discourse/shared/standalone/ | grep -E "postgres|redis" and let me know if the output differs from the one below?

drwxr-xr-x  2  101 104 4.0K Aug 29 01:33 postgres_backup
drwx------ 19  101 104 4.0K Aug 29 01:42 postgres_data
drwxrwxr-x  3  101 104 4.0K Aug 29 01:42 postgres_run
drwxr-xr-x  2  103 106 4.0K Aug 29 01:38 redis_data

# ls -lahn /var/discourse/shared/standalone/ | grep -E "postgres|redis"
drwxr-xr-x  2  101 104 4.0K Dec 26  2019 postgres_backup
drwx------ 19  101 104 4.0K Aug 28 03:59 postgres_data
drwxrwxr-x  5  101 104 4.0K Aug 28 03:59 postgres_run
drwxr-xr-x  2  103 106 4.0K Aug 29 03:59 redis_data
Output from VM experiencing rebuild issue:

drwxr-xr-x  2  101 104 4.0K Jun 15  2020 postgres_backup
drwx------ 19  101 104 4.0K May  3  2022 postgres_data
drwxrwsr-x  5  101 104 4.0K May  3  2022 postgres_run
drwxr-xr-x  2  103 106 4.0K May  3  2022 redis_data
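
Comparing the listings by eye, owners and groups match everywhere, and the one visible mode difference is that postgres_run shows drwxrwsr-x (setgid bit set) on this VM and drwxrwxr-x in the other two outputs. Whether that matters for the hang is not established in this thread. A small sketch for checking it on your own host follows. SHARED_DIR and the function names are illustrative assumptions, and stat -c is GNU coreutils syntax, as found on Ubuntu.

```shell
#!/bin/sh
# Sketch only. SHARED_DIR and the function names are illustrative assumptions.
SHARED_DIR="${SHARED_DIR:-/var/discourse/shared/standalone}"

# Print symbolic mode, numeric owner, numeric group and path for each data directory.
show_modes() {
  for name in postgres_backup postgres_data postgres_run redis_data; do
    stat -c '%A %u %g %n' "$SHARED_DIR/$name"
  done
}

# Succeeds if the given path has the setgid bit (the "s" in drwxrwsr-x).
has_setgid() {
  [ -g "$1" ]
}
```

For example, has_setgid "$SHARED_DIR/postgres_run" && echo "setgid is set" reports whether your host matches the listing from this VM.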

Just a note: something was slightly different in my case.
The rebuild got stuck on “database system is ready to accept connections”, as others have seen. I had to reboot the VM and run ./launcher start app to start the forum. However, when Discourse was back up, it remained at the previous version, 3.3.0.beta4-dev.

I am unable to perform the Ubuntu upgrade today, but I will keep everyone posted on when I can and whether the rebuild is then successful.

I did bump our dev instance to Ubuntu 20.04 today, and the rebuild / upgrade to Discourse 3.4.0.beta2-dev was successful. However, this is the host that also rebuilt without problems on Ubuntu 18.04 yesterday.