FATAL: the database system is starting up


(Zsolt Ero) #1

I’m getting this on a pure Ubuntu 14.04 LTS host, discourse: stable, when running ./launcher rebuild app. A week ago the same rebuild was working OK, and I didn’t change anything.

Ensuring launcher is up to date
Fetching origin
Launcher is up-to-date
Stopping old container
+ /usr/bin/docker stop -t 10 app
app
cd /pups && git pull && /pups/bin/pups --stdin
Already up-to-date.
I, [2016-11-11T18:18:57.221554 #14]  INFO -- : Loading --stdin
I, [2016-11-11T18:18:57.243099 #14]  INFO -- : > mkdir -p /shared/postgres_run
I, [2016-11-11T18:18:57.253021 #14]  INFO -- :
I, [2016-11-11T18:18:57.253878 #14]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2016-11-11T18:18:57.259441 #14]  INFO -- :
I, [2016-11-11T18:18:57.260264 #14]  INFO -- : > chmod 775 /shared/postgres_run
I, [2016-11-11T18:18:57.264970 #14]  INFO -- :
I, [2016-11-11T18:18:57.265770 #14]  INFO -- : > rm -fr /var/run/postgresql
I, [2016-11-11T18:18:57.272968 #14]  INFO -- :
I, [2016-11-11T18:18:57.273810 #14]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2016-11-11T18:18:57.279900 #14]  INFO -- :
I, [2016-11-11T18:18:57.280713 #14]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2016/11/11 18:18:57 socat[22] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): Connection refused
I, [2016-11-11T18:18:57.317145 #14]  INFO -- :
I, [2016-11-11T18:18:57.318117 #14]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2016-11-11T18:18:57.325527 #14]  INFO -- :
I, [2016-11-11T18:18:57.326344 #14]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2016-11-11T18:18:57.332034 #14]  INFO -- :
I, [2016-11-11T18:18:57.332663 #14]  INFO -- : > mkdir -p /shared/postgres_run/9.5-main.pg_stat_tmp
I, [2016-11-11T18:18:57.337157 #14]  INFO -- :
I, [2016-11-11T18:18:57.337919 #14]  INFO -- : > chown postgres:postgres /shared/postgres_run/9.5-main.pg_stat_tmp
I, [2016-11-11T18:18:57.342644 #14]  INFO -- :
I, [2016-11-11T18:18:57.352337 #14]  INFO -- : File > /etc/service/postgres/run  chmod: +x
I, [2016-11-11T18:18:57.361991 #14]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x
I, [2016-11-11T18:18:57.371273 #14]  INFO -- : File > /root/upgrade_postgres  chmod: +x
I, [2016-11-11T18:18:57.372159 #14]  INFO -- : > chown -R root /var/lib/postgresql/9.5/main
I, [2016-11-11T18:18:57.404179 #14]  INFO -- :
I, [2016-11-11T18:18:57.404914 #14]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o postgres -g postgres /shared/postgres_data && sudo -E -u postgres /usr/lib/postgresql/9.5/bin/initdb -D /shared/postgres_data || exit 0
I, [2016-11-11T18:18:57.410299 #14]  INFO -- :
I, [2016-11-11T18:18:57.410700 #14]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2016-11-11T18:18:57.451483 #14]  INFO -- :
I, [2016-11-11T18:18:57.452129 #14]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2016-11-11T18:18:57.456867 #14]  INFO -- :
I, [2016-11-11T18:18:57.457820 #14]  INFO -- : > /root/upgrade_postgres
I, [2016-11-11T18:18:57.468117 #14]  INFO -- :
I, [2016-11-11T18:18:57.468603 #14]  INFO -- : > rm /root/upgrade_postgres
I, [2016-11-11T18:18:57.474099 #14]  INFO -- :
I, [2016-11-11T18:18:57.478068 #14]  INFO -- : Replacing data_directory = '/var/lib/postgresql/9.5/main' with data_directory = '/shared/postgres_data' in /etc/postgresql/9.5/main/postgresql.conf
I, [2016-11-11T18:18:57.481531 #14]  INFO -- : Replacing (?-mix:#?listen_addresses *=.*) with listen_addresses = '*' in /etc/postgresql/9.5/main/postgresql.conf
I, [2016-11-11T18:18:57.528956 #14]  INFO -- : Replacing (?-mix:#?synchronous_commit *=.*) with synchronous_commit = $db_synchronous_commit in /etc/postgresql/9.5/main/postgresql.conf
I, [2016-11-11T18:18:57.529888 #14]  INFO -- : Replacing (?-mix:#?shared_buffers *=.*) with shared_buffers = $db_shared_buffers in /etc/postgresql/9.5/main/postgresql.conf
I, [2016-11-11T18:18:57.530638 #14]  INFO -- : Replacing (?-mix:#?work_mem *=.*) with work_mem = $db_work_mem in /etc/postgresql/9.5/main/postgresql.conf
I, [2016-11-11T18:18:57.531418 #14]  INFO -- : Replacing (?-mix:#?default_text_search_config *=.*) with default_text_search_config = '$db_default_text_search_config' in /etc/postgresql/9.5/main/postgresql.conf
I, [2016-11-11T18:18:57.532335 #14]  INFO -- : > install -d -m 0755 -o postgres -g postgres /shared/postgres_backup
I, [2016-11-11T18:18:57.545152 #14]  INFO -- :
I, [2016-11-11T18:18:57.546038 #14]  INFO -- : Replacing (?-mix:#?max_wal_senders *=.*) with max_wal_senders = $db_max_wal_senders in /etc/postgresql/9.5/main/postgresql.conf
I, [2016-11-11T18:18:57.547022 #14]  INFO -- : Replacing (?-mix:#?wal_level *=.*) with wal_level = $db_wal_level in /etc/postgresql/9.5/main/postgresql.conf
I, [2016-11-11T18:18:57.547899 #14]  INFO -- : Replacing (?-mix:#?checkpoint_segments *=.*) with checkpoint_segments = $db_checkpoint_segments in /etc/postgresql/9.5/main/postgresql.conf
I, [2016-11-11T18:18:57.548779 #14]  INFO -- : Replacing (?-mix:#?logging_collector *=.*) with logging_collector = $db_logging_collector in /etc/postgresql/9.5/main/postgresql.conf
I, [2016-11-11T18:18:57.549680 #14]  INFO -- : Replacing (?-mix:#?log_min_duration_statement *=.*) with log_min_duration_statement = $db_log_min_duration_statement in /etc/postgresql/9.5/main/postgresql.conf
I, [2016-11-11T18:18:57.551932 #14]  INFO -- : Replacing (?-mix:^#local +replication +postgres +peer$) with local replication postgres  peer in /etc/postgresql/9.5/main/pg_hba.conf
I, [2016-11-11T18:18:57.552762 #14]  INFO -- : Replacing (?-mix:^host.*all.*all.*127.*$) with host all all 0.0.0.0/0 md5 in /etc/postgresql/9.5/main/pg_hba.conf
I, [2016-11-11T18:18:57.553524 #14]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/9.5/bin/postmaster -D /etc/postgresql/9.5/main
I, [2016-11-11T18:18:57.557243 #14]  INFO -- : > sleep 5
2016-11-11 18:18:57 UTC [46-1] LOG:  database system was interrupted; last known up at 2016-11-11 18:16:33 UTC
I, [2016-11-11T18:19:02.563644 #14]  INFO -- :
I, [2016-11-11T18:19:02.564064 #14]  INFO -- : > su postgres -c 'createdb discourse' || true
2016-11-11 18:19:03 UTC [50-1] postgres@postgres FATAL:  the database system is starting up
2016-11-11 18:19:03 UTC [51-1] postgres@template1 FATAL:  the database system is starting up
createdb: could not connect to database template1: FATAL:  the database system is starting up
I, [2016-11-11T18:19:03.177231 #14]  INFO -- :
I, [2016-11-11T18:19:03.177722 #14]  INFO -- : > su postgres -c 'psql discourse -c "create user discourse;"' || true
2016-11-11 18:19:03 UTC [62-1] postgres@discourse FATAL:  the database system is starting up
psql: FATAL:  the database system is starting up
I, [2016-11-11T18:19:03.514133 #14]  INFO -- :
I, [2016-11-11T18:19:03.514526 #14]  INFO -- : > su postgres -c 'psql discourse -c "grant all privileges on database discourse to discourse;"' || true
2016-11-11 18:19:03 UTC [73-1] postgres@discourse FATAL:  the database system is starting up
psql: FATAL:  the database system is starting up
I, [2016-11-11T18:19:03.660161 #14]  INFO -- :
I, [2016-11-11T18:19:03.660640 #14]  INFO -- : > su postgres -c 'psql discourse -c "alter schema public owner to discourse;"'
2016-11-11 18:19:03 UTC [84-1] postgres@discourse FATAL:  the database system is starting up
psql: FATAL:  the database system is starting up
I, [2016-11-11T18:19:03.803285 #14]  INFO -- :
I, [2016-11-11T18:19:03.804061 #14]  INFO -- : Terminating async processes
I, [2016-11-11T18:19:03.804231 #14]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/9.5/bin/postmaster -D /etc/postgresql/9.5/main pid: 43
2016-11-11 18:19:03 UTC [43-1] LOG:  received fast shutdown request
I, [2016-11-11T18:19:13.805882 #14]  INFO -- : HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/9.5/bin/postmaster -D /etc/postgresql/9.5/main pid:43 did not terminate cleanly, forcing termination!


FAILED
--------------------
Pups::ExecError: su postgres -c 'psql discourse -c "alter schema public owner to discourse;"' failed with return #<Process::Status: pid 74 exit 2>
Location of failure: /pups/lib/pups/exec_command.rb:108:in `spawn'
exec failed with the params "su postgres -c 'psql $db_name -c \"alter schema public owner to $db_user;\"'"
adc94f56f1d61c4b1384080fe4b5db2481b7bac886d7d11cac4f06f6109a6301
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one

(Zsolt Ero) #2

Anyone who can help me how to diagnose this?

This is a clean Ubuntu 14.04 LTS host, it has nothing, except for the Discourse forum, which I installed by the official 30 min tutorial.

It was working perfectly for the last 8 months.

Then I had the mini_racer upgrade bug, which I fixed by rebuilding to 1.6.6.

Then I stopped receiving emails, so I tried to rebuild again and this is what I got.

I still cannot rebuild, and my forum is still down. How is it possible that a rebuild cannot progress on an official supported setup? I don’t have any customisation or plugins.

Since I tried a new launcher cleanup and launcher rebuild, but I’m getting the same error:


(Zsolt Ero) #3

I think the problem might be that my host is using devicemapper. However I found this thread and this Github commit linked from it:


Yet, on my system it is not working, I get no warning or error at all from launcher.

Here is my docker info 2> /dev/null | grep 'Driver: ':

Storage Driver: devicemapper
Logging Driver: json-file
Cgroup Driver: cgroupfs

(Zsolt Ero) #4

OK, I found the following:

  • I was using devicemapper, and there is a bug in the launcher code which would detect it
  • Reinstalled on a new VPS. Using Ubuntu 16.04 LTS this time. Docker driver is aufs!
  • Copied over the standalone folder.
  • Run rebuild.
  • Same error!

Full log here:

I believe that sleep 5 line might be very small, and need to be changed either to a higher value, or a while loop which detects if postgres is up.


(Rafael dos Santos Silva) #5

Try increasing the sleep in templates/postgres.template.yml locally. Is your server slow? Database too big?


(Zsolt Ero) #6

I found out what happened. The database wasn’t shut down cleanly, probably because of the previous devicemapper setup. In this case the database definitely needs much longer than 5 seconds to start up.

Since an unclean shutdown can happen to anyone, I’ve submitted this:
https://github.com/discourse/discourse_docker/issues/309

I’m trying out to modify the sleep value.


LOG: database system is shut down
(Zsolt Ero) #7

Yes, with sleep 60 it worked well.


(Jeff Atwood) #8

Hmm @sam we need to fix this code that is not detecting devicemapper and blocking install…


(Sam Saffron) #9

This detection code is now fixed per @mpalmer’s change


(Sam Saffron) #10