Upgrade of Postgres failed due to corrupted database?


(Dabaer) #1

I seem to have hit an upgrade failure that isn't related to disk space. My log is as follows:

Unable to find image 'discourse/base:2.0.20180613' locally
2.0.20180613: Pulling from discourse/base
b234f539f7a1: Pulling fs layer
55172d420b43: Pulling fs layer
5ba5bbeb6b91: Pulling fs layer
43ae2841ad7a: Pulling fs layer
f6c9c6de4190: Pulling fs layer
454ac42e51a9: Pulling fs layer
f6c9c6de4190: Waiting
454ac42e51a9: Waiting
43ae2841ad7a: Waiting
55172d420b43: Verifying Checksum
55172d420b43: Download complete
5ba5bbeb6b91: Download complete
43ae2841ad7a: Download complete
f6c9c6de4190: Verifying Checksum
f6c9c6de4190: Download complete
b234f539f7a1: Verifying Checksum
b234f539f7a1: Download complete
b234f539f7a1: Pull complete
55172d420b43: Pull complete
5ba5bbeb6b91: Pull complete
43ae2841ad7a: Pull complete
f6c9c6de4190: Pull complete
454ac42e51a9: Download complete
454ac42e51a9: Pull complete
Digest: sha256:c5cb70244978e0cd95a2e56b3d28668618318e5ef3ffa96ecf3347c1261e2e42
Status: Downloaded newer image for discourse/base:2.0.20180613
Ensuring launcher is up to date
Fetching origin
Launcher is up-to-date
Stopping old container
+ /usr/bin/docker stop -t 10 yhr
yhr
cd /pups && git pull && /pups/bin/pups --stdin
From https://github.com/discourse/pups
   7bde3d3..d1cdc3f  master     -> origin/master
 * [new tag]         v1.0.2     -> v1.0.2
Updating 7bde3d3..d1cdc3f
Fast-forward
 lib/pups/exec_command.rb | 6 +++++-
 lib/pups/version.rb      | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)
I, [2018-07-17T00:27:13.745447 #19]  INFO -- : Loading --stdin
I, [2018-07-17T00:27:13.752134 #19]  INFO -- : > locale-gen $LANG && update-locale
I, [2018-07-17T00:27:14.581853 #19]  INFO -- : Generating locales (this might take a while)...
  en_US.UTF-8... done
Generation complete.

I, [2018-07-17T00:27:14.582187 #19]  INFO -- : > mkdir -p /shared/postgres_run
I, [2018-07-17T00:27:14.586036 #19]  INFO -- : 
I, [2018-07-17T00:27:14.586230 #19]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2018-07-17T00:27:14.589849 #19]  INFO -- : 
I, [2018-07-17T00:27:14.590025 #19]  INFO -- : > chmod 775 /shared/postgres_run
I, [2018-07-17T00:27:14.592851 #19]  INFO -- : 
I, [2018-07-17T00:27:14.593027 #19]  INFO -- : > rm -fr /var/run/postgresql
I, [2018-07-17T00:27:14.596176 #19]  INFO -- : 
I, [2018-07-17T00:27:14.596337 #19]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2018-07-17T00:27:14.600005 #19]  INFO -- : 
I, [2018-07-17T00:27:14.600202 #19]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2018/07/17 00:27:14 socat[51] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): No such file or directory
I, [2018-07-17T00:27:14.606943 #19]  INFO -- : 
I, [2018-07-17T00:27:14.607118 #19]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2018-07-17T00:27:14.610526 #19]  INFO -- : 
I, [2018-07-17T00:27:14.610711 #19]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2018-07-17T00:27:14.614261 #19]  INFO -- : 
I, [2018-07-17T00:27:14.614426 #19]  INFO -- : > mkdir -p /shared/postgres_run/10-main.pg_stat_tmp
I, [2018-07-17T00:27:14.618430 #19]  INFO -- : 
I, [2018-07-17T00:27:14.618596 #19]  INFO -- : > chown postgres:postgres /shared/postgres_run/10-main.pg_stat_tmp
I, [2018-07-17T00:27:14.621538 #19]  INFO -- : 
I, [2018-07-17T00:27:14.627497 #19]  INFO -- : File > /etc/service/postgres/run  chmod: +x
I, [2018-07-17T00:27:14.632984 #19]  INFO -- : File > /etc/service/postgres/log/run  chmod: +x
I, [2018-07-17T00:27:14.638965 #19]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x
I, [2018-07-17T00:27:14.645220 #19]  INFO -- : File > /root/upgrade_postgres  chmod: +x
I, [2018-07-17T00:27:14.645479 #19]  INFO -- : > chown -R root /var/lib/postgresql/10/main
I, [2018-07-17T00:27:14.916011 #19]  INFO -- : 
I, [2018-07-17T00:27:14.916315 #19]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o postgres -g postgres /shared/postgres_data && sudo -E -u postgres /usr/lib/postgresql/10/bin/initdb -D /shared/postgres_data || exit 0
I, [2018-07-17T00:27:14.919443 #19]  INFO -- : 
I, [2018-07-17T00:27:14.919562 #19]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2018-07-17T00:27:15.000192 #19]  INFO -- : 
I, [2018-07-17T00:27:15.000367 #19]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2018-07-17T00:27:15.004219 #19]  INFO -- : 
I, [2018-07-17T00:27:15.004456 #19]  INFO -- : > /root/upgrade_postgres

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
dpkg-preconfigure: unable to re-open stdin: 
E: Invalid operation instatll
I, [2018-07-17T00:27:41.899557 #19]  INFO -- : Upgrading PostgreSQL from version 9.5 to 10
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /shared/postgres_data_new ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    /usr/lib/postgresql/10/bin/pg_ctl -D /shared/postgres_data_new -l logfile start

Hit:1 https://deb.nodesource.com/node_8.x xenial InRelease
Hit:2 http://archive.ubuntu.com/ubuntu xenial InRelease
Get:3 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
Get:4 http://apt.postgresql.org/pub/repos/apt xenial-pgdg InRelease [51.4 kB]
Get:5 http://archive.ubuntu.com/ubuntu xenial-security InRelease [107 kB]
Get:6 http://apt.postgresql.org/pub/repos/apt xenial-pgdg/main amd64 Packages [176 kB]
Get:7 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [1,042 kB]
Get:8 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [827 kB]
Get:9 http://archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages [664 kB]
Get:10 http://archive.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [453 kB]
Fetched 3,430 kB in 2s (1,211 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  postgresql-client-9.5 postgresql-contrib-9.5
Suggested packages:
  locales-all postgresql-doc-9.5 libdbd-pg-perl
The following NEW packages will be installed:
  postgresql-9.5 postgresql-client-9.5 postgresql-contrib-9.5
0 upgraded, 3 newly installed, 0 to remove and 38 not upgraded.
Need to get 5,610 kB of archives.
After this operation, 26.3 MB of additional disk space will be used.
Get:1 http://apt.postgresql.org/pub/repos/apt xenial-pgdg/main amd64 postgresql-client-9.5 amd64 9.5.13-2.pgdg16.04+1 [1,194 kB]
Get:2 http://apt.postgresql.org/pub/repos/apt xenial-pgdg/main amd64 postgresql-9.5 amd64 9.5.13-2.pgdg16.04+1 [3,961 kB]
Get:3 http://apt.postgresql.org/pub/repos/apt xenial-pgdg/main amd64 postgresql-contrib-9.5 amd64 9.5.13-2.pgdg16.04+1 [455 kB]
Fetched 5,610 kB in 4s (1,297 kB/s)
Selecting previously unselected package postgresql-client-9.5.
(Reading database ... 36351 files and directories currently installed.)
Preparing to unpack .../postgresql-client-9.5_9.5.13-2.pgdg16.04+1_amd64.deb ...
Unpacking postgresql-client-9.5 (9.5.13-2.pgdg16.04+1) ...
Selecting previously unselected package postgresql-9.5.
Preparing to unpack .../postgresql-9.5_9.5.13-2.pgdg16.04+1_amd64.deb ...
Unpacking postgresql-9.5 (9.5.13-2.pgdg16.04+1) ...
Selecting previously unselected package postgresql-contrib-9.5.
Preparing to unpack .../postgresql-contrib-9.5_9.5.13-2.pgdg16.04+1_amd64.deb ...
Unpacking postgresql-contrib-9.5 (9.5.13-2.pgdg16.04+1) ...
Processing triggers for postgresql-common (191.pgdg16.04+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
Setting up postgresql-client-9.5 (9.5.13-2.pgdg16.04+1) ...
update-alternatives: warning: forcing reinstallation of alternative /usr/share/postgresql/10/man/man1/psql.1.gz because link group psql.1.gz is broken
Setting up postgresql-9.5 (9.5.13-2.pgdg16.04+1) ...
Creating new PostgreSQL cluster 9.5/main ...
/usr/lib/postgresql/9.5/bin/initdb -D /var/lib/postgresql/9.5/main --auth-local peer --auth-host md5
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/9.5/main ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /var/lib/postgresql/9.5/main/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    /usr/lib/postgresql/9.5/bin/pg_ctl -D /var/lib/postgresql/9.5/main -l logfile start

Ver Cluster Port Status Owner    Data directory               Log file
9.5 main    5433 down   postgres /var/lib/postgresql/9.5/main /var/log/postgresql/postgresql-9.5-main.log
update-alternatives: warning: forcing reinstallation of alternative /usr/share/postgresql/10/man/man1/postmaster.1.gz because link group postmaster.1.gz is broken
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Setting up postgresql-contrib-9.5 (9.5.13-2.pgdg16.04+1) ...
 * Stopping PostgreSQL 10 database server
   ...done.
 * Stopping PostgreSQL 9.5 database server
   ...done.
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for invalid "unknown" user columns                 ok
Checking for roles starting with "pg_"                      ok
Creating dump of global objects                             ok
Creating dump of database schemas
  discourse

*failure*

Consult the last few lines of "pg_upgrade_dump_16400.log" for
the probable cause of the failure.
Failure, exiting
-------------------------------------------------------------------------------------
UPGRADE OF POSTGRES FAILED

You are going to need to export your data and import into a clean instance:

In containers/app.yml: Change "templates/postgres.template.yml" TO "templates/postgres.9.5.template.yml"

Run ./launcher rebuild app again

When your instance is running:
Run ./launcher enter app
Run apt-get remove postgresql-client-9.5
Run cd /shared/postgres_backup && sudo -u postgres pg_dump discourse > backup.db

Undo the base_image in your container config
Run: ./launcher stop app
Run: sudo mv /var/discourse/shared/standalone/postgres_data /var/discourse/shared/standalone/postgres_data_old
Run: ./launcher rebuild app

Run: ./launcher enter app
Run: cd /shared/postgres_backup
Run: sv stop unicorn
Run: sudo -iu postgres dropdb discourse
Run: sudo -iu postgres createdb discourse
Run: sudo -iu postgres psql discourse < backup.db
Run: exit
Run: ./launcher rebuild app



FAILED
--------------------
Pups::ExecError: /root/upgrade_postgres failed with return #<Process::Status: pid 70 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "/root/upgrade_postgres"
3004c4e45c7fdf648f83f10b8ffb9aaa5a7a8b2dcf6bfb0e92a42a05cb7c3375
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one

Following the instructions provided resulted in:

Warning: No existing local cluster is suitable as a default target. Please see man pg_wrapper(1) how to specify one.
Error: You must install at least one postgresql-client-<version> package

Installing the pg10 client resulted in:

pg_dump: [archiver (db)] query failed: ERROR:  invalid page in block 0 of relation base/16400/193867
pg_dump: [archiver (db)] query was: SELECT 'bigint' AS sequence_type, start_value, increment_by, max_value, min_value, cache_value, is_cycled FROM public.tag_groups_id_seq

Not sure how to fix this if I can’t even back up the database.
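
As a possible first diagnostic, the path in the pg_dump error above has the form base/<database OID>/<filenode>, and PostgreSQL's pg_filenode_relation function can map a filenode back to a relation name. The sketch below only builds the lookup query from the error text. filenode_query is a hypothetical helper name, and it assumes the relation lives in the database's default tablespace (hence the 0).

```shell
#!/bin/sh
# Sketch: turn a PostgreSQL "invalid page" error into a lookup query.
# filenode_query is a hypothetical helper, not part of Discourse or PostgreSQL.
filenode_query() {
  # Pull "<database oid> <filenode>" out of "... relation base/<db>/<filenode>".
  path=$(printf '%s\n' "$1" | sed -n 's|.*relation base/\([0-9]*\)/\([0-9]*\).*|\1 \2|p')
  [ -n "$path" ] || return 1
  set -- $path
  # Assumption: tablespace 0 selects the database's default tablespace.
  printf 'SELECT pg_filenode_relation(0, %s); -- run in database with OID %s\n' "$2" "$1"
}

filenode_query 'ERROR:  invalid page in block 0 of relation base/16400/193867'
```

The printed query could then be run inside the container with something like sudo -u postgres psql discourse -c "SELECT pg_filenode_relation(0, 193867);". Since the failing query reads public.tag_groups_id_seq, the damaged relation may well be that sequence, but that is an inference rather than something the log confirms.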


(Matt Palmer) #2

I’m thinking your database is probably corrupt in some way. How are your automated backups?


(Dabaer) #3

So which part of the upgrader corrupted my database? All I have are the Discourse automatic backups.


(Alan Tan) #4

In /var/docker/shared/standalone, you should see a folder postgres_data_old. Do you still have that?


(Matt Palmer) #5

I doubt the upgrader corrupted your database, given that it has worked fine for many, many other people. Far more likely is that you had filesystem or memory corruption at some point.


(Dabaer) #6

Yep! Can I migrate that to 10 somehow as is?


(Alan Tan) #7

Hmm, at which step did this fail?

UPGRADE OF POSTGRES FAILED

You are going to need to export your data and import into a clean instance:

In containers/app.yml: Change “templates/postgres.template.yml” TO “templates/postgres.9.5.template.yml”

Run ./launcher rebuild app again

When your instance is running:
Run ./launcher enter app
Run apt-get remove postgresql-client-9.5
Run cd /shared/postgres_backup && sudo -u postgres pg_dump discourse > backup.db

Undo the base_image in your container config
Run: ./launcher stop app
Run: sudo mv /var/discourse/shared/standalone/postgres_data /var/discourse/shared/standalone/postgres_data_old
Run: ./launcher rebuild app

Run: ./launcher enter app
Run: cd /shared/postgres_backup
Run: sv stop unicorn
Run: sudo -iu postgres dropdb discourse
Run: sudo -iu postgres createdb discourse
Run: sudo -iu postgres psql discourse < backup.db
Run: exit
Run: ./launcher rebuild app

(Dabaer) #8

Creating dump of database schemas
discourse

failure

I assume for the same reason manually doing it failed.


(Dabaer) #9

Sorry, I misread. It failed at this step: Run cd /shared/postgres_backup && sudo -u postgres pg_dump discourse > backup.db


(Alan Tan) #10

This is weird, because if your upgrade failed you shouldn’t have the postgres_data_old folder. Can you provide us with the output of the following command:

ls /var/docker/shared/standalone/

(Dabaer) #11

The directory doesn’t exist, either on the host or in the container.


(Dabaer) #12

Oh, I guess you meant /var/discourse. In that case it contains these:

backups postgres_backup postgres_run state
log postgres_data redis_data uploads


(Alan Tan) #13

Can you PM me your app.yml file, with any credentials in it censored?