Upgrade Failed from 2.7.0.beta1 to 2.7.0.beta3

I tried to upgrade from 2.7.0.beta1 to 2.7.0.beta3 using the one-click browser upgrade.

It first updated Docker, apparently successfully. Then, as instructed, I executed the following on the server:

    cd /var/discourse
    git pull
    ./launcher rebuild app

It completed and said I needed to rebuild again. So I did, and it got pretty far before displaying the following:

I, [2021-02-01T04:03:23.848858 #1]  INFO -- : > HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/13/main
I, [2021-02-01T04:03:23.850125 #1]  INFO -- : > sleep 5
I, [2021-02-01T04:03:28.854186 #1]  INFO -- :
I, [2021-02-01T04:03:28.854378 #1]  INFO -- : > su postgres -c 'createdb discourse' || true
createdb: error: could not connect to database template1: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2021-02-01T04:03:28.940422 #1]  INFO -- :
I, [2021-02-01T04:03:28.940926 #1]  INFO -- : > su postgres -c 'psql discourse -c "create user discourse;"' || true
psql: error: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2021-02-01T04:03:29.005802 #1]  INFO -- :
I, [2021-02-01T04:03:29.006192 #1]  INFO -- : > su postgres -c 'psql discourse -c "grant all privileges on database discourse to discourse;"' || true
psql: error: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2021-02-01T04:03:29.055155 #1]  INFO -- :
I, [2021-02-01T04:03:29.055530 #1]  INFO -- : > su postgres -c 'psql discourse -c "alter schema public owner to discourse;"'
psql: error: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I, [2021-02-01T04:03:29.102737 #1]  INFO -- :
I, [2021-02-01T04:03:29.103136 #1]  INFO -- : Terminating async processes
I, [2021-02-01T04:03:29.103280 #1]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/13/main pid: 52


FAILED
--------------------
Pups::ExecError: su postgres -c 'psql discourse -c "alter schema public owner to discourse;"' failed with return #<Process::Status: pid 78 exit 2>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "su postgres -c 'psql $db_name -c \"alter schema public owner to $db_user;\"'"
74718f22e5eb9e1ceb21ac2a2fe613d13aee282a353cf60b91258ba2b2323397
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

The release notes warned about postgres and disk space; maybe it failed for that reason? When I run discourse-doctor, the output includes:

---------- OS Disk Space ----------
Filesystem                 Size  Used Avail Use% Mounted on
/dev/disk/by-label/DOROOT   30G   20G  8.5G  70% /
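
One way to sanity-check the disk-space theory is to compare the size of the database directory with the free space on its filesystem. This is only a rough sketch: the helper names are made up for illustration, the path in the usage comment assumes a standard install, and the "twice the database size" headroom is my own assumption rather than a documented requirement.

```shell
# Sketch only: the 2x headroom factor is an assumption, not a documented rule.

# enough_space USED_KB AVAIL_KB -> succeeds if AVAIL_KB is at least 2 * USED_KB
enough_space() {
    [ "$2" -ge $(( $1 * 2 )) ]
}

# dir_used_kb DIR -> prints the size of DIR in KiB
dir_used_kb() {
    du -sk "$1" | awk '{print $1}'
}

# fs_avail_kb DIR -> prints the free KiB on the filesystem that holds DIR
fs_avail_kb() {
    df -Pk "$1" | awk 'NR == 2 {print $4}'
}

# Usage (path assumed):
#   data=/var/discourse/shared/standalone/postgres_data
#   if enough_space "$(dir_used_kb "$data")" "$(fs_avail_kb "$data")"; then
#       echo "probably enough room for a second copy"
#   else
#       echo "disk looks tight"
#   fi
```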

What should I do now?

Can you paste more lines of the rebuild output? We need some lines earlier than that in order to be able to troubleshoot.

It looks like the culprit may be:

I, [2021-02-01T22:13:56.638190 #1]  INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate'
2021-02-01 22:14:05.011 UTC [4123] discourse@discourse ERROR:  duplicate key value violates unique constraint "index_users_on_username"
2021-02-01 22:14:05.011 UTC [4123] discourse@discourse DETAIL:  Key (username)=(Pxxx_Gxxxxxxxx) already exists.
2021-02-01 22:14:05.011 UTC [4123] discourse@discourse STATEMENT:  UPDATE users
        SET locale = 'en'
        WHERE locale = 'en_US'

rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

PG::UniqueViolation: ERROR:  duplicate key value violates unique constraint "index_users_on_username"
DETAIL:  Key (username)=(Pxxx_Gxxxxxxxx) already exists.

If so, how would you suggest fixing the problem? The full output is attached.

failed_discourse_upgrade_2021_01_31.txt (90.4 KB)

There have been a few recent topics about fixing the duplicates. Example:

Thanks. I am having trouble getting psql to work.

I restarted the container:

./launcher start app

Then I executed:

./launcher enter app

then:

su postgres -c 'psql discourse'

But the output was:

psql: error: could not connect to server: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

Try sudo -u postgres psql discourse

Same error message, unfortunately.

Did it run through the first iteration of launcher rebuild OK?

If it did, your database was already upgraded to PostgreSQL 13, while the PostgreSQL binaries in the old app image still expect a version 10 or 12 data directory, depending on what you’re upgrading from.

cd /var/discourse/shared/standalone
ls -alh 

Do you have both postgres_data and postgres_data_old directories?

Make sure the app is stopped, then move or rename (don’t delete just yet, just to be safe) the postgres_data dir, then

mv postgres_data_old postgres_data

And then try launcher start app again.
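
A rough sketch of that swap, assuming both directories exist under the path above. The function name and the `postgres_data_pg13_parked` name are hypothetical; nothing is deleted, the current directory is only parked under a new name.

```shell
# swap_in_old_pgdata BASE_DIR
# Parks the current postgres_data under a new name (nothing is deleted)
# and moves postgres_data_old back into place.
swap_in_old_pgdata() {
    base=$1
    parked="$base/postgres_data_pg13_parked"   # hypothetical name, pick your own

    [ -d "$base/postgres_data" ]     || { echo "no postgres_data in $base" >&2; return 1; }
    [ -d "$base/postgres_data_old" ] || { echo "no postgres_data_old in $base" >&2; return 1; }
    [ ! -e "$parked" ]               || { echo "$parked already exists" >&2; return 1; }

    mv "$base/postgres_data" "$parked" &&
    mv "$base/postgres_data_old" "$base/postgres_data"
}

# Usage, with the app stopped first:
#   cd /var/discourse && ./launcher stop app
#   swap_in_old_pgdata /var/discourse/shared/standalone
#   ./launcher start app
```

The function refuses to run if either directory is missing or the parked name is already taken, so running it twice by accident does nothing.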

Hope this helps!

Gunnar

I followed these steps and successfully renamed the duplicate user. But then I encountered this issue when trying to re-index.

@sam had suggested in that thread: “Yes please nuke the dupe rows”, but I don’t know what that means in the context of the warnings and error messages I get when re-indexing, which has now mushroomed into:

REINDEX SCHEMA CONCURRENTLY public;
WARNING:  cannot reindex invalid index "public.index_incoming_referers_on_path_and_incoming_domain_id_ccnew" concurrently, skipping
WARNING:  cannot reindex invalid index "public.incoming_referers_pkey_ccnew1" concurrently, skipping
WARNING:  cannot reindex invalid index "public.index_incoming_referers_on_path_and_incoming_domain_id_ccnew1" concurrently, skipping
WARNING:  cannot reindex invalid index "public.index_incoming_referers_on_path_and_incoming_domain_id_cc_ccnew" concurrently, skipping
WARNING:  cannot reindex invalid index "public.index_incoming_referers_on_path_and_incoming_domain_id_c_ccnew1" concurrently, skipping
WARNING:  cannot reindex invalid index "public.incoming_referers_pkey_ccnew2" concurrently, skipping
WARNING:  cannot reindex invalid index "public.index_incoming_referers_on_path_and_incoming_domain_id_ccnew2" concurrently, skipping
WARNING:  cannot reindex invalid index "public.incoming_referers_pkey_ccnew_ccnew" concurrently, skipping
WARNING:  cannot reindex invalid index "pg_toast.pg_toast_19250_index_ccnew1" concurrently, skipping
WARNING:  cannot reindex invalid index "pg_toast.pg_toast_19250_index_ccnew2" concurrently, skipping
WARNING:  cannot reindex invalid index "pg_toast.pg_toast_19250_index_ccnew_ccnew" concurrently, skipping
ERROR:  could not create unique index "index_incoming_referers_on_path_and_incoming_domain_id_ccnew3"
DETAIL:  Key (path, incoming_domain_id)=(/votes/, 1165) is duplicated.
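
For reference, the duplicated key in that last error can be inspected with a read-only query before anything is deleted. This is a minimal sketch: `dupe_query` is a hypothetical helper, and the table name `incoming_referers` is inferred from the index name in the error, not confirmed anywhere else.

```shell
# dupe_query TABLE COL1 COL2
# Prints a read-only query listing (COL1, COL2) pairs that occur more than once.
dupe_query() {
    printf 'SELECT %s, %s, count(*) FROM %s GROUP BY %s, %s HAVING count(*) > 1;\n' \
        "$2" "$3" "$1" "$2" "$3"
}

# Usage inside the container (table and columns taken from the error message):
#   dupe_query incoming_referers path incoming_domain_id | su postgres -c 'psql discourse'
```

The query only reads; it shows which rows would have to be removed for the unique index to build.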

Those are temporary indexes that were created while the reindexing was working. Every time it crashed due to duplicates, it would leave at least one temporary index behind. You can tell by the name, which ends in ccnew, ccnew1, ccnew2, and so on.

You can get rid of them by entering psql and issuing the DROP INDEX command.

sudo ./launcher enter app
su postgres -c 'psql discourse'

DROP INDEX "<index name>_ccnew";
DROP INDEX "<index name>_ccnew1";

And so on. Make sure you have a backup of the database first, make sure reindex isn’t currently running, and make sure that you only drop the _ccnew indices.
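
If there are many leftovers, the statements can be generated from the REINDEX warnings instead of typed by hand. This is a sketch under assumptions: `drop_statements` is a hypothetical helper, it only understands the warning format PostgreSQL prints for skipped invalid indexes, and it deliberately ignores anything outside the `public` schema. Review the output before running any of it.

```shell
# drop_statements
# Reads REINDEX output on stdin and prints one DROP INDEX statement for each
# skipped public.*_ccnew* index. Other schemas (e.g. pg_toast) are ignored.
drop_statements() {
    sed -n 's/^WARNING: *cannot reindex invalid index "public\.\([A-Za-z0-9_]*_ccnew[0-9]*\)" concurrently.*$/DROP INDEX IF EXISTS public.\1;/p'
}

# Usage (reindex_output.txt is a hypothetical file holding the pasted warnings):
#   drop_statements < reindex_output.txt
```

`IF EXISTS` means a name that is already gone produces a notice rather than an error.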

More information in this post:

Thanks again, Gunnar. I was able to drop some of the ccnew indexes but not all:

discourse=# DROP INDEX public.index_incoming_referers_on_path_and_incoming_domain_id_ccnew;
DROP INDEX
discourse=# DROP INDEX public.incoming_referers_pkey_ccnew1;
DROP INDEX
discourse=# DROP INDEX public.index_incoming_referers_on_path_and_incoming_domain_id_ccnew1;
DROP INDEX
discourse=# DROP INDEX public.index_incoming_referers_on_path_and_incoming_domain_id_cc_ccnew;
ERROR:  index "index_incoming_referers_on_path_and_incoming_domain_id_cc_ccnew" does not exist
discourse=# DROP INDEX public.index_incoming_referers_on_path_and_incoming_domain_id_c_ccnew1;
ERROR:  index "index_incoming_referers_on_path_and_incoming_domain_id_c_ccnew1" does not exist
discourse=# DROP INDEX public.incoming_referers_pkey_ccnew2;
DROP INDEX
discourse=# DROP INDEX public.incoming_referers_pkey_ccnew_ccnew;
ERROR:  index "incoming_referers_pkey_ccnew_ccnew" does not exist
discourse=# DROP INDEX pg_toast.pg_toast_19250_index_ccnew1;
ERROR:  permission denied: "pg_toast_19250_index_ccnew1" is a system catalog
discourse=# DROP INDEX pg_toast.pg_toast_19250_index_ccnew2;
ERROR:  permission denied: "pg_toast_19250_index_ccnew2" is a system catalog
discourse=# DROP INDEX pg_toast.pg_toast_19250_index_ccnew_ccnew;
ERROR:  permission denied: "pg_toast_19250_index_ccnew_ccnew" is a system catalog

In any case, I seem to have successfully reindexed afterwards:

discourse=# REINDEX SCHEMA CONCURRENTLY public;
REINDEX

So then I went to complete the upgrade, but it failed:

root@forum:/var/discourse# ./launcher rebuild app
Ensuring launcher is up to date
Fetching origin
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/discourse/discourse_docker
 * [new branch]      fix-prune-time -> origin/fix-prune-time
Launcher is up-to-date
Stopping old container
+ /usr/bin/docker stop -t 60 app
app
cd /pups && git pull && /pups/bin/pups --stdin
Already up to date.
I, [2021-02-15T00:34:30.967636 #1]  INFO -- : Loading --stdin
I, [2021-02-15T00:34:30.973572 #1]  INFO -- : > locale-gen $LANG && update-locale
I, [2021-02-15T00:34:31.024271 #1]  INFO -- : Generating locales (this might take a while)...
Generation complete.

I, [2021-02-15T00:34:31.024803 #1]  INFO -- : > mkdir -p /shared/postgres_run
I, [2021-02-15T00:34:31.029795 #1]  INFO -- :
I, [2021-02-15T00:34:31.030826 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2021-02-15T00:34:31.033498 #1]  INFO -- :
I, [2021-02-15T00:34:31.033875 #1]  INFO -- : > chmod 775 /shared/postgres_run
I, [2021-02-15T00:34:31.036104 #1]  INFO -- :
I, [2021-02-15T00:34:31.036435 #1]  INFO -- : > rm -fr /var/run/postgresql
I, [2021-02-15T00:34:31.038583 #1]  INFO -- :
I, [2021-02-15T00:34:31.038915 #1]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2021-02-15T00:34:31.041198 #1]  INFO -- :
I, [2021-02-15T00:34:31.041511 #1]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2021/02/15 00:34:31 socat[27] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): No such file or directory
I, [2021-02-15T00:34:31.055279 #1]  INFO -- :
I, [2021-02-15T00:34:31.055620 #1]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2021-02-15T00:34:31.058156 #1]  INFO -- :
I, [2021-02-15T00:34:31.058442 #1]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2021-02-15T00:34:31.060461 #1]  INFO -- :
I, [2021-02-15T00:34:31.060758 #1]  INFO -- : > mkdir -p /shared/postgres_run/13-main.pg_stat_tmp
I, [2021-02-15T00:34:31.062949 #1]  INFO -- :
I, [2021-02-15T00:34:31.063384 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run/13-main.pg_stat_tmp
I, [2021-02-15T00:34:31.065117 #1]  INFO -- :
I, [2021-02-15T00:34:31.069700 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2021-02-15T00:34:31.073080 #1]  INFO -- : File > /etc/service/postgres/log/run  chmod: +x  chown:
I, [2021-02-15T00:34:31.076629 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2021-02-15T00:34:31.079978 #1]  INFO -- : File > /root/upgrade_postgres  chmod: +x  chown:
I, [2021-02-15T00:34:31.080365 #1]  INFO -- : > chown -R root /var/lib/postgresql/13/main
I, [2021-02-15T00:34:31.456272 #1]  INFO -- :
I, [2021-02-15T00:34:31.456523 #1]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o postgres -g postgres /shared/postgres_data && sudo -E -u postgres /usr/lib/postgresql/13/bin/initdb -D /shared/postgres_data || exit 0
I, [2021-02-15T00:34:31.458416 #1]  INFO -- :
I, [2021-02-15T00:34:31.458635 #1]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2021-02-15T00:34:31.489118 #1]  INFO -- :
I, [2021-02-15T00:34:31.489681 #1]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2021-02-15T00:34:31.491900 #1]  INFO -- :
I, [2021-02-15T00:34:31.492294 #1]  INFO -- : > /root/upgrade_postgres
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
debconf: delaying package configuration, since apt-utils is not installed
I, [2021-02-15T00:34:44.948743 #1]  INFO -- : Upgrading PostgreSQL from version 12 to 13
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /shared/postgres_data_new ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok


Success. You can now start the database server using:

    /usr/lib/postgresql/13/bin/pg_ctl -D /shared/postgres_data_new -l logfile start

Get:1 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:2 http://deb.debian.org/debian buster InRelease [122 kB]
Get:3 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:4 http://apt.postgresql.org/pub/repos/apt buster-pgdg InRelease [104 kB]
Get:5 http://security.debian.org/debian-security buster/updates/main amd64 Packages [267 kB]
Get:6 http://deb.debian.org/debian buster/main amd64 Packages [7,907 kB]
Get:7 http://deb.debian.org/debian buster-updates/main amd64 Packages.diff/Index [5,656 B]
Get:8 http://deb.debian.org/debian buster-updates/main amd64 Packages 2020-12-24-1401.30.pdiff [286 B]
Get:9 http://deb.debian.org/debian buster-updates/main amd64 Packages 2021-01-29-2000.47.pdiff [408 B]
Get:10 http://deb.debian.org/debian buster-updates/main amd64 Packages 2021-02-07-1359.56.pdiff [2,302 B]
Get:10 http://deb.debian.org/debian buster-updates/main amd64 Packages 2021-02-07-1359.56.pdiff [2,302 B]
Get:11 https://deb.nodesource.com/node_10.x buster InRelease [4,584 B]
Get:12 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 Packages [216 kB]
Get:13 https://deb.nodesource.com/node_10.x buster/main amd64 Packages [768 B]
Fetched 8,746 kB in 2s (4,421 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  postgresql-client-12
Suggested packages:
  postgresql-doc-12
The following NEW packages will be installed:
  postgresql-12 postgresql-client-12
0 upgraded, 2 newly installed, 0 to remove and 28 not upgraded.
Need to get 16.1 MB of archives.
After this operation, 54.1 MB of additional disk space will be used.
Get:1 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-client-12 amd64 12.6-1.pgdg100+1 [1,424 kB]
Get:2 http://apt.postgresql.org/pub/repos/apt buster-pgdg/main amd64 postgresql-12 amd64 12.6-1.pgdg100+1 [14.7 MB]
Fetched 16.1 MB in 1s (12.8 MB/s)
Selecting previously unselected package postgresql-client-12.
(Reading database ... 43899 files and directories currently installed.)
Preparing to unpack .../postgresql-client-12_12.6-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-client-12 (12.6-1.pgdg100+1) ...
Selecting previously unselected package postgresql-12.
Preparing to unpack .../postgresql-12_12.6-1.pgdg100+1_amd64.deb ...
Unpacking postgresql-12 (12.6-1.pgdg100+1) ...
Setting up postgresql-client-12 (12.6-1.pgdg100+1) ...
update-alternatives: warning: forcing reinstallation of alternative /usr/share/postgresql/13/man/man1/psql.1.gz because link group psql.1.gz is broken
Setting up postgresql-12 (12.6-1.pgdg100+1) ...
Creating new PostgreSQL cluster 12/main ...
/usr/lib/postgresql/12/bin/initdb -D /var/lib/postgresql/12/main --auth-local peer --auth-host md5
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/12/main ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctlcluster 12 main start

Ver Cluster Port Status Owner    Data directory              Log file
12  main    5433 down   postgres /var/lib/postgresql/12/main /var/log/postgresql/postgresql-12-main.log
update-alternatives: warning: forcing reinstallation of alternative /usr/share/postgresql/13/man/man1/postmaster.1.gz because link group postmaster.1.gz is broken
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for postgresql-common (223.pgdg100+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
Stopping PostgreSQL 12 database server: main.
Stopping PostgreSQL 13 database server: main.
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Creating dump of global objects                             ok
Creating dump of database schemas
  discourse

*failure*

Consult the last few lines of "pg_upgrade_dump_16566.log" for
the probable cause of the failure.
Failure, exiting
-------------------------------------------------------------------------------------
UPGRADE OF POSTGRES FAILED

Please visit https://meta.discourse.org/t/postgresql-13-update/172563 for support.

You can run ./launcher start app to restart your app in the meanwhile




FAILED
--------------------
Pups::ExecError: /root/upgrade_postgres failed with return #<Process::Status: pid 46 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "/root/upgrade_postgres"
1b91e47c88940d6c697c346fa8db3d4ab39bbc83f1340dc6f734ca0f9abe6eeb
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

I don’t know why the rebuild failed, except that it appears to have something to do with Postgres. I don’t know where the log file “pg_upgrade_dump_16566.log” is supposed to be.
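
As far as I understand, pg_upgrade in this PostgreSQL version writes its log files into whatever directory it was started from, so one thing to try is searching the shared volume for them. This is only a sketch: `find_upgrade_logs` is a hypothetical helper, and it is an assumption that the log survived at all, since it may have lived only inside the bootstrap container that was discarded.

```shell
# find_upgrade_logs DIR...
# Prints the path of every pg_upgrade dump log found under the given directories.
find_upgrade_logs() {
    find "$@" -type f -name 'pg_upgrade_dump_*.log' 2>/dev/null
}

# Usage on the host:
#   find_upgrade_logs /var/discourse/shared/standalone
```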

Ideas?

I think that means that the upgrade didn’t work. I had a similar job yesterday.

What I did, I think, was move the backup data back to postgres_data, then switch to the pg10 (probably 12 for you?) template, then rebuild, and then change the template back and rebuild twice more for the upgrade.

That’s broad strokes, but the best I can do on my phone. The PostgreSQL 13 update should have all you need.

So let me see if I understand. I need to:

  1. Rename postgres_data to something else and then rename the backup I made to postgres_data.
  2. Rename postgres.template.yml to something else and then rename postgres.12.template.yml to postgres.template.yml.
  3. Execute ./launcher rebuild app.
  4. Restore both postgres_data and postgres.template.yml.
  5. Execute ./launcher rebuild app.
  6. Execute ./launcher rebuild app.

Is that right?

Hey Roger. It should be something very much like that, but I don’t know exactly what your situation is, so I can’t promise.

Oh, wait.

No. In the app.yml you’d refer to the postgres 12 template rather than the regular postgres template. You’re editing your app.yml, not renaming any files. That’s explained fairly well in the upgrade howto, I think.
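
A minimal sketch of that edit, assuming a standard install where app.yml lives in /var/discourse/containers and lists its templates as `templates/<name>`. The helper name `use_pg_template` is made up for illustration, and it keeps a `.bak` copy of the file.

```shell
# use_pg_template APP_YML FROM TO
# Rewrites the "templates/FROM" reference in APP_YML to "templates/TO",
# leaving the previous version in APP_YML.bak.
# Note: dots in FROM are treated as regex wildcards, which is harmless here.
use_pg_template() {
    sed -i.bak "s|templates/$2|templates/$3|" "$1"
}

# Usage from /var/discourse:
#   use_pg_template containers/app.yml postgres.template.yml postgres.12.template.yml
#   ./launcher rebuild app
#   use_pg_template containers/app.yml postgres.12.template.yml postgres.template.yml
```

Check the result with a quick look at the templates section of app.yml before rebuilding.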

For the site I just fixed, the postgres_data directory was empty and then (I think) my script did a docker prune, which deleted the container that I think would have worked if I’d just restarted it.

If you just want it fixed and have a budget, please see https://www.literatecomputing.com/automatic-rebuilds-when-they-are-needed/.