Upgrade Issues: Failed Upgrade due to Duplicate Key, Failed Snapshot Restore

I seem to have run into an issue when running the latest upgrade where it fails due to this:

PG::UniqueViolation: ERROR:  duplicate key value violates unique constraint "index_users_on_username_lower"
DETAIL:  Key (username_lower)=(scottie) already exists.

I tried to recover by attempting to delete this user by accessing either Rails console or the Postgres database directly, and I keep getting an error that the server isn’t running.

The following is me attempting to run rails after starting the app:

FAILED

--------------------

Pups::ExecError: cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate' failed with return #<Process::Status: pid 3743 exit 1>

Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'

exec failed with the params {"cd"=>"$home", "hook"=>"db_migrate", "cmd"=>["su discourse -c 'bundle exec rake db:migrate'"]}

c744729efb7a5813699e2fe8fa46ab551352d45ada8dcb9204121508364f8438

** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.

./discourse-doctor may help diagnose the problem.

root@kForum:/var/discourse# ./launcher start app

starting up existing container

+ /usr/bin/docker start app

app

root@kForum:/var/discourse# ./launcher enter app

root@kForum-app:/var/www/discourse# rails c

bundler: failed to load command: pry (/var/www/discourse/vendor/bundle/ruby/2.7.0/bin/pry)

PG::ConnectionBad: could not connect to server: No such file or directory

Is the server running locally and accepting

connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

The following is me attempting to run postgres:

root@kForum:/var/discourse# ./launcher enter app

root@kForum-app:/var/www/discourse# su discourse

discourse@kForum-app:/var/www/discourse$ psql discourse

psql: error: could not connect to server: No such file or directory

Is the server running locally and accepting

connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

discourse@kForum-app:/var/www/discourse$

I am restoring an earlier snapshot in DigitalOcean to get back to a working state, but the strange (scary?) part is that even restoring the snapshot didn’t seem to get the site back and up and running. This snapshot was taken over 24 hours ago when I know the forums were working fine, so it should be unrelated to the upgrade I attempted.

In my restored snapshot, this is the error that I see when I try to stop/start/restart the app:

root@kForum:/var/discourse# ./launcher restart app

+ /usr/bin/docker stop -t 10 app

app

starting up existing container

+ /usr/bin/docker start app

Error response from daemon: container "a1023d05a7b4de25ded1aa69ad49caed9fa59d15fa8e8130d32db82934139e6b": already exists

Error: failed to start containers: app

root@kForum:/var/discourse#

Lastly, I ran discourse-doctor, and this seems like a relevant part of the failure?

Checking cluster versions                                   ok

The source cluster was not shut down cleanly.
Failure, exiting
-------------------------------------------------------------------------------------
UPGRADE OF POSTGRES FAILED

Please visit https://meta.discourse.org/t/postgresql-13-update/172563 for support.

You can run ./launcher start app to restart your app in the meanwhile




FAILED
--------------------
Pups::ExecError: /root/upgrade_postgres failed with return #<Process::Status: pid 46 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "/root/upgrade_postgres"
c98d1b076faed9b0c728cd944b3f2436afc9e266910b82a644c71fd80b42f073
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
==================== END REBUILD LOG ====================
Failed to rebuild app.

Checking your domain name . . .

Connection to forum.kirupa.com succeeded.
You should probably remove any non-standard plugins and rebuild.
Attempting to restart existing container. . . 

starting up existing container
+ /usr/bin/docker start app
Error response from daemon: driver failed programming external connectivity on endpoint app (78a35e0f12e5af560bd1f991ead900fefd0c81570cc4582fe99aaa5e56b238e8): Error starting userland proxy: listen tcp 0.0.0.0:443: bind: address already in use
Error: failed to start containers: app
Failed to restart the container.


==================== PLUGINS ====================
          - git clone https://github.com/discourse/docker_manager.git
          - git clone https://github.com/discourse/discourse-solved.git
          - git clone https://github.com/discourse/discourse-sitemap.git

No non-official plugins detected.

See https://github.com/discourse/discourse/blob/master/lib/plugin/metadata.rb for the official list.

========================================
Discourse version at forum.kirupa.com: NOT FOUND
Discourse version at localhost: NOT FOUND

Thoughts on what is going on here? Getting the forums back is my first goal. Fixing the upgrade failures is my secondary goal.

Thanks,
Kirupa

You should look here - this seems to be the same issue: Help! Upgrade led to complete fail

I had what u think was a similar case yesterday (see Help! Upgrade led to complete fail - #8 by Judith). In broad strokes, I think you need to do something like

  • rename the postgres_data _old back to postgres_data
  • restart he old container
  • fix the corrupt index problem by renaming the duplicate users (or perhaps users that are duplicate because they have different capitalization
  • build the indexes on the user table to make sure they are all fixed
  • change to the postgres 12 template as described in PostgreSQL 13 update
  • rebuild to upgrade
  • when that works, you can change the template back and finish the upgrade.
  • note that you need to update not only the username but also the username_lower

The one that I fixed yesterday had a duplicate username as old as February 2020, so you likely don’t want to try to fix this by restoring an older database.

To fix it you need to understand how to modify records with the postgres command line , rebuild databases, and deal with complex, but well documented, issues in the postgres 13 upgrade.

If you have a $500 budget you can contact me. My contact info is in my profile.

@pfaffman - just pinged you via your contact form :slight_smile:

I replied. If you didn’t get it then maybe you put the wrong address?

1 Like

To close the loop on this, @pfaffman is a wizard. He solved the problem with the forum upgrade (and a host of related issues) very quickly :slight_smile:

3 Likes

I am having the same issue, but not going to pay $500 for a solution. I have SFTP access to the server, so would it not be possible to just open the database in a notepad and rename the conflicted username?

Or if anyone can fix it for like $50, please contact me.

You can check out PG::UniqueViolation Problem during 3.1.0.beta4 upgrade - #3 by ahmedeldeep where someone described pretty well what they did.

You’ll need an SSH client, not SFTP. If you’re using Digital Ocean you can use the console on their web site.