Stuck in an update loop after PostgreSQL 13 update

I have read the PostgreSQL 13 update docs, and they don’t cover my situation.

I’m stuck in the “Every rebuild does the upgrade again aka upgrade loop” situation:

-------------------------------------------------------------------------------------
UPGRADE OF POSTGRES COMPLETE

Old 10 database is stored at /shared/postgres_data_old

To complete the upgrade, rebuild again using:

./launcher rebuild app
-------------------------------------------------------------------------------------

The docs say that this happens because there are still files from the last upgrade lingering around, and that you should move those elsewhere before continuing.

But I don’t have any old files:

root@connect:/var/discourse# ls /mnt/volume_ams3_01/shared/standalone/
backups  letsencrypt  log  postgres_backup  postgres_data  postgres_data_new  postgres_run  redis_data  ssl  state  tmp  uploads

Two more notes:

  • The postgres_data folder is empty.
  • I have a separate shared folder using DigitalOcean Spaces.

What can I try to resolve this?

Hi Alex,

I’m not sure what’s causing the rebuild loop, but you may be able to work around it. Does the postgres_data_new directory contain your database? If so, check the PG_VERSION file within that directory to see whether the upgrade worked. The complete logs would also be useful, if you could copy them in.
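A minimal sketch for checking this from the host, assuming the shared folder is the host directory mapped to /shared (by default /var/discourse/shared/standalone; in this thread /mnt/volume_ams3_01/shared/standalone). The function name pg_versions is purely illustrative. It prints the PG_VERSION of every postgres_data* directory, so postgres_data_old and postgres_data_new are covered too if they exist:

```shell
# pg_versions: hypothetical helper, not part of Discourse's launcher.
# Prints "<dir>: <major version>" for each postgres_data* directory
# under the given shared folder.
pg_versions() {
  local shared="${1:?usage: pg_versions /path/to/shared/standalone}"
  local dir
  for dir in "$shared"/postgres_data*; do
    # Skip plain files, and the unexpanded pattern when nothing matches.
    [ -d "$dir" ] || continue
    if [ -f "$dir/PG_VERSION" ]; then
      printf '%s: %s\n' "$(basename "$dir")" "$(cat "$dir/PG_VERSION")"
    else
      printf '%s: (no PG_VERSION file)\n' "$(basename "$dir")"
    fi
  done
}
```

For example, as root: pg_versions /mnt/volume_ams3_01/shared/standalone. Running it as root matters because PostgreSQL data directories are normally readable only by their owner.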


Yes, the postgres_data_new contains my database and the PG_VERSION within it confirms that the version is 13.

I’m not sure what logs would be helpful to share with you (and where to find them).

In that case, you should be able to copy the contents of postgres_data_new into the postgres_data directory and do a rebuild. The launcher will see that the database is already on PG13 and continue from there.
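A minimal sketch of that copy step, assuming the container has been stopped first (./launcher stop app), that a backup exists, and that postgres_data is empty, as it was in this thread. copy_new_cluster is a hypothetical helper name, not a launcher command. It refuses to overwrite a non-empty postgres_data:

```shell
# copy_new_cluster: hypothetical helper, not part of Discourse's launcher.
# Copies the contents of postgres_data_new into postgres_data, but only
# when postgres_data is empty (or missing), so nothing is overwritten.
copy_new_cluster() {
  local shared="${1:?usage: copy_new_cluster /path/to/shared/standalone}"
  local src="$shared/postgres_data_new"
  local dst="$shared/postgres_data"

  if [ ! -f "$src/PG_VERSION" ]; then
    echo "no PostgreSQL cluster found in $src" >&2
    return 1
  fi
  mkdir -p "$dst"
  if [ -n "$(ls -A "$dst")" ]; then
    echo "$dst is not empty; move it aside first" >&2
    return 1
  fi
  # -a preserves ownership (when run as root), permissions and timestamps;
  # "$src/." copies the directory's contents, including hidden files.
  cp -a "$src/." "$dst/"
}
```

Usage would be roughly: ./launcher stop app, then copy_new_cluster /mnt/volume_ams3_01/shared/standalone, then ./launcher rebuild app. Copying rather than moving keeps postgres_data_new intact as a fallback.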


Hi Michael! I forgot to mention that since posting here originally I did try that (twice). I’m still stuck in the loop.


I’m not sure how to capture the output of ./launcher rebuild app, but here’s what I’ve got.

It starts like this:

root@connect:/var/discourse# ./launcher rebuild app
Ensuring launcher is up to date
Fetching origin
Launcher is up-to-date
Stopping old container
+ /usr/bin/docker stop -t 60 app
app
cd /pups && git pull && git checkout v1.0.3 && /pups/bin/pups --stdin
From https://github.com/discourse/pups
   17f04ec..e0ff889  master     -> origin/master
 * [new tag]         v1.1.1     -> v1.1.1
 * [new tag]         v1.1.0     -> v1.1.0
Updating 17f04ec..e0ff889
Fast-forward
 .github/workflows/ci.yml     |  29 ++++++
 .github/workflows/lint.yml   |  27 +++++
 .rubocop.yml                 |   3 +
 Gemfile                      |   2 +
 Guardfile                    |   4 +-
 README.md                    |  21 ++++
 Rakefile                     |  14 +--
 bin/pups                     |   8 +-
 lib/pups.rb                  |  32 ++++--
 lib/pups/cli.rb              |  92 ++++++++++-------
 lib/pups/command.rb          |  25 +++--
 lib/pups/config.rb           | 240 +++++++++++++++++++++++--------------------
 lib/pups/docker.rb           |  69 +++++++++++++
 lib/pups/exec_command.rb     | 182 ++++++++++++++++----------------
 lib/pups/file_command.rb     |  60 +++++------
 lib/pups/merge_command.rb    |  94 ++++++++---------
 lib/pups/replace_command.rb  |  70 +++++++------
 lib/pups/runit.rb            |  47 +++++----
 lib/pups/version.rb          |   4 +-
 pups.gemspec                 |  37 ++++---
 test/cli_test.rb             | 102 +++++++++++++++---
 test/config_test.rb          | 215 ++++++++++++++++++++++++++++----------
 test/docker_test.rb          | 157 ++++++++++++++++++++++++++++
 test/exec_command_test.rb    |  62 ++++++-----
 test/file_command_test.rb    |  17 ++-
 test/merge_command_test.rb   |  64 ++++++------
 test/replace_command_test.rb |  86 ++++++++--------
 test/test_helper.rb          |   2 +
 28 files changed, 1158 insertions(+), 607 deletions(-)
 create mode 100644 .github/workflows/ci.yml
 create mode 100644 .github/workflows/lint.yml
 create mode 100644 .rubocop.yml
 create mode 100644 lib/pups/docker.rb
 create mode 100644 test/docker_test.rb
Note: checking out 'v1.0.3'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at d1db030 cut a new version
I, [2021-10-19T05:37:44.995716 #1]  INFO -- : Loading --stdin
I, [2021-10-19T05:37:45.001857 #1]  INFO -- : > locale-gen $LANG && update-locale
I, [2021-10-19T05:37:45.031533 #1]  INFO -- : Generating locales (this might take a while)...
Generation complete.

I, [2021-10-19T05:37:45.032260 #1]  INFO -- : > mkdir -p /shared/postgres_run
I, [2021-10-19T05:37:45.037403 #1]  INFO -- :
I, [2021-10-19T05:37:45.038002 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run
I, [2021-10-19T05:37:45.041480 #1]  INFO -- :
I, [2021-10-19T05:37:45.041974 #1]  INFO -- : > chmod 775 /shared/postgres_run
I, [2021-10-19T05:37:45.044313 #1]  INFO -- :
I, [2021-10-19T05:37:45.044759 #1]  INFO -- : > rm -fr /var/run/postgresql
I, [2021-10-19T05:37:45.047047 #1]  INFO -- :
I, [2021-10-19T05:37:45.047605 #1]  INFO -- : > ln -s /shared/postgres_run /var/run/postgresql
I, [2021-10-19T05:37:45.051062 #1]  INFO -- :
I, [2021-10-19T05:37:45.051463 #1]  INFO -- : > socat /dev/null UNIX-CONNECT:/shared/postgres_run/.s.PGSQL.5432 || exit 0 && echo postgres already running stop container ; exit 1
2021/10/19 05:37:45 socat[33] E connect(6, AF=1 "/shared/postgres_run/.s.PGSQL.5432", 36): No such file or directory
I, [2021-10-19T05:37:45.058669 #1]  INFO -- :
I, [2021-10-19T05:37:45.058976 #1]  INFO -- : > rm -fr /shared/postgres_run/.s*
I, [2021-10-19T05:37:45.061427 #1]  INFO -- :
I, [2021-10-19T05:37:45.061743 #1]  INFO -- : > rm -fr /shared/postgres_run/*.pid
I, [2021-10-19T05:37:45.063969 #1]  INFO -- :
I, [2021-10-19T05:37:45.064258 #1]  INFO -- : > mkdir -p /shared/postgres_run/13-main.pg_stat_tmp
I, [2021-10-19T05:37:45.068148 #1]  INFO -- :
I, [2021-10-19T05:37:45.068570 #1]  INFO -- : > chown postgres:postgres /shared/postgres_run/13-main.pg_stat_tmp
I, [2021-10-19T05:37:45.070400 #1]  INFO -- :
I, [2021-10-19T05:37:45.074243 #1]  INFO -- : File > /etc/service/postgres/run  chmod: +x  chown:
I, [2021-10-19T05:37:45.077577 #1]  INFO -- : File > /etc/service/postgres/log/run  chmod: +x  chown:
I, [2021-10-19T05:37:45.081084 #1]  INFO -- : File > /etc/runit/3.d/99-postgres  chmod: +x  chown:
I, [2021-10-19T05:37:45.084463 #1]  INFO -- : File > /root/upgrade_postgres  chmod: +x  chown:
I, [2021-10-19T05:37:45.084841 #1]  INFO -- : > chown -R root /var/lib/postgresql/13/main
I, [2021-10-19T05:37:45.766251 #1]  INFO -- :
I, [2021-10-19T05:37:45.766560 #1]  INFO -- : > [ ! -e /shared/postgres_data ] && install -d -m 0755 -o postgres -g postgres /shared/postgres_data && sudo -E -u postgres /usr/lib/postgresql/13/bin/initdb -D /shared/postgres_data || exit 0
I, [2021-10-19T05:37:45.769955 #1]  INFO -- :
I, [2021-10-19T05:37:45.770597 #1]  INFO -- : > chown -R postgres:postgres /shared/postgres_data
I, [2021-10-19T05:37:45.841916 #1]  INFO -- :
I, [2021-10-19T05:37:45.842605 #1]  INFO -- : > chown -R postgres:postgres /var/run/postgresql
I, [2021-10-19T05:37:45.845109 #1]  INFO -- :
I, [2021-10-19T05:37:45.845574 #1]  INFO -- : > /root/upgrade_postgres
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
debconf: delaying package configuration, since apt-utils is not installed

Then it scrolls by too fast to read (many lines like /shared/postgres_data/base/whatever) and ends like this:

  /shared/postgres_data/base/16400/203028
  /shared/postgres_data/base/16400/203045
  /shared/postgres_data/base/16400/203047
  /shared/postgres_data/base/16400/203049
  /shared/postgres_data/base/16400/203050
  /shared/postgres_data/base/13014/2613
  /shared/postgres_data/base/13014/2683
  /shared/postgres_data/base/1/2613
  /shared/postgres_data/base/1/2683
                                                            ok
Setting next OID for new cluster                            ok
Sync data directory to disk                                 ok
Creating script to analyze new cluster                      ok
Creating script to delete old cluster                       ok

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    ./analyze_new_cluster.sh

Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh
-------------------------------------------------------------------------------------
UPGRADE OF POSTGRES COMPLETE

Old 10 database is stored at /shared/postgres_data_old

To complete the upgrade, rebuild again using:

./launcher rebuild app
-------------------------------------------------------------------------------------

a68ed0b1b54e4a0e2dae2543dc27d87be02ca1f81738e0d2e43511a46524a980
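On capturing the full output of ./launcher rebuild app: one simple approach is to pipe it through tee, so it still scrolls on screen but is also saved to a file that can be pasted here. A minimal sketch; run_logged and the log path are illustrative names:

```shell
# run_logged: hypothetical helper. Runs a command, shows its output live,
# and also writes its stdout and stderr to LOGFILE.
# Note: the exit status is tee's, not the wrapped command's.
run_logged() {
  local log="${1:?usage: run_logged LOGFILE command [args...]}"
  shift
  "$@" 2>&1 | tee "$log"
}
```

For example: cd /var/discourse && run_logged /root/rebuild.log ./launcher rebuild app, then open /root/rebuild.log with less. Without the helper, the equivalent one-liner is ./launcher rebuild app 2>&1 | tee /root/rebuild.log.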

Would you mind sharing what templates you’re including in containers/app.yml?

Sure:

templates:
  # - "templates/postgres.10.template.yml"
  - "templates/postgres.13.template.yml"
  - "templates/redis.template.yml"
  - "templates/web.template.yml"
  - "templates/web.ssl.template.yml"
  - "templates/web.letsencrypt.ssl.template.yml"
  - "templates/web.ratelimited.template.yml"
  - "templates/web.replygif.template.yml" # for the ReplyGIF plugin: https://github.com/cpradio/discourse-plugin-replygif

(I’m going to comment out the replygif one, since I’m not using it anyway.)

What does this mean? I think this might be related.

Why isn’t this just the plain postgres template, without the 13?

This might be the issue.
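To double-check which PostgreSQL template containers/app.yml actually pulls in (commented-out entries such as the postgres.10 one are ignored), a grep is enough. A sketch; the function name is hypothetical, and the pattern assumes each template sits on its own - "templates/NAME.yml" line, as in the app.yml above:

```shell
# active_postgres_templates: hypothetical helper.
# Prints the uncommented template lines that reference a postgres template.
active_postgres_templates() {
  local yml="${1:?usage: active_postgres_templates containers/app.yml}"
  # Matches lines such as:   - "templates/postgres.template.yml"
  # A leading '#' prevents a match, so commented entries are skipped.
  grep -E '^[[:space:]]*-[[:space:]]*"?templates/postgres[^"]*\.yml' "$yml"
}
```

On a standard install the only line printed would be the one for templates/postgres.template.yml.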


Hi Jay!

You’re right, that’s a mistake, and it seems like a likely culprit. I changed it to templates/postgres.template.yml and rebuilt twice, but I’m still stuck in the loop.

I followed this guide (back when it was available) to use DigitalOcean Spaces for backup and image uploads.

My postgres data seems to be in three places:

  • /var/postgres_data_discourse (PG_VERSION = 10)
  • /mnt/volume_ams3_01/shared/standalone/postgres_data_new (PG_VERSION = 13)
  • /mnt/volume_ams3_01/shared/standalone/postgres_data (PG_VERSION = 13) – I copied these files manually from postgres_data_new as suggested earlier

My current wild guess is that this has something to do with your volume mapping.

But the above looks like you have a postgres that is separate from the postgres that Discourse is updating. Discourse doesn’t have anything to do with the postgres at /var/postgres_data_discourse. Maybe you tried to use your own postgres rather than the one that Discourse provides? That doesn’t look like a standard install.
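One way to test the volume-mapping theory is to check which host directory containers/app.yml mounts at /shared, since that directory’s postgres_data is the database the rebuild actually works on. A minimal sketch, assuming the usual volumes: layout with host: and guest: keys in each entry; the function name is hypothetical. (docker inspect app also lists the mounts of the running container.)

```shell
# shared_host_dir: hypothetical helper.
# Prints the host: path of every volume whose guest: is /shared, assuming
# the usual app.yml layout where host: comes before guest: in each entry.
shared_host_dir() {
  local yml="${1:?usage: shared_host_dir containers/app.yml}"
  awk '
    /^[ \t]*#/      { next }                              # skip comment lines
    /^[ \t]*host:/  { host = $2 }                         # remember last host
    /^[ \t]*guest:/ { if ($2 == "/shared") print host }   # print its pair
  ' "$yml"
}
```

If this prints something other than the directory being inspected (here /mnt/volume_ams3_01/shared/standalone), the rebuild is reading a different postgres_data than the one that was checked.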


This is a standard install from 2016, and I have been updating it since, so perhaps something changed in the meantime?

For the volume mapping, I followed the steps in the guide I linked to above.

I don’t believe I’ve tried to install my own postgres – I’m not technical enough to know why I’d even want to do that :smiley:

I can say that I had the “loop problem” when I tried to update from Postgres 10 to 12, so I put it aside to work on later. The other day I stopped being able to rebuild at all; from the error messages I figured it had to do with Postgres, so I tried to update, thinking it might solve the issue.

Is it possible to revert to the postgres 10 template to see what errors it’s throwing up?

Edit: any other suggestions are welcome! I’m not sure how to proceed at this point.

It would have been good to mention that in your first post.

I would do a clean installation and restore your most recent backup. That’s the easiest way forward. Your OS is likely out of date as well.
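For the clean-install route, the backup archive has to be placed where the new install can see it (for a default install, shared/standalone/backups/default). Since backups here go to DigitalOcean Spaces, the archive would first need to be downloaded from there. A minimal sketch for picking the newest archive in a directory; latest_backup is a hypothetical helper name:

```shell
# latest_backup: hypothetical helper.
# Prints the path of the most recently modified *.tar.gz in a directory,
# or nothing if there is none.
latest_backup() {
  local dir="${1:?usage: latest_backup /path/to/backups/default}"
  # ls -t sorts newest first; 2>/dev/null hides the error when nothing matches.
  ls -t "$dir"/*.tar.gz 2>/dev/null | head -n 1
}
```

The chosen file can then be restored from the admin Backups page or, as far as I know, from inside the container (./launcher enter app) with discourse enable_restore followed by discourse restore and the backup’s filename.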


That seems like a lot of trouble to resolve one issue, but at least it’s a concrete path forward. Thank you for helping me work through this, Jay! :pray:

I’m not saying that there’s not another way out of your predicament, but this sledgehammer approach has a high likelihood of success and requires no special knowledge. There are a bunch of tiny things that I might try if I were the one fixing it, but they are all rather difficult to describe here, especially when it’s unclear which one(s) to try or whether they might work.

If this system was set up in 2016 and still has that OS, then it’s not too soon to get an OS upgrade either, and IMHO it’s much easier to spin up a new server than to do an OS upgrade (that may be much less true now than it was 15 years ago when I formed that opinion!).
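Before choosing between an OS upgrade and a new server, it helps to confirm what the host is actually running. A minimal sketch that reads /etc/os-release, which current Ubuntu and Debian releases provide; os_version is a hypothetical helper name (lsb_release -a gives similar information where it is installed):

```shell
# os_version: hypothetical helper.
# Prints PRETTY_NAME from an os-release file (default /etc/os-release),
# falling back to "NAME VERSION_ID" if PRETTY_NAME is missing.
os_version() {
  local file="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables don't leak into this shell.
  ( . "$file" && printf '%s\n' "${PRETTY_NAME:-$NAME $VERSION_ID}" )
}
```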

