Strange Postgres DB Issue with New Install

Here is one I cannot figure out after many flawless Discourse installs and migrations.

Background:

We had Discourse running well in a Docker container and were working on some postgres DB nuances.

The trouble started when we were unhappy with the cooked results of rebaking raw posts. Nothing was working as planned, so we decided to drop the postgres DB and recreate it, but the app kept giving various permissions errors.

Then we decided to “go hard core” and clobber it “what the heck” style: we went into postgres (knowing this would not work out well, but wanting to try) and deleted all the topics and posts from the DB (DELETE FROM topics; DELETE FROM posts;). This kind of worked, but we were not happy with the results (experiment over), so we decided to rebuild Discourse from scratch, moving the old /var/discourse out of the way and pulling from git for a fresh start.

The Problem

When we built totally new from a git pull, the build worked fine up until the point where we go to the site to create the admin login.

When we went to the admin login for a new install, it was the old site we destroyed! This was a surprise.

So we decided to go into this new app and try dropping all discourse tables from the DB, which we did; but, surprise, when we rebuilt the app again, it was not a fresh site, but the same broken site from above.

So, we deleted all /var/*discourse* directories and removed all docker images, thinking this would be totally clean. We started anew, pulling from git into /var/discourse again and building from what we thought was total scratch; but surprise… the old site was still there.

Thinking, “how could this be”…??

We ran ps aux | grep postgres outside the docker container and noticed postgres processes visible outside the container (which was a surprise, as we had thought the discourse docker install lived entirely inside the docker container). We then tried to find where to clean up, but no joy.

We searched until the Google links turned purple, and tried so much… but we cannot get a clean install of discourse.
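A note on the postgres processes seen from the host: on Linux, processes inside a Docker container share the host kernel, so they normally show up in the host's ps output even when no Postgres is installed on the host itself. A rough way to tell the two cases apart is to look at the process's cgroup. The sketch below assumes the cgroup path of a containerized process contains the string "docker", which depends on the Docker and cgroup setup; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: Docker-managed processes have "docker" somewhere in their cgroup path.

# Returns 0 if the cgroup text on stdin mentions Docker.
cgroup_mentions_docker() {
  grep -q 'docker'
}

# Classify a PID as "container", "host", or "unknown" via /proc/<pid>/cgroup (Linux only).
classify_pid() {
  local pid="$1"
  if [ ! -r "/proc/$pid/cgroup" ]; then
    echo "unknown"
  elif cgroup_mentions_docker < "/proc/$pid/cgroup"; then
    echo "container"
  else
    echo "host"
  fi
}

# Usage on the host:
#   for pid in $(pgrep postgres); do echo "$pid $(classify_pid "$pid")"; done
```

If every postgres PID comes back as "container", there is probably no separate host-level Postgres to clean up.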

Thinking we were missing something, we got on a new, never installed discourse server, and installed discourse from scratch, and it worked flawlessly as usual (another server).

The Question

My question is, I guess… when an install has totally gone off the rails (by hook or by crook), how do we get the server, including postgres, back to ground zero so this problem will go away and we can get a completely fresh new install going?

Sorry for such a long post, when The Question might have been enough to get help.

Thanks.

Instead of removing or emptying tables, just drop the database.


Thanks. Will try that and post back the results.

Tried to drop the DB, but I keep getting this permissions error:

/var/www/discourse# su postgres -c 'psql'
psql (10.12 (Debian 10.12-1.pgdg100+1))
Type "help" for help.
postgres=# drop database discourse;
ERROR: database "discourse" is being accessed by other users
DETAIL: There are 3 other sessions using the database.

Any clues?

My best guess is that you didn’t delete the running docker container, even though you say that you deleted the images. Then again, it seems like you’d have gotten some other indication.

Or you’re using an external postgres rather than the one in the container?

Usually deleting /var/discourse/shared and doing a rebuild does the trick.
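A sketch of what that reset could look like, assuming a standard install in /var/discourse with a container named app. The persistent data (Postgres files, uploads and so on) lives under shared/ on the host rather than inside the image, which is why removing images alone does not wipe the site. The run and reset_discourse helpers are hypothetical names, and the script only prints the commands unless DRY_RUN=0 is set. It moves shared/ aside instead of deleting it, so the old data can still be recovered.

```shell
#!/usr/bin/env bash
# Assumptions: install lives in /var/discourse and the container is named "app".
DISCOURSE_DIR="${DISCOURSE_DIR:-/var/discourse}"
DRY_RUN="${DRY_RUN:-1}"   # default to printing commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop and remove the container, set the old data aside, then rebuild.
reset_discourse() {
  local stamp="$1"                      # suffix for the saved copy, e.g. a date
  run cd "$DISCOURSE_DIR" || return 1
  run ./launcher destroy app
  run mv shared "shared.old-$stamp"
  run ./launcher rebuild app
}

# Usage: reset_discourse "$(date +%Y%m%d)"            (dry run)
#        DRY_RUN=0 reset_discourse "$(date +%Y%m%d)"  (really do it)
```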


Thanks.

We just killed all the prior discourse DB sessions and that let us drop database discourse.
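For anyone hitting the same "being accessed by other users" error, here is a minimal sketch of one way to do that step (not necessarily exactly what was done here). It generates SQL that disconnects every other session on the database and then drops it. The drop_db_sql name is hypothetical. It assumes PostgreSQL 9.2 or later, where pg_stat_activity has a pid column, and it assumes the application has been stopped first, since otherwise it may simply reconnect before the DROP runs.

```shell
#!/usr/bin/env bash
# Print SQL that disconnects other sessions from a database, then drops it.
# Assumption: the database name is a plain identifier such as "discourse".
drop_db_sql() {
  local db="$1"
  cat <<SQL
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE datname = '${db}' AND pid <> pg_backend_pid();
DROP DATABASE "${db}";
SQL
}

# Usage (inside the container, same psql invocation as in the session above):
#   drop_db_sql discourse | su postgres -c 'psql'
```

Printing the SQL first makes it easy to review before piping it into psql.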

Now doing the ./launcher rebuild app dance again. Will post back with the results.


./launcher rebuild app did not work; so here is what we did next:

Then:

Building app

WARNING: We are about to start downloading the Discourse base image
This process may take anywhere between a few minutes to an hour, depending on your network speed

Please be patient

Unable to find image 'discourse/base:2.0.20200220-2221' locally
2.0.20200220-2221: Pulling from discourse/base
bc51dd8edc1b: Pulling fs layer
27ae5d171719: Pulling fs layer
bc51dd8edc1b: Download complete
bc51dd8edc1b: Pull complete
27ae5d171719: Verifying Checksum
27ae5d171719: Download complete
27ae5d171719: Pull complete
blah blah....
blah blah....
blah blah....

Still not working after errorless rebuild and launch.

So, tried again, turning off LETSENCRYPT option:

  • Optional email address for Let’s Encrypt warnings? (Enter ‘OFF’ to disable.) : OFF

and it is still building the previously destroyed instance (from hours ago). We know this because we installed a number of themes in that instance, and they are still here in this build even after we dropped the discourse DB:

Start compiling CSS: 2020-03-15 10:16:20 UTC
Compiling css for default 2020-03-15 10:16:20 UTC
precompile target: desktop Dark
precompile target: mobile Dark
precompile target: desktop_rtl Dark
precompile target: mobile_rtl Dark
precompile target: desktop_theme Dark
precompile target: mobile_theme Dark
precompile target: admin Dark
precompile target: desktop Light
precompile target: mobile Light
precompile target: desktop_rtl Light
precompile target: mobile_rtl Light
precompile target: desktop_theme Light
precompile target: mobile_theme Light
precompile target: admin Light
precompile target: desktop 
precompile target: mobile 
precompile target: desktop_rtl 
precompile target: mobile_rtl 
precompile target: desktop_theme 
precompile target: mobile_theme 
precompile target: admin 
Done compiling CSS: 2020-03-15 10:16:27 UTC

How can it be possible that after dropping the entire discourse DB, purging all docker images and containers, running rm -rf /var/discourse, and rebuilding from ground zero, we still see all the installed themes from the many-hours-old build we are trying to destroy completely?

It makes no sense in a fresh install.

Well, we started over again and commented out the LETSENCRYPT templates and email option, and got it to rebuild correctly and got the celebration admin login page.

Progress!

Now will edit app.yml and try to get SSL going again.

Well. That’s interesting…

If I rebuild the app with SSL (LETS ENCRYPT) enabled, I get two different sites…

  • HTTP: New site as expected
  • HTTPS: Old broken site

Hmmmm. This is really perplexing!
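One way to narrow down a split like this is to check which process is actually listening on each port on the server. The sketch below filters the output of ss -ltnp for a given port. The listeners_on_port name is made up, and it assumes the usual iproute2 column layout, where the local address and port are the fourth column.

```shell
#!/usr/bin/env bash
# Read `ss -ltnp` output on stdin and print only the LISTEN lines for one local port.
# Assumption: column 4 is "Local Address:Port", as in the usual iproute2 layout.
listeners_on_port() {
  local port="$1"
  awk -v p=":$port" '
    $1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { print }
  '
}

# Usage (as root, so that process names are shown):
#   ss -ltnp | listeners_on_port 80
#   ss -ltnp | listeners_on_port 443
```

If ports 80 and 443 turn out to belong to different processes, that would point at something other than the freshly built container answering HTTPS. If they belong to the same one, the difference more likely sits outside the server.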

What does

 docker ps

show?

Before each build we purged all old docker images, etc, as follows:

docker system prune -a

So, it’s not a docker image issue.

We believe the problem is related to the LETSENCRYPT SSL cert; because when we changed the sub-domain and generated a new SSL cert in the build process on the same server IP, it works as it should; but when we go back to the original sub-domain, the problem remains.

Hence, we have dropped using the troublesome sub-domain for now (was only a staging domain anyway); and moved on.

Thanks for the ideas.

Stay safe.

But that deletes only unused images. Are you sure that there are no running containers?


Sounds like you have more than one container - is there more than an app.yml in your containers folder?
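A quick way to check, sketched under the assumption of a standard install: the launcher takes a name that maps to containers/<name>.yml, so every .yml file there is a potential container. The list_container_defs helper is a hypothetical name.

```shell
#!/usr/bin/env bash
# List launcher container definitions (*.yml) in a directory, one name per line.
# Assumption: the default location is /var/discourse/containers.
list_container_defs() {
  local dir="${1:-/var/discourse/containers}"
  local f
  for f in "$dir"/*.yml; do
    # Skip the unexpanded glob pattern when no .yml files exist.
    [ -e "$f" ] && basename "$f" .yml
  done
  return 0
}

# Usage:
#   list_container_defs     # definitions the launcher knows about
#   docker ps -a            # every container, including stopped ones
```

Comparing the two lists shows whether a container exists that no longer has a matching definition.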


docker ps shows only one container running, and there is only one app.yml file in /var/discourse/containers

Thanks for all the good ideas, though!

Much appreciated.
