Postgres problem... again!

Wanderer · October 29, 2021, 5:26pm

Hi, I have two big problems when stopping and restarting Discourse.

First
After I do it 2 or 3 times (no matter whether with ./launcher stop or via Portainer's stop), the container refuses to start again, continuously stuck on “rm: cannot remove ‘/shared/nginx.http.sock’: Is a directory”. (I’ve just noticed the socket is not in the “shared” dir but in “shared/standalone”; is that just a mistake in the message?)
For some reason still unknown to me, this socket turns into a directory, maybe after a stop, and the template can’t delete it because it expects to remove a socket, not a directory. Deleting it manually changes nothing; it reappears every time.

Second
“./launcher rebuild app” gets stuck on “FATAL: the database system is starting up”, after a first “PANIC could not locate a valid checkpoint record”. I’ve read and tried everything I could find about this problem, and the only working “solution” was deleting the discourse dir with everything inside and setting everything up again… obviously not a real solution!

It seems that stopping the Discourse container sometimes leaves the database in a bad state, and it can’t continue because it “is starting up”, maybe trying to repair something. But I still haven’t found how to fix this problem, which seems to come up after a few stops and starts.

Any clue? Is there some way to let Postgres fix its problems?
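
For anyone who wants to watch whether Postgres ever leaves the “starting up” state, here is a minimal Python sketch built around `pg_isready`, the readiness probe shipped with the PostgreSQL client tools. The host, port, and timeout are placeholder assumptions, and it has to run somewhere that can actually reach the database (for example inside the container); none of that comes from the posts in this thread. The exit-code meanings are the documented ones.

```python
import subprocess

# Exit codes documented for pg_isready.
PG_ISREADY_STATUS = {
    0: "accepting connections",
    1: "rejecting connections (for example, still starting up)",
    2: "no response",
    3: "no attempt made (for example, invalid parameters)",
}


def describe_exit_code(code: int) -> str:
    """Translate a pg_isready exit code into a readable status."""
    return PG_ISREADY_STATUS.get(code, f"unexpected exit code {code}")


def check_postgres(host: str = "localhost", port: int = 5432, timeout: int = 3) -> str:
    """Run pg_isready once and return a readable status.

    host, port and timeout are placeholder assumptions; adjust to your setup.
    """
    try:
        result = subprocess.run(
            ["pg_isready", "-h", host, "-p", str(port), "-t", str(timeout)],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return "pg_isready not found on this machine"
    return describe_exit_code(result.returncode)
```

Calling `print(check_postgres())` every few seconds would show whether the server moves from “rejecting connections” to “accepting connections” or stays stuck.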

Falco · October 29, 2021, 6:06pm

No, it’s not a mistake. This file is in the volume that gets mounted in the container, so it has different paths depending on the point of view, i.e. inside the container vs. outside the container.

This happens when the container is not stopped properly. PostgreSQL needs some time to shut down properly, and we allow for that when you stop with our launcher command. However, if your instance is big enough, or there are too many running transactions, the database may struggle to stop within the deadline it receives.
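
To illustrate the deadline: `docker stop` sends SIGTERM, waits for a grace period, and then sends SIGKILL, and its `-t` flag sets that grace period in seconds. Below is a rough Python sketch for stopping with a longer grace period when the launcher is not being used. The container name `app` and the 600-second value are assumptions picked for illustration, not values confirmed in this thread.

```python
import subprocess


def build_stop_command(container: str = "app", grace_seconds: int = 600) -> list:
    """Build a `docker stop` command with an explicit grace period.

    Both defaults are illustrative assumptions, not values from this thread.
    """
    return ["docker", "stop", "-t", str(grace_seconds), container]


def stop_container(container: str = "app", grace_seconds: int = 600) -> int:
    """Run the stop command and return docker's exit code."""
    return subprocess.run(build_stop_command(container, grace_seconds)).returncode
```

The point of the explicit `-t` is only to give PostgreSQL more time to finish its shutdown before the container is killed; how long is enough depends on the size of the instance.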

Wanderer · October 29, 2021, 7:24pm

But I find this “nginx.http.sock” created in “shared/standalone” on the host file system, too. Initially it shows up pink, as a socket, but after a while, maybe after a stop, it turns blue, as a directory, and the container refuses to start, stuck trying to delete a socket that has become a directory.
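
A quick way to confirm what the path currently is, without relying on `ls` colours, is to inspect its file type directly. This is a minimal Python sketch; the host path assumes the default /var/discourse install location, which is not stated anywhere in this thread.

```python
import os
import stat

# Assumed default host path; adjust if Discourse lives elsewhere.
SOCKET_PATH = "/var/discourse/shared/standalone/nginx.http.sock"


def classify_path(path: str) -> str:
    """Return 'socket', 'directory', 'missing', or 'other' for a path."""
    try:
        mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"
```

Running `print(classify_path(SOCKET_PATH))` after each stop and start should show at which step the socket gets replaced by a directory.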

So, what can we do to fix it? For now I’m just experimenting, but if I go online with some hundreds of people connected and hundreds of thousands of posts, will I risk losing everything just because Postgres can’t cope with an abrupt interruption? There must be something that can be done to repair a damaged database. Can Discourse run on an external Postgres, something we can manage even if the Discourse container doesn’t start? In short, in case of “PANIC” or “FATAL”, there should be some fix…

Wanderer · October 30, 2021, 12:49pm

I’m now trying to solve the problem (for the future) by setting up a “containerized” Postgres and attaching it to Discourse, in the hope of not damaging the db (or at least being able to do some maintenance even with Discourse down) when stopping Discourse.
