Problem rebuilding because of slow database shutdown

Wingtip · April 19, 2023, 6:36pm

This recommended upgrade failed, and the rebuild didn’t bring my forum back up. I’m running discourse-doctor now to try to fix it, and if that fails too, I took a VM snapshot.

Output:

2023-04-19 18:28:31.298 UTC [42] LOG:  received fast shutdown request
2023-04-19 18:28:33.651 UTC [65] LOG:  shutting down
2023-04-19 18:28:33.974 UTC [42] LOG:  database system is shut down


FAILED
--------------------
Pups::ExecError: su postgres -c 'psql discourse -c "alter schema public owner to discourse;"' failed with return #<Process::Status: pid 59 exit 2>
Location of failure: /usr/local/lib/ruby/gems/3.2.0/gems/pups-1.1.1/lib/pups/exec_command.rb:117:in `spawn'
exec failed with the params "su postgres -c 'psql $db_name -c \"alter schema public owner to $db_user;\"'"
bootstrap failed with exit code 2
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
c13e1ba313de8fc84f6e2fb0f88197a908803c39791283effb8c82f55b56b6dc
Command exited with non-zero status 1
1.85user 1.84system 3:21.56elapsed 1%CPU (0avgtext+0avgdata 36996maxresident)k
197608inputs+368outputs (1133major+96509minor)pagefaults 0swaps

pfaffman · April 19, 2023, 6:38pm

Are you on the beta branch?

You can try to restart your container with

 ./launcher start app

but that’s what discourse-doctor should do.

You’ll need to post more of the output; the actual error is above what you included.

Wingtip · April 19, 2023, 6:40pm

Yes we are on the beta branch. I always run inside nohup, so I have the full log.

Discourse-doctor is still grinding away, but it hasn’t failed yet so I have hope.

https://pastebin.mozilla.org/iw2zc5zd

Edit: Discourse-doctor got us back up and running.

I basically asked for this, upgrading an hour after that notification and being the first one to do so. It was no real stress since I took that snapshot beforehand, so I took one for the team here, fellas.

Falco · April 19, 2023, 7:12pm

2023-04-19 18:28:26.755 UTC [45] LOG: database system was not properly shut down; automatic recovery in progress

If your database can’t stop safely within 60s, which can happen with large databases on slower disks, it will end up in this state, and the rebuild will fail if the database can’t finish recovery within 5s (which a large or slow database rarely manages).
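The failure mode here is a fixed, short window for a freshly restarted PostgreSQL to finish crash recovery. For comparison, a bounded polling wait with a configurable limit might look like the shell sketch below. `wait_until` is a hypothetical helper, not Discourse’s bootstrap code, and the usage assumes PostgreSQL’s `pg_isready` client is on the PATH (it exits 0 once the server accepts connections and non-zero while it is still recovering or unreachable).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Discourse or discourse_docker.
# wait_until LIMIT_SECONDS CMD [ARGS...]
# Re-runs CMD roughly once per second until it succeeds or LIMIT_SECONDS
# have elapsed. Returns 0 if CMD succeeded, 1 if the limit was reached first.
wait_until() {
  local limit=$1
  shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$limit" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For example, `wait_until 600 pg_isready -q` would give recovery up to about ten minutes instead of a few seconds, while still returning as soon as the database is ready.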

This has nothing to do with the changes listed here; it has been a problem in Discourse since at least 2016.

Wingtip · April 19, 2023, 8:09pm

Ahh, thanks. Maybe it should wait longer for larger forums like ours. If you just kill the DB process, it has to run crash recovery after being started back up, and that can take a very long time.

The terminology around beta is somewhat confusing. The admin dashboard says we’re running beta; is there somewhere else we should have looked? My understanding is that beta is recommended for Discourse, based on the release announcements discouraging use of the stable branch.

Stephen · April 19, 2023, 10:13pm

The default is actually tests_passed, which is considered production-ready.

pfaffman · April 20, 2023, 1:18am

How big is your database? Is it on ssd? How much ram do you have?

Having a separate data container would require fewer database restarts.

Ed_S · April 20, 2023, 5:18am

When was 60s decided on for a safe shutdown? How many installations are now much bigger than was then normal?

Ideally this 60s wait should be a closed-loop wait with a limit. It sounds like the limit should be higher if many instances out there are now vulnerable.
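A closed-loop wait with a hard limit, returning as soon as the database process has actually exited, could be sketched in shell as below. `wait_for_exit` is a hypothetical helper, not how Discourse implements its wait; it assumes the caller has already asked PostgreSQL to stop (for example a fast-shutdown request, which is SIGINT to the postmaster) and knows the postmaster’s PID. Note that `pg_ctl stop` already waits in a similar way, bounded by its `-t` option (60 seconds by default).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not Discourse's implementation.
# wait_for_exit PID LIMIT_SECONDS
# Polls about once per second until process PID is gone (the closed loop),
# but gives up after LIMIT_SECONDS (the hard limit).
# Returns 0 if the process exited, 1 on timeout.
# Run it as root or as the process owner: kill -0 also fails with
# "Operation not permitted", which this loop would treat as "exited".
wait_for_exit() {
  local pid=$1 limit=$2 waited=0
  # kill -0 sends no signal; it only checks that the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$limit" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

With a generous limit, a small database still stops in seconds, while a large one on slow disks gets the time it needs before anything is killed.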

Wingtip · April 20, 2023, 1:19pm

It’s 105GB on SSD, in a 16GB VM, and I gave Postgres an 8GB buffer pool.

pfaffman · April 20, 2023, 2:08pm

I think I saw that it was at least as long ago as 2016. But things have changed. EDIT: Here’s a new commit.

I don’t think many standard installs are that big, as it’s been this way almost since the beginning.

Uh, yeah. That’s a big database. I suspect that few people have a database that big that’s not on RDS or at least in a separate container. You should probably consider switching to a 2-container install.

Wingtip · April 20, 2023, 2:19pm

We’ll consider it. Is the switching method documented? And are there any other advantages that increasing the 60s timer wouldn’t provide?

Falco · April 20, 2023, 2:36pm

I increased it to 10 minutes yesterday

Wingtip · April 20, 2023, 2:37pm

Oh great, I assumed he was posting the original commit back in 2016. So any advantages for us at all?

pfaffman · April 20, 2023, 2:38pm

You can check out Move from standalone container to separate web and data containers

You can build a new container while the old one continues to run. You don’t need to shut down the database to build a new container.
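In a two-container setup, building a new container while the old one keeps running usually comes down to the sequence sketched below: bootstrap a new web image, then swap containers. This is not an official Discourse script; it assumes you run it from the discourse_docker directory (typically /var/discourse), that your containers use the sample names `data` and `web_only`, and it uses the launcher’s `bootstrap`, `destroy` and `start` commands. `LAUNCHER` is a hypothetical override that exists only so the steps can be dry-run. Check the official two-container guide before relying on it.

```shell
#!/usr/bin/env bash
# Sketch of a two-container web rebuild; not an official Discourse script.
# Assumes the discourse_docker directory as the working directory and the
# sample container names "data" and "web_only".
# LAUNCHER is a hypothetical override so the steps can be dry-run.
LAUNCHER=${LAUNCHER:-./launcher}

rebuild_web_only() {
  # 1. Build the new web image; the old web_only container keeps serving,
  #    and the data container (PostgreSQL) is never stopped.
  "$LAUNCHER" bootstrap web_only || return 1
  # 2. Swap containers: the site is only down between these two steps.
  "$LAUNCHER" destroy web_only || return 1
  "$LAUNCHER" start web_only
}
```

Because PostgreSQL is never shut down in this sequence, the shutdown timeout that broke this rebuild never comes into play.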

There is now a 10 minute window for shutting down postgres, which should solve your current problem. Once you do one more rebuild, you’ll have the 10 minutes instead of one.

Wingtip · April 20, 2023, 2:40pm

Oh, that guy just built a completely new two-container instance and then restored from backup. We definitely aren’t doing that without a good reason; I already had to do it about 2 months ago to avoid the PG13 upgrade’s disk space requirements.

pfaffman · April 20, 2023, 2:41pm

If you’re not on PG13 then you should fix that.

I’d spin up a new server and move to it.

Wingtip · April 20, 2023, 2:42pm

We are now; that one was ultimately unavoidable! Beyond the DB, we also needed to upgrade from the no-longer-supported 18.04 LTS.

sam · April 21, 2023, 10:04pm

With a db this size you should move it to a dedicated container.

It will speed up rebuilds a ton and simplify everything for you.

Wingtip · April 25, 2023, 2:58pm

If there’s documentation on how to do that without a complete rebuild from scratch then restoring backups we’ll definitely consider it.

pfaffman · April 25, 2023, 2:59pm

So you want to Migrate quickly to separate web and data containers
