Running “./launcher rebuild app” fails on db:migrate. Note we are using PostgreSQL v12.
This murdered our forum. The Docker container came back up, but the forum didn’t. Luckily I took a VM snapshot before upgrading, and I’m restoring it now.
Log:
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
I, [2022-04-14T15:20:51.896917 #1] INFO -- : == 20220304162250 EnableUnaccentExtension: migrating ==========================
-- enable_extension("unaccent")
I, [2022-04-14T15:20:51.897218 #1] INFO -- : Terminating async processes
I, [2022-04-14T15:20:51.897265 #1] INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/12/bin/postmaster -D /etc/postgresql/12/main pid: 1710
I, [2022-04-14T15:20:51.897396 #1] INFO -- : Sending TERM to exec chpst -u redis -U redis /usr/bin/redis-server /etc/redis/redis.conf pid: 1827
2022-04-14 15:20:51.897 UTC [1710] LOG: received fast shutdown request
1827:signal-handler (1649949651) Received SIGTERM scheduling shutdown...
2022-04-14 15:20:51.900 UTC [1710] LOG: aborting any active transactions
2022-04-14 15:20:51.902 UTC [1710] LOG: background worker "logical replication launcher" (PID 1719) exited with exit code 1
2022-04-14 15:20:51.904 UTC [1714] LOG: shutting down
1827:M 14 Apr 2022 15:20:51.913 # User requested shutdown...
1827:M 14 Apr 2022 15:20:51.914 * Saving the final RDB snapshot before exiting.
2022-04-14 15:20:51.965 UTC [1710] LOG: database system is shut down
1827:M 14 Apr 2022 15:20:53.157 * DB saved on disk
1827:M 14 Apr 2022 15:20:53.157 # Redis is now ready to exit, bye bye...
FAILED
--------------------
Pups::ExecError: cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate' failed with return #<Process::Status: pid 2118 exit 1>
Location of failure: /usr/local/lib/ruby/gems/2.7.0/gems/pups-1.1.1/lib/pups/exec_command.rb:117:in `spawn'
exec failed with the params {"cd"=>"$home", "hook"=>"db_migrate", "cmd"=>["su discourse -c 'bundle exec rake db:migrate'"]}
bootstrap failed with exit code 1
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
2dcd9aeca614c9e06ef748f673eb68203db6eae5c445253b416d666663879d6d
==================== END REBUILD LOG ====================
Failed to rebuild app.
They aren’t separate containers, and there’s no external PG. The PG13 upgrade also fails (non-destructively, unlike today’s upgrade), and to be honest I didn’t get support here on how to fix it.
I believe (can’t check right now, obviously) we only had one container showing up in docker ps. Is the standard install two containers now?
This extension became available as a “trusted” extension in PostgreSQL 13+, where it can be enabled by any user.
Since you are running an older PostgreSQL version, you will have to work around this by installing and enabling the extension for the Discourse user, and potentially trying to trick Discourse into considering the extension already installed. That, or move to the currently supported version of PostgreSQL.
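A minimal sketch of that workaround, assuming the standard single-container install (container name app, database name discourse) and that the old container is still running (e.g. after restoring the snapshot): create the extension once as the postgres superuser, so the migration should then find it already installed:

./launcher enter app
su postgres -c "psql discourse -c 'CREATE EXTENSION IF NOT EXISTS unaccent;'"
exit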
OK. So in summary, you are no longer supporting PG12. I suggest posting that in the PG13 upgrade thread, and probably in the 2.9.0b4 announcements too.
So my restore finally finished, after over 4 hours, and Discourse was throwing 502 errors. The snapshot was taken before the upgrade, so this is super bizarre.
Anyway, looking at the nginx logs I found this error:
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.10.3/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:30:in `require': cannot load such file -- /var/www/discourse/lib/freedom_patches/schema_cache_concurrency.rb (LoadError)
Sure enough, this file was owned by root:root and chmod 0000. Changing it to discourse:root and 644 to match the other files in that directory got us back up and running. Phew!
Any idea how that file got nuked/changed? It’s 0 bytes too, super weird.
root@forum-app:/shared/log/rails# ls -la /var/www/discourse/lib/freedom_patches/schema_cache_concurrency.rb
-rw-r--r-- 1 discourse root 0 Feb 10 17:41 /var/www/discourse/lib/freedom_patches/schema_cache_concurrency.rb
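For reference, the fix described above amounted to something along these lines, run inside the app container (same path as shown above):

chown discourse:root /var/www/discourse/lib/freedom_patches/schema_cache_concurrency.rb
chmod 644 /var/www/discourse/lib/freedom_patches/schema_cache_concurrency.rb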
Tried the “Restore to a Fresh Install” approach, but it doesn’t work; PG doesn’t cooperate. Tried to downgrade the Discourse version and go back to PG12, but then it turns into chaos with all the plugins.
Yes, I did change the template, as part of the “OK, this is not working” strategy, so that I could go back to PG12 (even though that makes me wonder how I’m going to upgrade PG later).
I tried more recent ones, but the enable_extension("unaccent") error is still there, which implies that those commits already include the change that depends on it.
Waiting on the result of this attempt.
Update: nope, it failed during the “unpacking dump” phase of the restore, and now it’s dead again.
Hi, if you have a backup of Discourse, I’d suggest trying this out on a different server first.
I believe you experienced this issue while upgrading an instance of Discourse that was running an older version.
So, try installing a copy of Discourse by manually editing the yml to use the Discourse “stable” branch and pin the Postgres version to 12 (see the yml sketch below).
If the build succeeds, try restoring the backup. Hopefully, it will restore successfully.
If it succeeds, change the Postgres 12 template back to the default Postgres template and comment out the stable pin so that Discourse rebuilds with the latest tests-passed.
I believe that if the backup is salvageable, it should then survive the Postgres and Discourse upgrades.
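For reference, the app.yml edits would look roughly like this; the template file name and the version parameter are assumptions based on the current discourse_docker layout, so check them against your own yml:

templates:
  - "templates/postgres.12.template.yml"   # instead of templates/postgres.template.yml
  ...

params:
  version: stable   # comment this out again later to go back to tests-passed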
I’m pretty much in “limbo” right now. I tried your suggestion with PG12 and “stable”; it doesn’t work, the restore just stops. So I’m wiping the machine once again to try fresh, because now it won’t even rebuild the app.
While that machine is trying to “go back to PG12”, I’m trying to see if I can move forward on the other one. If I try to upgrade PG with a fresh install, it dies after “Creating missing functions in the discourse_functions schema...” (it starts returning 500), and tail -f shared/data/log/var-log/postgres/current shows that the data container is “moving”, although it’s just full of errors such as these:
discourse@discourse ERROR: relation "user_auth_tokens" does not exist at character 34
discourse@discourse STATEMENT: SELECT "user_auth_tokens".* FROM "user_auth_tokens" WHERE ((auth_token = 'XXXX=' OR
prev_auth_token = 'XXXX=') AND rotated_at > '2022-03-09 10:21:44.051357') LIMIT 1
discourse@discourse ERROR: relation "application_requests" does not exist at character 41
discourse@discourse STATEMENT: SELECT "application_requests"."id" FROM "application_requests" WHERE "application_requests"."date" = '2022-05-08' AND "application_requests"."req_type" = 0 LIMIT 1
However, Discourse may be dead, but the machine is clearly busy, so… is it actually working and just taking hours and hours? I let it run for over an hour and nothing changed.
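One way to tell whether it is actually still making progress (a sketch, not something tried in this thread; the container name and database name are assumptions, adjust them to your setup) is to look at the active queries from inside the Postgres container:

./launcher enter data   # or app in a single-container setup
su postgres -c "psql discourse -c 'SELECT pid, state, now() - query_start AS running_for, left(query, 80) AS query FROM pg_stat_activity;'"

If the restore is still working you should see long-running active statements; if everything is idle, it is probably stuck.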
At this point I’m already wondering whether this shouldn’t go here, lol.
P.S.: I tried two different backups, since I took two before this adventure.
That’s not good. So you couldn’t restore your backup from the old version of Discourse with PG12 to a fresh new install with PG13? Can you post the error messages, etc.? Everyone here repeatedly assured me that would work.