Help me debug failed migrations PushFixTopicEmbedAuthorsJob

Okay, that makes sense :+1:.

As long as those added plugins are not enabled by default, at least not for existing instances, all good. But regarding the gem dependency changes and the multiple database migration failures we faced, I am somewhat worried that something might be invisibly broken or lost. Sadly, the rebuild logs nothing useful about this: it only says that cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate' exited with an error code, with no details, and nothing in the logs either :thinking:.

This sounds odd. Are you running PG in a separate container? Does it have pgvector installed?

Generally when stuff fails, you can scroll up a bit and find the error. Due to sequencing, the error is often 100 lines up.
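
If scrolling is awkward, a small script can do the searching. This is only a rough sketch, assuming the rebuild output was saved to a file first (for example by piping it through tee); the helper name `find_error_lines` and the list of patterns are my own assumptions, not anything Discourse ships.

```ruby
# Hypothetical helper: scan saved rebuild output for likely error lines
# and return each hit with a few lines of surrounding context.
# The patterns are assumptions and can be extended as needed.
ERROR_PATTERN = /rake aborted!|ERROR|FATAL|PG::/

def find_error_lines(log_text, context: 2)
  lines = log_text.lines.map(&:chomp)
  hits = []
  lines.each_with_index do |line, i|
    next unless line.match?(ERROR_PATTERN)

    first = [i - context, 0].max
    last = [i + context, lines.length - 1].min
    hits << { line_number: i + 1, excerpt: lines[first..last] }
  end
  hits
end

# Usage: ruby find_errors.rb rebuild.log (both file names are placeholders)
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  find_error_lines(File.read(ARGV[0])).each do |hit|
    puts "---- line #{hit[:line_number]} ----"
    puts hit[:excerpt]
  end
end
```

The first hit is usually the interesting one, since later failures tend to be consequences of it.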

Just the standalone Discourse Docker container. Also, I was not fully correct: there is indeed some more output about it, but nothing I could make enough sense of:

/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.2/lib/active_record/migration.rb:1454:in `migrate'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.2/lib/active_record/migration.rb:1261:in `up'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.2/lib/active_record/migration.rb:1236:in `migrate'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.2/lib/active_record/tasks/database_tasks.rb:270:in `migrate'
/var/www/discourse/lib/tasks/db.rake:267:in `block (2 levels) in <main>'
/var/www/discourse/lib/distributed_mutex.rb:53:in `block in synchronize'
/var/www/discourse/lib/distributed_mutex.rb:49:in `synchronize'
/var/www/discourse/lib/distributed_mutex.rb:49:in `synchronize'
/var/www/discourse/lib/distributed_mutex.rb:34:in `synchronize'
/var/www/discourse/lib/tasks/db.rake:242:in `block in <main>'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/rake-13.3.0/exe/rake:27:in `<top (required)>'
/usr/local/bin/bundle:25:in `load'
/usr/local/bin/bundle:25:in `<main>'
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
I, [2025-07-29T20:36:15.074727 #1]  INFO -- : == 20180828095129 PushFixTopicEmbedAuthorsJob: migrating ======================
== 20180828095129 PushFixTopicEmbedAuthorsJob: migrated (0.0021s) =============


I, [2025-07-29T20:36:15.075331 #1]  INFO -- : Terminating async processes
I, [2025-07-29T20:36:15.075355 #1]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/15/bin/postmaster -D /etc/postgresql/15/main pid: 45
I, [2025-07-29T20:36:15.075376 #1]  INFO -- : Sending TERM to exec chpst -u redis -U redis /usr/bin/redis-server /etc/redis/redis.conf pid: 112
2025-07-29 20:36:15.075 UTC [45] LOG:  received fast shutdown request
112:signal-handler (1753821375) Received SIGTERM scheduling shutdown...
2025-07-29 20:36:15.091 UTC [45] LOG:  aborting any active transactions
2025-07-29 20:36:15.092 UTC [45] LOG:  background worker "logical replication launcher" (PID 59) exited with exit code 1
2025-07-29 20:36:15.092 UTC [54] LOG:  shutting down
2025-07-29 20:36:15.105 UTC [54] LOG:  checkpoint starting: shutdown immediate
112:M 29 Jul 2025 20:36:15.125 # User requested shutdown...
112:M 29 Jul 2025 20:36:15.125 * Saving the final RDB snapshot before exiting.
112:M 29 Jul 2025 20:36:15.247 * DB saved on disk
112:M 29 Jul 2025 20:36:15.247 # Redis is now ready to exit, bye bye...
2025-07-29 20:36:15.273 UTC [54] LOG:  checkpoint complete: wrote 10 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.066 s, sync=0.046 s, total=0.181 s; sync files=9, longest=0.026 s, average=0.006 s; distance=36 kB, estimate=36 kB
2025-07-29 20:36:15.276 UTC [45] LOG:  database system is shut down


FAILED
--------------------
Pups::ExecError: cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate' failed with return #<Process::Status: pid 635 exit 1>
Location of failure: /usr/local/lib/ruby/gems/3.3.0/gems/pups-1.3.0/lib/pups/exec_command.rb:131:in `spawn'
exec failed with the params {"cd"=>"$home", "tag"=>"migrate", "hook"=>"db_migrate", "cmd"=>["su discourse -c 'bundle exec rake db:migrate'"]}
bootstrap failed with exit code 1
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

Interestingly this is the migration

It is following a pattern we really should not be following.

Will get someone to have a look at this!

So it tried a migration for a plugin which was not present before and is not enabled? Any idea why it succeeded after multiple rebuild attempts?

Wait, it is all working now?

I am somewhat confused about why it failed, but I can see the problem here: we are calling application code from a migration, which is very fragile.

Yes, as said, after multiple rebuild attempts, it finally went through. In the last successful log, I no longer see this particular migration being run at all. Probably it was marked as “migrated” despite the failure, and hence not attempted again on the next rebuild?

The migration just set a key in Redis. A potential reason for it failing is that the code for the RSS plugin was somehow not loaded at the time. We will track this down so other people don’t experience this.
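
To illustrate why this pattern is fragile, here is a minimal plain-Ruby sketch. It is not the actual migration: `FakeStore` stands in for Redis, and `HypotheticalEmbedJob`, `fragile_up`, `defensive_up` and the key name are all invented for the example. It only shows the general failure mode, where a migration that references an application constant breaks if that code is not loaded, while one that carries its own literal key does not.

```ruby
# In-memory stand-in for a key-value store such as Redis (hypothetical).
class FakeStore
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end
end

# Fragile: depends on an application constant being loaded when the
# migration runs. HypotheticalEmbedJob is an invented name and is
# deliberately left undefined here.
def fragile_up(store)
  store.set(HypotheticalEmbedJob::PENDING_KEY, "1")
end

# More defensive: the migration carries its own literal key and needs no
# application or plugin code.
def defensive_up(store)
  store.set("hypothetical_pending_job", "1")
end

store = FakeStore.new

begin
  fragile_up(store)
rescue NameError => e
  # Raised because the constant was never defined or loaded.
  puts "fragile_up failed: #{e.class}"
end

defensive_up(store)
puts store.get("hypothetical_pending_job")
```

Whether this is what actually happened during the rebuild is still an open question; the sketch only shows why keeping application code out of migrations removes this class of failure.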

Will correct the underlying issue here.
