Migrate:multisite fails on second site when migrating from scratch

jamesh · February 7, 2019, 3:20am

I’ve just started to spin up our discourse instance, and when using multisite:migrate it failed on the second database:

https://gist.github.com/wryun/e1e38f7065e75d46edde10d399bd194c

This may be caused by the changes in 40fa96777ddad4698df20d6b8a61afcf87743ed7, where a refactoring removed a lot of Discourse.reset_active_record_cache, which would clear the leftover column information from the first migration (it appears to think a whole bunch of columns exist that don’t).

Or maybe I just did something wrong!

For now, I manually changed the failing migration to add a reset_active_record_cache and it succeeded, but I suspect this will recur when I next add a site.

Discourse version used: de47b35b2d096d0fc9ac22e8cb85a38bba39bd70

sam · February 7, 2019, 4:17am

This migration from 2013 is a problem.

github.com/discourse/discourse

db/migrate/20130221215017_add_description_to_categories.rb

1cacfddda


      
          # Migrate excerpts over
          Category.order('id').each do |c|
            post = c.topic.posts.order(:post_number).first
            PostRevisor.new(post).send(:update_category_description)
          end

I am patching it so it is less obnoxious:

Can you re-test? There may be more of these which we need to patch

codinghorror · February 7, 2019, 4:23am

Why are we carrying migrations from six years ago?

sam · February 7, 2019, 4:42am

That is how it works … in fact there is no simple tool for flattening migrations in existence afaik, it would be brand new territory for us. If we did not add the description column on the category table in 2013 it simply would not be there.

We are an old project now and it may make sense to fold a big pile of migrations into one but we are talking weeks of high risk work to get it perfectly right.

michaeld · February 7, 2019, 5:37am

You don’t want to do that. It’s a LOT of work and it would bring you nothing. The beauty of the RoR migrations mechanism is that you can upgrade /any/ version of Discourse to the latest. That would break as well.

codinghorror · February 7, 2019, 5:50am

Pointless; statistically nobody is running Discourse 1.0 out there. We know because we see the version pings…

michaeld · February 7, 2019, 5:54am

Maybe not running - but last month we did upgrade someone with an archived 1.5 export which they kept in a drawer on the attic. And the beauty is that we just could do that.

codinghorror · February 7, 2019, 10:12am

Yes but 1.5 is years ahead of 1.0. We see the version ping metrics so we know what’s in use. And trust me, 1.0 is … not… in use.

sam · February 7, 2019, 9:42pm

Note, I am not in principle agains making our No database -> Yes database process somewhat faster and take say 200ms instead of 3 seconds like it is today. What I don’t want to do though is spend weeks on making this happen, so many other s to tame.

jamesh · February 7, 2019, 10:04pm

Thanks for your prompt fix! I should get a chance to test this later today, but no guarantees (running on Australian time here).

However, I’m not entirely convinced that patching every migration is the best option - though you almost certainly should ignore me, since I have no Rails experience. Isn’t it a principled decision to reset the cache at the start of every multisite migration to make sure that previous runs can’t interfere? Or - even better - disable the cache entirely for the migration (possible?), or somehow make every migration entirely separate? It also seems particularly dangerous to me to patch ‘old’ migrations that you know have worked properly (for some value of properly); you really don’t want to make a mistake…

The other fun thing with the current solution is that a random migration is resetting the cache in certain paths, which could easily cause inconsistent failures (i.e. you have old category data, you get the cache reset, everything works later, and if you don’t it fails…).

sam · February 7, 2019, 10:34pm

We got to catch every occurrence of corruption of our schema cache, cause its like a little timebomb waiting to explode.

migration 1 … corrupts schema cache
3 years pass
migration N … incorrectly uses active record models (which is not something we should be doing) and then explodes.

The best thing is simply never ever to touch AR models in migrations. I would be far more open to a fix that hard bans that from migrations and deal with the pain of fixing this in the existing migrations we have.

jamesh · February 7, 2019, 11:01pm

True, a hard ban would be nice, but in practice I think what happened here is more likely to recur: that it works fine to spin the first db from scratch (more likely to be exercised by tests/caught by the first person attempting the migration?), but the multisite migration is corrupted (and it seems clear that this is not run frequently by either tests or users…). Note that the multisite migration fails even if the first migration is completely finished, so I assume there is a source of cache corruption outside actual migration steps.

EDIT: will it ever fail in non-multisite? I would have assumed that ActiveRecord would correctly detect changes in column information caused via the connection (rather than db switches, which seem inherently unexpected)?

jamesh · February 12, 2019, 3:22am

Ran the multisite:migrate with that patched in. Worked fine, so no other bad activerecord uses, I guess? Thanks.

sam · February 12, 2019, 3:38am

The more we use bad patterns the higher the risk that it will fail, even in a single site.

Looks like we are safe for now, so closing… thanks for reporting!

Topic		Replies	Views
ActiveRecord::StatementInvalid: PG::UndefinedColumn: ERROR: column categories.search_priority does not exist Bug	19	4381	March 25, 2019
Multisite upgrade failed due to missing database columns Bug	11	1889	September 13, 2017
Rake multisite:migrate is broken Bug	5	1316	June 23, 2020
DuplicateColumn and multisite:migrate failed Bug	6	797	April 19, 2022
Multi-instance migration failed when adding a new instance Bug	6	453	October 28, 2023

Migrate:multisite fails on second site when migrating from scratch

Related topics