Can't rebuild container, "null value in column "score" violates not-null constraint"

After running /var/discourse/launcher rebuild app it looks like there’s an old “corrupted”/broken entry in the database:

I, [2019-03-29T11:29:05.642432 #14]  INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate'
2019-03-29 11:29:17.018 UTC [435] discourse@discourse ERROR:  null value in column "score" violates not-null constraint
2019-03-29 11:29:17.018 UTC [435] discourse@discourse DETAIL:  Failing row contains (2147, ReviewableFlaggedPost, 4, 2554, f, null, null, 72, 4721, null, f,
 23604, Post, 2554, {}, 0, null, 2016-07-07 15:27:19.377128, 2016-07-07 15:27:19.377128).
2019-03-29 11:29:17.018 UTC [435] discourse@discourse STATEMENT:  UPDATE reviewables SET score = (
          SELECT SUM(score)
          FROM reviewable_scores
          WHERE reviewable_scores.reviewable_id = reviewables.id
        )

rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

PG::NotNullViolation: ERROR:  null value in column "score" violates not-null constraint
DETAIL:  Failing row contains (2147, ReviewableFlaggedPost, 4, 2554, f, null, null, 72, 4721, null, f, 23604, Post, 2554, {}, 0, null, 2016-07-07 15:27:19.3
77128, 2016-07-07 15:27:19.377128).
: UPDATE reviewables SET score = (
  SELECT SUM(score)
  FROM reviewable_scores
  WHERE reviewable_scores.reviewable_id = reviewables.id
)

This is causing the bootstrap process to fail, leaving me with a broken Discourse instance.

How can I fix this? e.g. how can I get into the database to fix or remove this row?

As part of the rebuild step the old app container was destroyed; can I fix this without a running container with PostgreSQL10? The Ubuntu 16.04 host OS has Postgres 9.5 in the repo, would this still be able to open the database if I install it?

Edit:

The older/existing docker image is still present:

$ docker images
REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE
local_discourse/app   latest              403b9aa4125e        4 days ago          2.37GB
discourse/base        2.0.20190321-0122   7db99586b5b5        8 days ago          1.97GB
discourse/base        2.0.20190217        9c24db193c37        5 weeks ago         1.92GB

This should mean it’s possible to start a new container with that image, which should at least let me get in to fix the database…

Edit 2:

After editing launcher to have rebuild) skip set_existing_container and run_bootstrap, rebuild app got the forum back up. I thought I’d tried start app and it complained the configuration didn’t exist, but I don’t know why there would be a difference… In any event, I have a running container.

2 Likes

@jonathon do you have any non-official plugins installed?

@eviltrout any ideas?

Yes, but disabling those doesn’t allow the bootstrap to complete.

I suspect something went wrong with user deletion - maybe someone deleted a user directly from the database in the “wrong way” - and that’s left a broken row.

Although, the date in the row is very near the birth of our Discourse forum (April 2016) so that’s not likely to have been the cause. We also didn’t have custom plugins installed at that point.

It might also explain why dealing with Flags has been throwing errors for the past few weeks.

Edit:

Oh. It might be because of

https://github.com/angusmcleod/discourse-moderator-extension

That might be doing something with the table… and when enabled it causes bootstrap to fail earlier:

https://github.com/angusmcleod/discourse-moderator-extension/issues/1

Edit:

I’m also wondering whether some database migrations haven’t actually been run…

https://github.com/discourse/discourse/commit/b58867b6e936a5247304e9f06f827cf5012a92ed

I don’t think migrations can be partially run… so the container base image shouldn’t be using the wrong database schema…

500 error when logging in as a Staff user:

ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR:  relation "queued_posts" does not exist
LINE 8:                WHERE a.attrelid = '"queued_posts"'::regclass
                                          ^
:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                     pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
                     c.collname, col_description(a.attrelid, a.attnum) AS comment
                FROM pg_attribute a
                LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                LEFT JOIN pg_type t ON a.atttypid = t.oid
                LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
               WHERE a.attrelid = '"queued_posts"'::regclass
                 AND a.attnum > 0 AND NOT a.attisdropped
               ORDER BY a.attnum
)
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/rack-mini-profiler-1.0.2/lib/patches/db/pg.rb:110:in `async_exec'
Failed to handle exception in exception app middleware : PG::UndefinedTable: ERROR:  relation "queued_posts" does not exist
LINE 8:                WHERE a.attrelid = '"queued_posts"'::regclass
                                          ^
:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                     pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
                     c.collname, col_description(a.attrelid, a.attnum) AS comment
                FROM pg_attribute a
                LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                LEFT JOIN pg_type t ON a.atttypid = t.oid
                LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
               WHERE a.attrelid = '"queued_posts"'::regclass
                 AND a.attnum > 0 AND NOT a.attisdropped
               ORDER BY a.attnum

Edit:

After a final rebuild the above 500 is resolved.

My fix for the initial issue:

# /var/discourse/launcher enter app
app# su - postgres
app# pgsql discourse
discourse=# DELETE FROM reviewables WHERE id = 2147;
discourse=# \q

and rebuild one more time.

There was no corresponding reviewables_scores.reviewable_id so hopefully there are no dangling keys anywhere else…

2 Likes

We’ve fixed the issue in

https://github.com/discourse/discourse/commit/c66ae190375129aa75e0372bc4a20da1058ebc96

But we’re following up if we missed something to be migrated.

9 Likes

I took a look at this and confirmed that the migrations do create reviewable scores for every flag. However, it’ll only do this is a PostAction record exists. There might be some case where those were not present in Discourse. Maybe there were migrations, or an old bug, or something or other.

I think @joffreyjaffeux’s fix is good and we haven’t seen it since so I can consider this complete.

6 Likes