迁移错误:`ActiveRecord::NotNullViolation: PG::NotNullViolation: ERROR: column "private_message" contains null values`

我正在对网站进行升级,但升级失败了,报错为 PG::NotNullViolation: ERROR: column "private_message" contains null values。我找不到 private_message 列在哪里。没有安装任何非官方插件。

哦,等等,这里是相关信息:

(See full trace by running task with --trace)                                                                                            
I, [2020-08-18T18:19:13.253667 #1]  INFO -- : == 20200818084329 UpdatePrivateMessageOnPostSearchData: migrating ============= 
-- execute("DELETE FROM post_search_data\nWHERE post_id IN (\n  SELECT posts.id\n  FROM posts\n  LEFT JOIN topics ON topics.id = posts.topic_id\n  WHERE topics.id IS NULL\n)\n")
   -> 21.9072s                                                                                                       
-- execute("DELETE FROM post_search_data\nWHERE post_id IN (\n  SELECT post_search_data.post_id\n  FROM post_search_data\n  LEFT JOIN posts ON posts.id = post_search_data.post_id\n  WHERE posts.id IS NULL\n)\n")                                                                       
   -> 47.2663s
-- execute("UPDATE post_search_data\nSET private_message = true\nFROM posts\nINNER JOIN topics ON topics.id = posts.topic_id AND topics.archetype = 'private_message'\nWHERE posts.id = post_search_data.post_id\n")                                                                      
   -> 107.2137s             
-- execute("UPDATE post_search_data\nSET private_message = false\nFROM posts\nINNER JOIN topics ON topics.id = posts.topic_id AND topics.archetype <> 'private_message'\nWHERE posts.id = post_search_data.post_id\n")
   -> 834.3738s                                                                                                                              
-- change_column_null(:post_search_data, :private_message, false)

@tgxworld,我觉得这可能是由于以下提交导致的:

嗯,这有点奇怪……你能在 Rails 控制台中运行以下查询并把结果提供给我吗?

DB.query_single(<<~SQL)
SELECT COUNT(*) FROM post_search_data
SQL

DB.query_single(<<~SQL)
SELECT COUNT(*)
FROM post_search_data
LEFT JOIN posts ON posts.id = post_search_data.post_id
LEFT JOIN topics ON topics.id = posts.topic_id 
WHERE topics.id IS NULL
SQL

DB.query_single(<<~SQL)
SELECT COUNT(*)
FROM post_search_data
LEFT JOIN posts ON posts.id = post_search_data.post_id
WHERE posts.id IS NULL
SQL

DB.query_single(<<~SQL)
SELECT COUNT(*)
FROM post_search_data
INNER JOIN posts ON posts.id = post_search_data.post_id
INNER JOIN topics ON topics.id = posts.topic_id
SQL
[2] pry(main)> 
[3] pry(main)> DB.query_single(<<~SQL)
[3] pry(main)* SELECT COUNT(*) FROM post_search_data
[3] pry(main)* SQL
=> [2200178]
[4] pry(main)> 
[5] pry(main)> DB.query_single(<<~SQL)
[5] pry(main)* SELECT COUNT(*)
[5] pry(main)* FROM post_search_data
[5] pry(main)* LEFT JOIN posts ON posts.id = post_search_data.post_id
[5] pry(main)* LEFT JOIN topics ON topics.id = posts.topic_id 
[5] pry(main)* WHERE topics.id IS NULL
[5] pry(main)* SQL

=> [39]
[6] pry(main)> 
[7] pry(main)> DB.query_single(<<~SQL)
[7] pry(main)* SELECT COUNT(*)
[7] pry(main)* FROM post_search_data
[7] pry(main)* LEFT JOIN posts ON posts.id = post_search_data.post_id
[7] pry(main)* WHERE posts.id IS NULL
[7] pry(main)* SQL
=> [0]
[8] pry(main)> 
[9] pry(main)> DB.query_single(<<~SQL)
[9] pry(main)* SELECT COUNT(*)
[9] pry(main)* FROM post_search_data
[9] pry(main)* INNER JOIN posts ON posts.id = post_search_data.post_id
[9] pry(main)* INNER JOIN topics ON topics.id = posts.topic_id
[9] pry(main)* SQL

上次计数似乎未执行。

抱歉。我每次点击复制时,终端换行符总是丢失。

[2] pry(main)> 
[3] pry(main)> DB.query_single(<<~SQL)
[3] pry(main)* SELECT COUNT(*)
[3] pry(main)* FROM post_search_data
[3] pry(main)* INNER JOIN posts ON posts.id = post_search_data.post_id
[3] pry(main)* INNER JOIN topics ON topics.id = posts.topic_id
[3] pry(main)* SQL
=> [2200797]

抱歉,您不得不将所有查询一起运行 :slight_smile: 否则,新创建的帖子会导致计数不准确。

当你这么说时,这太明显了……而且只运行一次查询反而更费工夫!

[1] pry(main)> DB.query_single(<<~SQL)
[1] pry(main)* SELECT COUNT(*) FROM post_search_data
[1] pry(main)* SQL
=> [2200995]
[2] pry(main)> 
[3] pry(main)> DB.query_single(<<~SQL)
[3] pry(main)* SELECT COUNT(*)
[3] pry(main)* FROM post_search_data
[3] pry(main)* LEFT JOIN posts ON posts.id = post_search_data.post_id
[3] pry(main)* LEFT JOIN topics ON topics.id = posts.topic_id 
[3] pry(main)* WHERE topics.id IS NULL
[3] pry(main)* SQL
=> [39]
[4] pry(main)> 
[5] pry(main)> DB.query_single(<<~SQL)
[5] pry(main)* SELECT COUNT(*)
[5] pry(main)* FROM post_search_data
[5] pry(main)* LEFT JOIN posts ON posts.id = post_search_data.post_id
[5] pry(main)* WHERE posts.id IS NULL
[5] pry(main)* SQL
=> [0]
[6] pry(main)> 
[7] pry(main)> DB.query_single(<<~SQL)
[7] pry(main)* SELECT COUNT(*)
[7] pry(main)* FROM post_search_data
[7] pry(main)* INNER JOIN posts ON posts.id = post_search_data.post_id
[7] pry(main)* INNER JOIN topics ON topics.id = posts.topic_id
[7] pry(main)* SQL
=> [2200956]
[8] pry(main)> 

@pfaffman 只是想确认一下,这个问题是否已按照我们产品经理的要求解决?

抱歉,Alan。是的,这个问题已经解决了。感谢你的帮助!

对于其他遇到同样问题的用户,如果你执行普通的 ./launcher rebuild app 命令,通常不会遇到什么麻烦,唯一的例外是你的网站可能会在重建过程中的引导阶段(bootstrap)因数据库迁移而长时间处于不可用状态。这是最安全且简单的选择,除非你使用的是双容器安装,否则你本来就会这样做。

我不希望整个引导过程都导致网站停机。对于这个大型论坛(500 万帖子,日均约 5 万次页面访问?),我的解决方案(多亏了 Alan 的帮助才想到)是:先关闭“升级后迁移”选项进行引导(此时迁移几乎不耗时),启动新容器后再执行升级后迁移(迁移耗时约 20 到 40 分钟——我当时并没有太关注时间)。

如果其他人感兴趣,我后来了解到使用 docker_manager 进行升级会更加顺畅,因此如果其他人也有大型论坛且遇到引导问题,我会推荐这种方法。