无法恢复,因为索引损坏(附处理损坏索引的一些线索)

此备份因以下错误多次失败:

ERROR: 无法创建唯一索引 "index_incoming_referers_on_path_and_incoming_domain_id"
DETAIL: 键 (path, incoming_domain_id)=(/@dataandme, 41) 已重复。
ERROR: 当前事务已中止,在事务块结束前将忽略所有命令

我手动在 Rails 中删除了所有相关记录,使用了类似以下的命令:
IncomingReferer.where(path: ‘/@dataandme’).destroy_all

但问题依然存在,恢复过程仍然失败:

...
[2019-12-30 20:29:16] 'pfaffman' 已开始恢复!
[2019-12-30 20:29:16] 将恢复状态标记为运行中...
[2019-12-30 20:29:16] 确认目录 /var/www/discourse/tmp/restores/default/2019-12-30-202916 是否存在...
[2019-12-30 20:29:16] 正在下载归档文件到临时目录...
[2019-12-30 20:29:38] 无元数据文件可提取。
[2019-12-30 20:29:38] 正在验证元数据...
[2019-12-30 20:29:38]   当前版本:20191129144706
[2019-12-30 20:29:38]   待恢复版本:20191129144706
[2019-12-30 20:29:38] 正在提取转储文件...
[2019-12-30 20:30:07] 在 discourse_functions 模式中创建缺失的函数
[2019-12-30 20:30:08] 无法恢复到不同的模式,将在原地恢复
[2019-12-30 20:30:08] 正在启用只读模式...
[2019-12-30 20:30:08] 正在暂停 Sidekiq...
[2019-12-30 20:30:08] 等待 Sidekiq 完成当前运行的任务...
[2019-12-30 20:30:08] 正在恢复转储文件...(可能需要较长时间)

 ----- 以此类推 -----
[2019-12-30 20:33:17] ERROR: 当前事务已中止,在事务块结束前将忽略所有命令
[2019-12-30 20:33:17] ERROR: 当前事务已中止,在事务块结束前将忽略所有命令
[2019-12-30 20:33:17] EXCEPTION: psql 执行失败
[2019-12-30 20:33:17] /var/www/discourse/lib/backup_restore/restorer.rb:331:in `restore_dump'
/var/www/discourse/lib/backup_restore/restorer.rb:75:in `run'
/var/www/discourse/lib/backup_restore.rb:166:in `block in start!'
/var/www/discourse/lib/backup_restore.rb:163:in `fork'
/var/www/discourse/lib/backup_restore.rb:163:in `start!'
/var/www/discourse/lib/backup_restore.rb:22:in `restore!'
/var/www/discourse/app/controllers/admin/backups_controller.rb:119:in `restore'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/abstract_controller/base.rb:196:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal/rendering.rb:30:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/abstract_controller/callbacks.rb:42:in `block in process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/callbacks.rb:135:in `run_callbacks'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/abstract_controller/callbacks.rb:41:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal/rescue.rb:22:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal/instrumentation.rb:33:in `block in process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/notifications.rb:180:in `block in instrument'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/notifications.rb:180:in `instrument'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal/instrumentation.rb:32:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal/params_wrapper.rb:245:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/railties/controller_runtime.rb:27:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/abstract_controller/base.rb:136:in `process'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionview-6.0.1/lib/action_view/rendering.rb:39:in `process'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-mini-profiler-1.1.3/lib/mini_profiler/profiling_methods.rb:104:in `block in profile_method'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal.rb:191:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal.rb:252:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/action_dispatch/routing/route_set.rb:51:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/action_dispatch/routing/route_set.rb:33:in `serve'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/action_dispatch/routing/mapper.rb:18:in `block in <class:Constraints>'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/action_dispatch/routing/mapper.rb:48:in `serve'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/action_dispatch/journey/router.rb:49:in `block in serve'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/action_dispatch/journey/router.rb:32:in `each'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/action_dispatch/journey/router.rb:32:in `serve'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/action_dispatch/routing/route_set.rb:837:in `call'
/var/www/discourse/lib/middleware/omniauth_bypass_middleware.rb:68:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/tempfile_reaper.rb:15:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/conditional_get.rb:38:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/head.rb:12:in `call'
/var/www/discourse/lib/content_security_policy/middleware.rb:12:in `call'
/var/www/discourse/lib/middleware/anonymous_cache.rb:318:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/session/abstract/id.rb:232:in `context'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/session/abstract/id.rb:226:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/cookies.rb:648:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/callbacks.rb:27:in `block in call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/callbacks.rb:101:in `run_callbacks'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/callbacks.rb:26:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/actionable_exceptions.rb:17:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/debug_exceptions.rb:32:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/logster-2.4.2/lib/logster/middleware/reporter.rb:43:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/railties-6.0.1/lib/rails/rack/logger.rb:38:in `call_app'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/railties-6.0.1/lib/rails/rack/logger.rb:28:in `call'
/var/www/discourse/config/initializers/100-quiet_logger.rb:18:in `call'
/var/www/discourse/config/initializers/100-silence_logger.rb:31:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/remote_ip.rb:81:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/request_id.rb:27:in `call'
/var/www/discourse/lib/middleware/enforce_hostname.rb:17:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/method_override.rb:22:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/executor.rb:14:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/sendfile.rb:111:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/host_authorization.rb:77:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-mini-profiler-1.1.3/lib/mini_profiler/profiler.rb:296:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/message_bus-2.2.3/lib/message_bus/rack/middleware.rb:57:in `call'
/var/www/discourse/lib/middleware/request_tracker.rb:181:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/railties-6.0.1/lib/rails/engine.rb:526:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/railties-6.0.1/lib/rails/railtie.rb:190:in `public_send'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/railties-6.0.1/lib/rails/railtie.rb:190:in `method_missing'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/urlmap.rb:68:in `block in call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/urlmap.rb:53:in `each'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/urlmap.rb:53:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/unicorn-5.5.1/lib/unicorn/http_server.rb:605:in `process_client'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/unicorn-5.5.1/lib/unicorn/http_server.rb:700:in `worker_loop'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/unicorn-5.5.1/lib/unicorn/http_server.rb:548:in `spawn_missing_workers'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/unicorn-5.5.1/lib/unicorn/http_server.rb:144:in `start'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/unicorn-5.5.1/bin/unicorn:128:in `<top (required)>'
/var/www/discourse/vendor/bundle/ruby/2.6.0/bin/unicorn:23:in `load'
/var/www/discourse/vendor/bundle/ruby/2.6.0/bin/unicorn:23:in `<main>'
[2019-12-30 20:33:17] 正在尝试回滚...
[2019-12-30 20:33:17] 正在回滚...
[2019-12-30 20:33:17] 正在清理临时数据...
[2019-12-30 20:33:17] 从 discourse_functions 模式中删除函数
[2019-12-30 20:33:17] 删除临时目录 '/var/www/discourse/tmp/restores/default/2019-12-30-202916'...
[2019-12-30 20:33:17] 恢复 Sidekiq...
[2019-12-30 20:33:17] 禁用只读模式...
[2019-12-30 20:33:17] 将恢复状态标记为完成...
[2019-12-30 20:33:17] 通知 'pfaffman' 恢复已结束...
[2019-12-30 20:33:34] 完成!

现在我不知道还有什么可以修复的了。

我可以升级正在尝试恢复的目标站点(测试环境),但更希望在确认测试环境运行正常之前,不要升级生产环境。

您确定 psql 在此错误之前没有记录其他错误吗?这通常是由于之前的 SQL 命令失败导致的。

另外,为什么原始数据库的 incoming_referers 表中会有重复记录?该表的唯一索引自 2014 年就已存在。也许您尝试恢复的数据库存在比这更严重的问题?

谢谢,Gerhard!非常感谢你的帮助。这个问题变得越来越奇怪了。

对我来说,incoming_referers 出问题也说不通。我最近一次从生产环境恢复到这个测试站点是在 10 月。

看起来问题出在数据库或索引上。我怀疑是某个插件导致的,但错误涉及这么多不同的索引,不太可能是单个插件造成的。之前用 grep 在插件中搜索与损坏索引相关的问题,结果一无所获。

有一次恢复时,我得到了以下错误:

[2020-01-03 16:25:30] ERROR:  could not create unique index "index_plugin_store_rows_on_plugin_name_and_key"

我执行了:

[10] pry(main)> PluginStoreRow.where(plugin_name: 'discourse-data-explorer',key: 'q:-8').destroy_all

然后再次尝试:

[2020-01-03 16:49:16] ERROR:  could not create unique index "index_tags_on_lower_name"
[2020-01-03 16:49:16] DETAIL:  Key (lower(name::text))=(addins) is duplicated.

以下是最新恢复过程的完整头部日志:

[2020-01-03 16:41:34] 'pfaffman' has started the restore!
[2020-01-03 16:41:34] Marking restore as running...
[2020-01-03 16:41:34] Making sure /var/www/discourse/tmp/restores/default/2020-01-03-164133 exists...
[2020-01-03 16:41:34] Downloading archive to tmp directory...
[2020-01-03 16:45:20] No metadata file to extract.
[2020-01-03 16:45:20] Validating metadata...
[2020-01-03 16:45:20]   Current version: 20191220134101
[2020-01-03 16:45:20]   Restored version: 20191129144706
[2020-01-03 16:45:20] Extracting dump file...
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] ERROR:  could not create unique index "index_tags_on_lower_name"
[2020-01-03 16:49:16] DETAIL:  Key (lower(name::text))=(addins) is duplicated.
[2020-01-03 16:49:16] ERROR:  current transaction is aborted, commands ignored until end of transaction block
[2020-01-03 16:49:16] ERROR:  current transaction is aborted, commands ignored until end of transaction block
[2020-01-03 16:49:16] ERROR:  current transaction is aborted, commands ignored until end of transaction block

我想,在恢复功能正常工作并能够创建所有唯一索引之前,您可能需要手动清理数据。

除此之外,我想知道我们是否在最近的某个基础镜像中发布了一个有缺陷的 PostgreSQL 版本。过去一两个月里,关于索引损坏问题的报告相当多。这似乎是一个奇怪的巧合。:thinking:

我希望得到的解释不仅仅是“杰伊又干了蠢事”。:wink:

顺便提一下,我相当确定我最近在这个问题开始之前,刚刚升级了本站的数据容器。

您可以查看重复记录的 created_at 日期,尝试找出当时使用的是哪个版本的 Postgres。@falco 提到 PG 10.0 和 10.1 存在索引损坏的问题,但那已经是很久以前的事了(10.2 于 2018 年 2 月 8 日发布),而我们直到 2018 年 4 月才开始提供 PG 10(即在 10.2 发布之后)。因此,这要么是 PG 中的一个新漏洞,要么只是运气不好。:man_shrugging:

嗯。好吧,生产环境是 10.10,而预发布环境是 10.11。

另外,日志显示因存在重复而失败的几条记录中,实际上只有一条记录。也许该记录已被删除,但并未从索引中同步删除?这样理解对吗?

现在,索引并不是在每次更新时都重新创建的。

是的,因为它们在磁盘上是不同的对象。

好吧,这里有个线索:

discourse=> reindex table tags;
ERROR:  could not create unique index "index_tags_on_lower_name"
DETAIL:  Key (lower(name::text))=(addins) is duplicated.

discourse=> select * from tags where name='addins';
 id  |  name  | topic_count |         created_at         |         updated_at         | pm_topic_count 
-----+--------+-------------+----------------------------+----------------------------+----------------
 143 | addins |          16 | 2017-11-13 20:08:51.345607 | 2017-11-13 20:08:51.345607 |              0
(1 row)

discourse=> select * from tags where name like '%dins';
 id  |  name  | topic_count |         created_at         |         updated_at         | pm_topic_count 
-----+--------+-------------+----------------------------+----------------------------+----------------
 143 | addins |          16 | 2017-11-13 20:08:51.345607 | 2017-11-13 20:08:51.345607 |              0
 721 | addins |           3 | 2019-11-19 10:01:32.406178 | 2019-11-19 10:01:32.406178 |              0

我看不出这两者有什么区别,但第一个查询只找到了其中一个?

如果你运行 EXPLAIN ANALYZE select * from tags where name='addins';,你就会明白原因。

这可能是索引仅扫描(我怀疑不太可能,因为你选择了太多列),或者是位图堆扫描,即利用索引来确定要选取哪些行。由于索引已损坏……

这正好是我最近遇到的损坏索引问题。我最终检查了这些账户,发现它们通常是相同的(相同的电子邮件地址),重命名其中一个然后合并它们。这花了不少功夫,因为大部分是手动操作,但现在我有了一个崭新的索引 :slight_smile:

编辑:哎呀,我的问题出在 users 表中。不过同样是重复键的问题。

这个案例中也有两条记录,其中’shiny-server’只匹配了一条,而’%hiney-server’却匹配了两条。我不太明白这如何帮助我理解,但我想你的意思是,我使用的select语句期望只返回一个结果,并从索引中提取它,而忽略其他结果;但当我使用%时,它会在字段中进行搜索。

discourse=> EXPLAIN ANALYZE select * from tags where name='shiny-server'; 
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_tags_on_name on tags  (cost=0.28..8.29 rows=1 width=36) (actual time=0.038..0.040 rows=1 loops=1)
   Index Cond: ((name)::text = 'shiny-server'::text)
 Planning time: 0.129 ms
 Execution time: 0.070 ms
(4 rows)

我怀疑,那些无法自行找出问题的人可能觉得这没什么用,不过……我确实尝试了找出两个具有相同名称的tag_id,并执行类似以下操作:

 TopicTag.where(tag_id: 717).update_all(tag_id: 611)
 Tag.ensure_consistency!
 Tag.find(717) # 确保它不再属于任何主题
 Tag.find(717).destroy

然后在 psql 中执行reindex table tags;

现在看起来,我还需要对users表进行类似操作,正如@bartv 所描述的那样。不过在我的用户中,我看到了一个名为 David 的用户和一个名为 david 的用户,还有[Mm]ark

我是这样修复这些问题的:

marks=User.where("username similar to  '[Mm]+ark'").pluck(:id,:username,:created_at)

然后在/admin/users中找到新的 Mark 用户并在那里重命名该用户。接着在 psql 中尝试执行reindex table users;(先执行sudo su - discourse,再执行psql)。

可以这样理解:如果你需要在电话簿中查找 Pfaffman,你会先找到 P 部分,然后尝试在 Pe 和 Pg 之间查找条目,接着在 Pfa 和 Pfb 之间查找,依此类推。你利用了条目按字母顺序排列这一特点,从而快速定位目标。现在想象一下(作为一个思想实验):有一位电话簿编辑没有正确记住字母表,导致将某些 Pfaffman 条目放错了位置(即索引损坏)。在这种情况下,你将无法找到它们。

现在,你需要查找所有姓氏以 faffman 结尾的人。这种情况下没有快速的方法,你必须翻阅整本电话簿。这需要大量工作!但是……通过这种方式,你确实能找到那些被错误插入的 Pfaffman 条目!