Can't restore due to corrupt indexes (with some clues on how to deal with corrupt indexes)

This backup failed several times because of errors like

ERROR:  could not create unique index "index_incoming_referers_on_path_and_incoming_domain_id"
DETAIL:  Key (path, incoming_domain_id)=(/@dataandme, 41) is duplicated.               
ERROR:  current transaction is aborted, commands ignored until end of transaction block

I went and removed all of those records by hand in rails with stuff like
IncomingReferer.where(path: ‘/@dataandme’).destroy_all

Now it’s still failing and I

...
[2019-12-30 20:29:16] 'pfaffman' has started the restore!
[2019-12-30 20:29:16] Marking restore as running...
[2019-12-30 20:29:16] Making sure /var/www/discourse/tmp/restores/default/2019-12-30-202916 exists...
[2019-12-30 20:29:16] Downloading archive to tmp directory...
[2019-12-30 20:29:38] No metadata file to extract.
[2019-12-30 20:29:38] Validating metadata...
[2019-12-30 20:29:38]   Current version: 20191129144706
[2019-12-30 20:29:38]   Restored version: 20191129144706
[2019-12-30 20:29:38] Extracting dump file...
[2019-12-30 20:30:07] Creating missing functions in the discourse_functions schema
[2019-12-30 20:30:08] Cannot restore into different schema, restoring in-place
[2019-12-30 20:30:08] Enabling readonly mode...
[2019-12-30 20:30:08] Pausing sidekiq...
[2019-12-30 20:30:08] Waiting for sidekiq to finish running jobs...
[2019-12-30 20:30:08] Restoring dump file... (can be quite long)

 ----- and so on -----
[2019-12-30 20:33:17] ERROR:  current transaction is aborted, commands ignored until end of transaction block
[2019-12-30 20:33:17] ERROR:  current transaction is aborted, commands ignored until end of transaction block
[2019-12-30 20:33:17] EXCEPTION: psql failed
[2019-12-30 20:33:17] /var/www/discourse/lib/backup_restore/restorer.rb:331:in `restore_dump'
/var/www/discourse/lib/backup_restore/restorer.rb:75:in `run'
/var/www/discourse/lib/backup_restore.rb:166:in `block in start!'
/var/www/discourse/lib/backup_restore.rb:163:in `fork'
/var/www/discourse/lib/backup_restore.rb:163:in `start!'
/var/www/discourse/lib/backup_restore.rb:22:in `restore!'
/var/www/discourse/app/controllers/admin/backups_controller.rb:119:in `restore'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/abstract_controller/base.rb:196:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal/rendering.rb:30:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/abstract_controller/callbacks.rb:42:in `block in process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/callbacks.rb:135:in `run_callbacks'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/abstract_controller/callbacks.rb:41:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal/rescue.rb:22:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal/instrumentation.rb:33:in `block in process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/notifications.rb:180:in `block in instrument'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/notifications.rb:180:in `instrument'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal/instrumentation.rb:32:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal/params_wrapper.rb:245:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/railties/controller_runtime.rb:27:in `process_action'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/abstract_controller/base.rb:136:in `process'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionview-6.0.1/lib/action_view/rendering.rb:39:in `process'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-mini-profiler-1.1.3/lib/mini_profiler/profiling_methods.rb:104:in `block in profile_method'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal.rb:191:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_controller/metal.rb:252:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/routing/route_set.rb:51:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/routing/route_set.rb:33:in `serve'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/routing/mapper.rb:18:in `block in <class:Constraints>'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/routing/mapper.rb:48:in `serve'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/journey/router.rb:49:in `block in serve'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/journey/router.rb:32:in `each'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/journey/router.rb:32:in `serve'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/routing/route_set.rb:837:in `call'
/var/www/discourse/lib/middleware/omniauth_bypass_middleware.rb:68:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/tempfile_reaper.rb:15:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/conditional_get.rb:38:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/head.rb:12:in `call'
/var/www/discourse/lib/content_security_policy/middleware.rb:12:in `call'
/var/www/discourse/lib/middleware/anonymous_cache.rb:318:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/session/abstract/id.rb:232:in `context'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/session/abstract/id.rb:226:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/cookies.rb:648:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/callbacks.rb:27:in `block in call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/callbacks.rb:101:in `run_callbacks'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/callbacks.rb:26:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/actionable_exceptions.rb:17:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/debug_exceptions.rb:32:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/logster-2.4.2/lib/logster/middleware/reporter.rb:43:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/railties-6.0.1/lib/rails/rack/logger.rb:38:in `call_app'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/railties-6.0.1/lib/rails/rack/logger.rb:28:in `call'
/var/www/discourse/config/initializers/100-quiet_logger.rb:18:in `call'
/var/www/discourse/config/initializers/100-silence_logger.rb:31:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/remote_ip.rb:81:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/request_id.rb:27:in `call'
/var/www/discourse/lib/middleware/enforce_hostname.rb:17:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/method_override.rb:22:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/executor.rb:14:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/sendfile.rb:111:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/actionpack-6.0.1/lib/action_dispatch/middleware/host_authorization.rb:77:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-mini-profiler-1.1.3/lib/mini_profiler/profiler.rb:296:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/message_bus-2.2.3/lib/message_bus/rack/middleware.rb:57:in `call'
/var/www/discourse/lib/middleware/request_tracker.rb:181:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/railties-6.0.1/lib/rails/engine.rb:526:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/railties-6.0.1/lib/rails/railtie.rb:190:in `public_send'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/railties-6.0.1/lib/rails/railtie.rb:190:in `method_missing'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/urlmap.rb:68:in `block in call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/urlmap.rb:53:in `each'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-2.0.7/lib/rack/urlmap.rb:53:in `call'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/unicorn-5.5.1/lib/unicorn/http_server.rb:605:in `process_client'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/unicorn-5.5.1/lib/unicorn/http_server.rb:700:in `worker_loop'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/unicorn-5.5.1/lib/unicorn/http_server.rb:548:in `spawn_missing_workers'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/unicorn-5.5.1/lib/unicorn/http_server.rb:144:in `start'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/unicorn-5.5.1/bin/unicorn:128:in `<top (required)>'
/var/www/discourse/vendor/bundle/ruby/2.6.0/bin/unicorn:23:in `load'
/var/www/discourse/vendor/bundle/ruby/2.6.0/bin/unicorn:23:in `<main>'
[2019-12-30 20:33:17] Trying to rollback...
[2019-12-30 20:33:17] Rolling back...
[2019-12-30 20:33:17] Cleaning stuff up...
[2019-12-30 20:33:17] Dropping function from the discourse_functions schema
[2019-12-30 20:33:17] Removing tmp '/var/www/discourse/tmp/restores/default/2019-12-30-202916' directory...
[2019-12-30 20:33:17] Unpausing sidekiq...
[2019-12-30 20:33:17] Disabling readonly mode...
[2019-12-30 20:33:17] Marking restore as finished...
[2019-12-30 20:33:17] Notifying 'pfaffman' of the end of the restore...
[2019-12-30 20:33:34] Finished!

Now I don’t see anything that I know how to fix.

I can upgrade the staging site that I’m trying to restore to, but would rather not upgrade the production site without first being able to see that things work on staging. . .

2 Likes

Are you sure that psql didn’t log any other error before that? This usually happens when a previous SQL command failed.

Also, why does the original database has duplicate records in the incoming_referers table? The unique index in that table exists since 2014. Maybe the database you are trying to restore has a lot more problems than this?

3 Likes

Thanks Gerhard! I appreciate your help. This one is just getting stranger and stranger.

It doesn’t make sense to me that there was a problem with incoming_referers to me either. I’ve done a restore from production to this staging site as recently as October.

It seems like there’s something wrong with the database or the indexes. I suspected a plugin, but the errors are in enough different indexes that it seems improbable that a single plugin might be to blame. An earlier grep through plugins for the broken index issues came up empty.

One restore I got this:

[2020-01-03 16:25:30] ERROR:  could not create unique index "index_plugin_store_rows_on_plugin_name_and_key"

I did:

[10] pry(main)> PluginStoreRow.where(plugin_name: 'discourse-data-explorer',key: 'q:-8').destroy_all

And tried again:

[2020-01-03 16:49:16] ERROR:  could not create unique index "index_tags_on_lower_name"
[2020-01-03 16:49:16] DETAIL:  Key (lower(name::text))=(addins) is duplicated.

Here’s the full head of the latest restore:

[2020-01-03 16:41:34] 'pfaffman' has started the restore!
[2020-01-03 16:41:34] Marking restore as running...
[2020-01-03 16:41:34] Making sure /var/www/discourse/tmp/restores/default/2020-01-03-164133 exists...
[2020-01-03 16:41:34] Downloading archive to tmp directory...
[2020-01-03 16:45:20] No metadata file to extract.
[2020-01-03 16:45:20] Validating metadata...
[2020-01-03 16:45:20]   Current version: 20191220134101
[2020-01-03 16:45:20]   Restored version: 20191129144706
[2020-01-03 16:45:20] Extracting dump file...
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:15] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] CREATE INDEX
[2020-01-03 16:49:16] ERROR:  could not create unique index "index_tags_on_lower_name"
[2020-01-03 16:49:16] DETAIL:  Key (lower(name::text))=(addins) is duplicated.
[2020-01-03 16:49:16] ERROR:  current transaction is aborted, commands ignored until end of transaction block
[2020-01-03 16:49:16] ERROR:  current transaction is aborted, commands ignored until end of transaction block
[2020-01-03 16:49:16] ERROR:  current transaction is aborted, commands ignored until end of transaction block

I guess you will have to clean up the data manually until the restore works and all unique indexes can be created.

That aside, I wonder if we shipped a buggy version of PostgreSQL in one of the recent base images. There were quite a few reports about problems with corrupt indexes in the last month or two. Seems like a strange coincidence. :thinking:

4 Likes

I’d love an explanation that wasn’t just “Jay did something stupid again.” :wink:

FWIW, I’m pretty sure that I recently upgraded the data container on this site just before this started.

2 Likes

You could take a look at the created_at dates of duplicate records and try to find out which version of Postgres was used at that time. @falco mentioned that PG 10.0 and 10.1 had problems with corrupt indexes, but that was a long time ago (10.2 was release on 2018-02-08) and we only started shipping PG 10 after in April 2018 (so after the release of 10.2). So, either it’s a new bug in PG or just bad luck. :man_shrugging:

4 Likes

Hmm. Well, production has 10.10 and staging has 10.11.

Also, a couple of the records that the logs said failed because there was a dupe there was only one record, so maybe the record got deleted but it didn’t also get deleted from the index? Does that make sense?

1 Like

Now, but indexes aren’t recreated on every update.

Yes, as they are different objects in disk.

4 Likes

Well, here’s a clue:

discourse=> reindex table tags;
ERROR:  could not create unique index "index_tags_on_lower_name"
DETAIL:  Key (lower(name::text))=(addins) is duplicated.

discourse=> select * from tags where name='addins';
 id  |  name  | topic_count |         created_at         |         updated_at         | pm_topic_count 
-----+--------+-------------+----------------------------+----------------------------+----------------
 143 | addins |          16 | 2017-11-13 20:08:51.345607 | 2017-11-13 20:08:51.345607 |              0
(1 row)

discourse=> select * from tags where name like '%dins';
 id  |  name  | topic_count |         created_at         |         updated_at         | pm_topic_count 
-----+--------+-------------+----------------------------+----------------------------+----------------
 143 | addins |          16 | 2017-11-13 20:08:51.345607 | 2017-11-13 20:08:51.345607 |              0
 721 | addins |           3 | 2019-11-19 10:01:32.406178 | 2019-11-19 10:01:32.406178 |              0

I can’t tell the difference between those two, but the first select found only one of them?

2 Likes

If you run a EXPLAIN ANALYZE select * from tags where name='addins'; you will find out why.

It was either a index only scan (I doubt because you are selecting too many columns) or a bitmap heap scan, where it uses the index to know which rows to pick. Since the index is corrupt…

3 Likes

This is exactly the problem I recently had with my corrupt index. I ended up inspecting these account, often found they were the same (same email address), renamed one and then merged them. It took quite a bit of work as it was mostly manual, but now I have a shiny new index :slight_smile:

Edit: oops, my problem was in the users table. Same duplicate key issue though.

3 Likes

This one also had two entries where ‘shiny-server’ matched only one but ‘%hiney-server’ matched both. I don’t see how this helps me to understand but I think you’re saying that the select I’m doing expects only one thing to be returned and pulls it from the index, ignoring the other, but when I use the % it searches the fields.

discourse=> EXPLAIN ANALYZE select * from tags where name='shiny-server'; 
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_tags_on_name on tags  (cost=0.28..8.29 rows=1 width=36) (actual time=0.038..0.040 rows=1 loops=1)
   Index Cond: ((name)::text = 'shiny-server'::text)
 Planning time: 0.129 ms
 Execution time: 0.070 ms
(4 rows)

I rather doubt that anyone who wouldn’t figure this out on their own will find this useful, but … I did a bunch of finding the two tag_ids with the same name and doing stuff like

 TopicTag.where(tag_id: 717).update_all(tag_id: 611)
 Tag.ensure_consistency!
 Tag.find(717) # make sure it's not in any topics
 Tag.find(717).destroy

and then doing a reindex table tags; in psql.

And it looks like now I’ll need to do it for users somewhat like @bartv described. Though with my users, I see one user David and another user david and also [Mm]ark.

Those I fixed like

marks=User.where("username similar to  '[Mm]+ark'").pluck(:id,:username,:created_at)

Then contrive to find the new mark in /admin/users and rename the user there. Then try to reindex table users; in psql. (sudo su - discourse and then psql).

4 Likes

Think of it like this. If you need to look up Pfaffman in a phone book, you go to the P, then try to find entries between Pe and Pg, then between Pfa and Pfb etcetera. You use the fact that the entries are sorted alphabetically to quickly look up the entry. Now imagine - as a mind experiment - that one of the phonebook editors didn’t memorize the alphabet correctly and inserted some Pfaffmans at the wrong place (a corrupt index). You won’t find them.

Now you need to look up all people with a last name ending with faffman. There is no fast way to do that, you will need to go through the entire phone book. That’s a lot of work! But… you will find the incorrectly inserted Pfaffmans this way!

6 Likes

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.