I’m just speculating, but here’s my tinfoil theory:
I ran REINDEX CONCURRENTLY twice (so first ccnew, then ccnew1),
and both times it reported errors.
Maybe when it fails it doesn’t revert; it just leaves the garbage behind and quits.
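If that theory holds, the leftovers should be marked invalid in the catalog. A quick sketch to check (going through pg_index, since the pg_indexes view doesn’t expose validity):
-- Are the leftover ccnew indexes flagged as invalid?
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE c.relname LIKE '%ccnew%';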
This sounds very very very likely.
Just adding my $0.02 here:
TOAST indexes can’t be dropped this way.
I had to:
su postgres
psql
\connect discourse
drop index pg_toast.pg_toast_309322_index_ccnew;
drop index pg_toast.pg_toast_309322_index_ccnew1;
The above only applies to pg_toast indexes, because the discourse user doesn’t have access to that schema:
PG::InsufficientPrivilege: ERROR: permission denied for schema pg_toast
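To see what’s lurking there before dropping anything, a catalog query along these lines should list any leftover ccnew indexes in the pg_toast schema (a sketch; run it as the postgres superuser):
-- List leftover ccnew indexes hiding in pg_toast
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_toast'
  AND c.relkind = 'i'
  AND c.relname LIKE '%ccnew%';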
Dropping TOAST indexes is super scary, but I guess you have no choice here. Yeah, you need to do that from psql.
And here comes the happy reindex:
discourse=# REINDEX SCHEMA CONCURRENTLY public;
REINDEX
Great catch! I found all indices with “ccnew” in the name via this query.
psql
\connect discourse
select tablename,indexname,indexdef from pg_indexes where indexname LIKE '%ccnew%';
Turned out I had a whopping 30 of them, all on the incoming_referers table. So I verified all the ccnew indices were actually duplicates via the indexdef column in this query.
select indexname,indexdef from pg_indexes where tablename = 'incoming_referers';
And then dropped all of them successfully.
DROP INDEX incoming_referers_pkey_ccnew;
DROP INDEX incoming_referers_pkey_ccnew_ccnew;
DROP INDEX incoming_referers_pkey_ccnew1;
...and so on for all 30
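Rather than typing all 30 by hand, a query along these lines can generate the DROP statements (a sketch; review the output before pasting it back into psql):
-- Emit one DROP INDEX statement per leftover ccnew index
SELECT format('DROP INDEX %I.%I;', schemaname, indexname)
FROM pg_indexes
WHERE indexname LIKE '%ccnew%';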
At that point I reindexed the whole schema again; it was again unable to rebuild two of the same incoming_referers ccnew indices, and it also turned up three pg_toast indices. I dropped them and reindexed the whole schema yet again: more errors, then a bunch more ccnew indices in the discourse schema, then a third reindex…
I can’t get a full reindex to complete without errors; it keeps creating, and then failing to rebuild, new ccnew indexes every time. After 4 full rebuilds I deleted the ccnew indexes and gave up. I guess I could try to rebuild non-concurrently, but that would mean a bunch of downtime.
Anyway, my guess is most users upgrading from PG10 to 12 who tried to reindex afterwards have these extra ccnew indexes and they should all be deleted. They’ll just take up space and multiply disk write I/O for no benefit.
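To see how much space they’re actually wasting before deleting them, something like this should work (a sketch using pg_relation_size):
-- Show the on-disk size of each leftover ccnew index
SELECT indexname,
       pg_size_pretty(pg_relation_size(format('%I.%I', schemaname, indexname)::regclass)) AS size
FROM pg_indexes
WHERE indexname LIKE '%ccnew%';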
I ran its equivalent in Data Explorer; it was slightly easier to manage.
What I did was take all the ccnew/ccnew1/ccnew2…ccnewN indexes, nuke them, and reindex. That fixed it for me.
ccnew is the suffix Postgres gives the replacement index it builds during a concurrent reindex; if the rebuild fails for some reason, that invalid ccnew index is left behind.
I can only suggest you first work on the root cause of the problem. I had duplicated tags that I had to delete before proceeding to nuking indexes; if I’d been left with even a single duplicated tag, it would not index properly and would just create (and fail) yet another ccnewN index.
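To hunt for that kind of root cause, a GROUP BY over the unique key will surface the offending rows. A sketch, using the (path, incoming_domain_id) key from this thread as the example; disabling index scans first keeps a corrupt index from hiding rows:
-- Force a sequential scan so a corrupt index can't mask duplicates
SET enable_indexscan = off;
SET enable_indexonlyscan = off;
SET enable_bitmapscan = off;
-- List duplicated key values and how many copies exist
SELECT path, incoming_domain_id, count(*)
FROM incoming_referers
GROUP BY path, incoming_domain_id
HAVING count(*) > 1;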
You’re right, I still have duplicates: I can’t rebuild the non-ccnew indexes even non-concurrently. I need to remove these duplicate rows.
discourse=# reindex index index_incoming_referers_on_path_and_incoming_domain_id;
ERROR: could not create unique index "index_incoming_referers_on_path_and_incoming_domain_id"
DETAIL: Key (path, incoming_domain_id)=(/search/, 3433) is duplicated.
discourse=# reindex index index_incoming_referers_on_path_and_incoming_domain_id;
ERROR: could not create unique index "index_incoming_referers_on_path_and_incoming_domain_id"
DETAIL: Key (path, incoming_domain_id)=(/search/, 1861) is duplicated.
What’s really weird is I only see one row with each of those incoming_domain_id values in incoming_referers. So why are they duplicates?
discourse=# select * from incoming_referers where path='/search/' AND incoming_domain_id IN (1861,3433);
id | path | incoming_domain_id
-------+----------+--------------------
42845 | /search/ | 1861
40763 | /search/ | 3433
(2 rows)
@sam or @riking, should I delete these two rows as follows:
DELETE FROM incoming_referers WHERE path='/search/' AND incoming_domain_id IN (1861,3433);
… I guess I’m learning postgres after all, heh.
No, that will delete both of them. You want to use the WHERE to find the two ids, and pick just one of the ids to delete.
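If there were many such pairs, a single statement could dedupe them all at once; a sketch, assuming id is the primary key and you want to keep the lowest id in each pair (confirm with a SELECT first, and disable index scans if the planner insists on using the corrupt index):
-- Delete the higher-id row of every duplicated (path, incoming_domain_id) pair
DELETE FROM incoming_referers a
USING incoming_referers b
WHERE a.path = b.path
  AND a.incoming_domain_id = b.incoming_domain_id
  AND a.id > b.id;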
Your query is using the corrupt index, which is why you’re only seeing one row for each. Try this:
... where path LIKE '%/search/' ...
OK, that brings up 43 rows.
discourse=# select * from incoming_referers where path LIKE '%/search/' ORDER BY incoming_domain_id;
id | path | incoming_domain_id
-------+-------------+--------------------
878 | /search/ | 63
33457 | /do/search/ | 567
1580 | /search/ | 602
1888 | /search/ | 663
42983 | /search/ | 1259
4896 | /search/ | 1788
42845 | /search/ | 1861
5162 | /search/ | 1861
5176 | /search/ | 1866
43350 | /search/ | 1905
17238 | /search/ | 1905
20689 | /search/ | 1982
5781 | /hg/search/ | 1987
8031 | /search/ | 2665
10325 | /search/ | 3192
11289 | /search/ | 3414
40763 | /search/ | 3433
42849 | /search/ | 3433
13087 | /search/ | 3895
13159 | /search/ | 3949
13802 | /do/search/ | 4051
14407 | /search/ | 4209
14507 | /search/ | 4211
15394 | /search/ | 4230
15533 | /search/ | 4258
45274 | /search/ | 5303
20923 | /search/ | 5400
21317 | /search/ | 5534
22928 | /search/ | 5918
22956 | /search/ | 5926
37448 | /search/ | 6393
25094 | /search/ | 6412
25594 | /search/ | 6547
39655 | /search/ | 6596
27371 | /search/ | 6986
27452 | /a/search/ | 7003
27623 | /search/ | 7041
31041 | /search/ | 7767
36943 | /search/ | 8622
37381 | /search/ | 8711
37411 | /search/ | 8716
40424 | /search/ | 9124
44451 | /search/ | 9525
(43 rows)
So should I run the following?
DELETE FROM incoming_referers WHERE path LIKE '%/search/' AND id IN (42845,43350,42849);
Yep, that DELETE looks correct and should allow you to take a clean backup.
Believe it or not, more failures. First set: another duplicate key on /m/search, which the %/search/ wildcard didn’t catch (no trailing slash, so the pattern doesn’t match); I deleted that too. Reindexed again (each run taking nearly an hour!) and hit more failures, this time a duplicate key on users(username_lower).
discourse=# reindex index index_users_on_username_lower;
ERROR: could not create unique index "index_users_on_username_lower"
DETAIL: Key (username_lower)=(john_smith) is duplicated.
But the thing is, there was only one row with username_lower=john_smith! Time for the detective hat.
Looking at the forum admin UI, we had two separate users named “john_smith” and “John_Smith” (note the different capitalization), which collide in username_lower. So I nuked the all-lowercase one at the forum level, since he hadn’t been active in 4 years, and then that index rebuilt OK.
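To check for any remaining case collisions in one pass, a sketch along these lines should list them (assuming the users table’s username and username_lower columns):
-- Find usernames that collide when lowercased
SELECT username_lower, count(*), array_agg(username) AS variants
FROM users
GROUP BY username_lower
HAVING count(*) > 1;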
Ran yet another full rebuild, each one taking nearly an hour, and FINALLY I think we’re good: only one error, no duplicates, just a pg_toast ccnew. I dropped it.
discourse=# REINDEX SCHEMA CONCURRENTLY public;
WARNING: cannot reindex invalid index "pg_toast.pg_toast_2783329_index_ccnew" concurrently, skipping
REINDEX
It’s been a journey.
Can/will these database fixes be automated in a future Discourse upgrade?
More likely, the pg index corruption will stop in a future pg upgrade, if Peter gets a consistent repro.
If a concurrent REINDEX fails because the table has invalid data, you need to:
1. Fix the invalid data, as was done here.
2. List the invalid indexes using SELECT pg_class.relname FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid; (a schema-qualified variant is sketched below)
3. Drop each invalid index listed above using DROP INDEX <indexname>;
4. Try the REINDEX again.
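As noted in step 2, a schema-qualified variant of that query (handy for pg_toast indexes, which must be dropped with a qualified name) could look like:
-- List invalid indexes together with their schema
SELECT n.nspname || '.' || c.relname AS invalid_index
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;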
After running the upgrade to 2.5.0beta5 and following the post-update guidance to reindex the database, I get this:
discourse=# REINDEX SCHEMA CONCURRENTLY public;
ERROR: could not create unique index "index_plugin_store_rows_on_plugin_name_and_key_ccnew"
DETAIL: Key (plugin_name, key)=(discourse-data-explorer, q:-10) is duplicated.
I’d rather not experiment with this one… so, how do I safely delete the duplicate?
No. The upgrade doesn’t corrupt your index; it just points out that your index is corrupt. You can check by trying to restore your backup to another server (even one that’s running pg10), or by rebuilding your index on your existing install. It’s not clear what’s causing the corrupt indexes, but there is hope that pg12 will be less likely to have it happen.
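For the “rebuild your index” check, the amcheck contrib extension can also test btree indexes in place, without a restore; a sketch (needs superuser to create the extension, and it raises an error on the first corruption it finds):
-- Verify every valid btree index in the public schema with amcheck
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT c.relname, bt_index_check(c.oid)
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
JOIN pg_am am ON am.oid = c.relam
WHERE am.amname = 'btree'
  AND i.indisvalid
  AND c.relnamespace = 'public'::regnamespace;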
There are some performance benefits to the upgrade, but holding off on it is not a bad idea.
Too late now, so I’m still looking for guidance re. next steps.
Something like
select id, plugin_name, key from plugin_store_rows where plugin_name like '%discourse-data-explorer%';
should give you the rows. I think it’s safe enough to delete them.
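And if the SELECT confirms they’re duplicates, a sketch like this would delete all but the lowest-id row for each (plugin_name, key) pair (assuming id is the primary key):
-- Keep one row per (plugin_name, key); delete the rest
DELETE FROM plugin_store_rows a
USING plugin_store_rows b
WHERE a.plugin_name = b.plugin_name
  AND a.key = b.key
  AND a.id > b.id;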
I think ilike would be needed.