Modifying the database inside of a backup to remove a duplicate key tag so that it doesn't fail when restoring

andy@ubuntu-s-1vcpu-1gb-ams3-01:/var/discourse/shared/standalone/backups/default$ file public-happiness-2023-07-25-033857-v20220628031850.tar.gz
public-happiness-2023-07-25-033857-v20220628031850.tar.gz: gzip compressed data, was "public-happiness-2023-07-25-033857-v20220628031850.tar", last modified: Tue Aug  8 14:53:40 2023, max speed, from FAT filesystem (MS-DOS, OS/2, NT)

The full file contents looks fine, though its difficult to say as it’s a lot of information: Hastebin

I think it has to do with the file paths. Have a look at

Hmm. That might not be the issue. I mostly don’t trust 7zip to create a compatible tar file, but that may be irrational.

  -h, --dereference
              Follow symlinks; archive and dump the files they point to.

The answer might be in the file above. Actually, it’s probably in another file in the same directory.

1 Like

Thanks - right contents, but under the wrong names, I think. Hence the problem.
You have

-rwxrwxrwx 0/0        26927534 2023-08-08 14:37 public-happiness-2023-07-25-033857-v20220628031850/dump.sql.gz

but the original and any unmodified backup would have

-rwxrwxrwx 0/0        26927534 2023-08-08 14:37 dump.sql.gz

Edit: so you need to drive 7zip just a little differently in building that tar.gz file.

1 Like

Thanks for all your help. I’ve unzipped the files, edited the duplicate tag again and then very carefully rezipped paying extra careful attention to the file name, and there is progress!

Now when restoring i see this error message, which appears to be far more common:

[2023-08-25 15:25:21] CREATE INDEX
[2023-08-25 15:25:21] ERROR:  could not create unique index "index_tags_on_lower_name"
[2023-08-25 15:25:21] DETAIL:  Key (lower(name::text))=(socialmedia) is duplicated.
[2023-08-25 15:25:21] EXCEPTION: psql failed: DETAIL:  Key (lower(name::text))=(socialmedia) is duplicated.

I’d guess that means that i have changed the tag successfully, but there are still some occurrences of the tag in posts in my database. The tag_id number says there should be a tag called socialmedia but instead its finding a tag called socialmedia2 which is causing a conflict.

this post and this one discuss fixes, but as i only have access to my backup via directly editing the code on my local machine i’m not able to use the mysql tools to help clean it up.

Fortunately in my database i’ve only got 38 instances of 'socialmedia' (as oppose to 50,000+ socialmedia occurrences). Assuming I was correct changing the one on line 395421 as i screenshotted above, then I can’t see how to tell which remaining ones are linked to the ‘socialmedia’ tag, and which to the tag i’ve edited to ‘socialmedia2’.

Here’s an example of a fairly short post using the socialmedia tag

9488	'/groups/communitybuilders':86 '/groups/socialmedia':84 '/groups/webdev':89 '1st':117 '2022':131 '6':125 'activ':61 'banner':113 'btw':143 'close':169 'comment':21 'communiti':47 'communitybuild':87 'concept':4A 'especi':28 'event':119 'excit':164 'feedback':8B 'final':166 'get':38,133 'github':94 'grow':6A,142 'hack':127 'hard':156 'help':96 'homepag':151 'host':124 'improv':11B 'join':71,106 'launch':41,118,126 'like':128 'link':110 'live':140,175 'lot':27 'love':1A,67 'marvelxi':152 'mean':25 'media':51 'member':62 'mention':93 'move':45 'much':15 'new':150 'one':72,107 'onto':53 'plan':121 'platform':7B,43,139 'pleas':5A 'project':137 'promot':97 're':33,36,56,161 'readi':39,172 'rhorho358':23 'right':63 'see':100,167 'site':176 'slight':76,177,179 'small':58 'smile':77,178,180 'social':50 'socialmedia':85 'stage':31 'suggest':10B 'sure':79 'take':17 'team':59,75,103 'thank':12 'think':147 'time':19 'use':108 'webdev':90 'websit':3A 'whether':80 'work':155 'would':66,82	Thank you so much for taking the time to comment here @R , it means a lot, especially in the st... has been working hard on it and we’re all very excited to finally see it close to being ready on the live site :slight_smile: :slight_smile:	en_GB	4	f

I might be on the wrong track here though as that does look like more tags at the start than a user is likely to use in a post. Also possible that ‘socialmedia’ isn’t a tag used in the above post, although it should have been.

I would restore the database by hand and try to add the indexes and fix up the issues Wyeth the database rather than a text file, but that’s hard too.

  1. I don’t think that’s the immediate conclusion you can/should be drawing. The issue should be straightforward:

    The reason that index won’t build is you have at least two entries in tags table which resolve to the same thing when you take the lower case of their name. That’s what the error message is telling you.

    So I think you still need to find the related entries in that single table that clash when going through that transform.

  1. Also, Posts aren’t tagged, Topics are.

Before you delete the duplicate(s) note their ids because you will need to delete the related rows from the topic_tags table too (something you could have taken care of quickly if you’d performed all this maintenance online by simply restarting the container btw instead of destroying the instance!!).

3 Likes

Our site is back! Thanks to everyone for your help.

It seems i had solved it days ago, but was careless with reading the error message. There were two duplicated tags, ‘socialmedia’ and ‘social-media’. After fixing the 1st i didn’t notice that the error message had changed, given how similar both duplicate tags were.

Here’s the processes for fixing those two errors:

1. Finding the tag table and the duplicate tag

  • Download the backup to your operating system. This guide is for windows, but the process will be about the same on Linux

  • Extract all the zipped folders in it which should leave you with a dump.sql file and an uploads folder

  • Open the dump.sql file with a text editor, i used Visual Studio Code

  • Search for “COPY public.tags” to locate the tags table. It should be near the bottom and look like this:

  • Either browse through it manually, or copy paste your tags table into a separate document where you can use the search function on it to find your duplicate tag

  • Save your fixed dump.sql file as dump.sql

2. The order and names of files and folders has to be perfect in the re-zip.

  • After extraction you should have a dump.sql file and an uploads folder

  • Right click on dump.sql. select 7zip and ‘add to archive’.

  • Select gzip as the archive format, keeping the name of the file the same

  • Select the new dump.sql.gz file and the uploads file, then right click > 7zip > add to archive > archive format: tar. Make sure the name of the file is exactly the same as the original backup, it should look something like this: ‘public-happiness-2023-07-25-033857-v20220628031850’

  • Select your new .tar file > 7zip > add to archive > archive format: gzip. Make sure the name of the file is exactly the same as the original backup, it should look something like this: ‘public-happiness-2023-07-25-033857-v20220628031850’

  • Your final result should be a .tar.gz file with the same name as your original backup.

  • Upload to the admin area and restore your backup

3 Likes

There is one more place the tag is/might be repeated and that is the search data table:

COPY public.tag_search_data (tag_id, search_data, raw_data, locale, version) FROM stdin;

I’m unsure if that also needs to be corrected or not.

So glad you finally got it fixed!

2 Likes