Modifying the database inside of a backup to remove a duplicate key tag so that it doesn't fail when restoring

This is a post is a very condensed version of my last 24hrs, although it hasn’t actually worked yet so i’m also hoping someone will post where it went wrong below this.

My Discourse update failed due to a duplicate key, one of my tags is doubled up. To fix the update issue i needed to do a fresh Discourse install and then load my latest backup, but reloading fails as it gets angry about the duplicate key. So i needed to go inside the backup to edit the offending tag to something different.

For some reason the re-zipped backup with the duplicate tag issue fixed is significantly smaller than the backup it came from, and fails when i try to restore it so something went a bit wrong with the rezipping process.

1) Locating Backups: To locate your Discourse backups, you can use the following command:

sudo find / -name "*.tar.gz"

This will search your system for all backup files with the “.tar.gz” extension. By default it should be inside your container at: shared/backups/default

2) Making a Copy: Once you’ve found the backup you want to work with, create a copy of it to ensure you have a backup of the original file. Use the “cp” command:

bash

sudo cp /path/to/original_backup.tar.gz /path/to/copy_backup.tar.gz

3) Extracting the Copy: Extract the content of the copied backup file using the “tar” command:

bash

tar -xzvf /path/to/copy_backup.tar.gz

This will extract the backup files to a temporary directory.

4) Editing the Tags in the Database: Navigate to the extracted backup files and open the relevant database file using a text editor. I encountered an issue with duplicate “socialmedia” tags, which prevented successful restoration. In a big database there’s loads of instances of tags, and likely for the specific tag you’re searching for so i searched for ‘immutable socialmedia’ using Ctrl W in Nano which took me right there.

sudo nano /path/to/extracted_database.sql

I edited one instance of the “socialmedia” tag to “socialmedia2”, then did a quick search to check it only appears only once now. I can fix those tags from the admin section once the restore succeeds.

5) Rezipping: After editing the backup files, create a new backup file with the corrected content. Use the following command to compress the modified files:

tar -czvf /path/to/new_modified_backup.tar.gz /path/to/modified_files_directory

6) Moving to the Correct File: Move the new modified backup file to the appropriate directory where backups are stored. The default location is usually “/shared/backups/default”:

sudo mv /path/to/new_modified_backup.tar.gz /shared/backups/default/

7) Stopping and Starting Services: Before restoring the modified backup, making sure to stop the relevant services to avoid potential conflicts during the restoration process. Use the “./launcher stop app” command to stop the Discourse application:

./launcher stop app

8) Restoring the Backup: To restore from the modified backup, use the “discourse restore” command with the path to the new backup file:

discourse restore /shared/backups/default/new_modified_backup.tar.gz

Or you can do it via /admin on your site as it should now appear in the backups section.

9) Verifying Restoration: After the restoration process completed, I verified that the changes were successful by checking the Discourse application and database to ensure that the duplicate “socialmedia” tags were removed.

10) Starting Services: I restarted the services that were stopped earlier to bring the Discourse application back online. I used the “./launcher start app” command to start the Discourse application:

./launcher start app

11) Deleting Temporary Files and Extra Backups: After successfully restoring the backup, I deleted any temporary files and extra backups that were created during the process to free up disk space. Use the “rm” command to remove the files:

sudo rm -r /path/to/temporary_directory
sudo rm /path/to/copy_backup.tar.gz
3 Likes

Why?

Why could you not fix this ‘online’ by restarting the app, entering the container, entering postgres and then dealing with the data entry immediately?

1 Like

I wasn’t expecting the error so had already put the new version of Discourse onto my server. The duplicate key error was in the backup and not in the clean installed app, but i couldn’t restore the backup because it wanted the error fixed first.

So i had to try and edit the tag inside the backup.

But you saw the error on update?

Next time make your life easier and fix in place.

The app wasn’t running and i couldn’t reload it, so i updated to a fresh Discourse version in an attempt to fix that. Which meant that i didn’t have access to the database anywhere other than the backup.

Certainly this is a niche case i’ve posted here, the better option would be to notice the issue and fix it while i still had access to the apps database directly, but i missed it and couldn’t find any other option.

1 Like

No worries. At least you’ve shown it is possible, learned some new stuff and given others an additional option.

Good work! :clap:

3 Likes

Thanks, although the original file was 128MB and the new one is 29MB, so i think rezipping on the server may have truncated the file due to its length.

This process seems like it should work, but the file i ended up with could not be used to restore my discourse.

1 Like

The path you took was more risky but surely doable? Maybe someone can chime in with some advice on this issue.

You can presumably repeat this again from the backup so …

1 Like

Is this problem resolved? It reads as if it’s a how-to, but it seems like your site is still broken.

Maybe you’re doing something I don’t understand, but typically you can just do a ./launcher start app to start the old container, was there a reaason that you couldn’t do that?

Then you can use Rails or SQL tools to fix up your database on the old contaiiner, and then try again to run the bootstrap/rebuild.

Or maybe you migrated the database beyond what your old container could handle.

I’ve done similar surgery on a backup before when the site that was being restore was a year or more old. I think the db dump was small enough that I was able to edit it in vim.

1 Like

Thanks for replying.

It refused to start as we were a few upgrades behind, so i updated to the latest Discourse by creating a new one and uploading our old backup, but it rejected that backup due to the duplicate keys.

Or maybe you migrated the database beyond what your old container could handle.

Yes, probably this. Its a bit of a blur exactly what i did now, but i updated a few things independently based on troubleshooting advice around here. One of those was to get the most recent PostgreSQL version.

I was able to edit it in vim .

I was able to edit in Nano, and everything looked good, but the rezipped file was far too small so something went wrong somewhere… maybe i was not able to edit it in Nano. It looked to have worked at the time.

I was hoping someone might see an error in it and correct me, so it would become a ‘how to’ guide.

What I would look at next:

  • Redo the whole unzip. Zip it up unchanged. Check the zip size is same as before. If not maybe you are not zipping with the same options?

  • Unzip again, check file size of file you are editing. Edit it, save it, confirm size unchanged significantly.

1 Like

A bit of an update. Someone else in my team was working on this last week but a solution didn’t come so i had another try, this time by editing the DB on my local system.

What i did:

  1. downloaded old backup that i want to restore
  2. unzipped files with 7zip
  3. opened dump.sql with visual studio code
  4. found the duplicated tags directly in the DB.
  5. found what appeared to be the tag listing by searching using ’ ’ around the tag. In my case ‘socialmedia’. The tags look to be the 2nd and 3rd from bottom of the found instances.

  1. edited one to read

132 ‘socialmedia2’:1A socialmedia2 en_GB 3

  1. Rezipped the dump.sql file in 7zip
  • Add to archive
  • Archive format .gzip
  1. Rezipped the main backup file
  • Add to archive
  • Archive format .tar (gzip is not yet available)
  1. You should now see a zipped .tar fixed backup file

  2. Zip the .tar file in 7zip to create a .tar.gz file, to match the format used by Discourse

  • Add to archive
  • Archive format .gzip
  1. Upload to backups and restore via the admin section

At this point i hit an error message:

Extracting dump file…
[2023-08-08 15:09:15] EXCEPTION: No such file or directory @ rb_check_realpath_internal - /var/www/discourse/tmp/restores/default/2023-08-08-150913/dump.sql.gz

Anyone have any ideas what i missed in the above process?

The only thing i can think of is that the path its looking for is using todays date and not the the date of the backup (i’m writing this on 08-08-2023).

This is a follow up of my earlier post here. I’m posting again so it easier to find for anyone else doing this in the future if it works.

What i did to edit the DB on my laptop:

  1. downloaded old backup that i want to restore from the admin section
  2. unzipped files with 7zip
  3. opened dump.sql with visual studio code
  4. found the duplicated tags directly in the DB.
  5. found what appeared to be the tag listing by searching using ’ ’ around the tag. In my case ‘socialmedia’. The tags look to be the 2nd and 3rd from bottom of the found instances.

  1. edited one to read

132 ‘socialmedia2’:1A socialmedia2 en_GB 3

  1. Rezipped the dump.sql file in 7zip
  • Add to archive
  • Archive format .gzip
  1. Rezipped the main backup file
  • Add to archive
  • Archive format .tar (gzip is not yet available)
  1. You should now see a zipped .tar fixed backup file

  2. Zip the .tar file in 7zip to create a .tar.gz file, to match the format used by Discourse

  • Add to archive
  • Archive format .gzip
  1. Upload to backups and restore via the admin section

At this point i hit an error message:

Extracting dump file…
[2023-08-08 15:09:15] EXCEPTION: No such file or directory @ rb_check_realpath_internal - /var/www/discourse/tmp/restores/default/2023-08-08-150913/dump.sql.gz

Anyone have any ideas what i missed in the above process?

The only thing i can think of is that the path its looking for is using todays date and not the the date of the backup (i’m writing this on 08-08-2023).

1 Like

I think perhaps the exact name of a backup file is important: forum name, date and timestamp, version identifier. So, if unpacking, modifying and repacking, I would suggest rebuilding to the same name as the original. But keep the original safe, of course.

1 Like

I’ve combined the posts from the new topic into this one as it seems like keeping this issue grouped together in one place will make it much easier to follow for future travellers. :+1:

1 Like

Thanks @Ed_S, i kept the original name as i’d read elsewhere that its important. My question above is about why the backup restore tool was looking for and not finding: /var/www/discourse/tmp/restores/default/2023-08-08-150913/dump.sql.gz

which is the date i was doing the restore.

1 Like

Ah, sorry. That does seem odd. It may be that the temporary directory is fully expected to be named according to today’s date, but it doesn’t look good that the sql dump file isn’t found.

If you list the tar file contents do you see that filename in there? In my case

root@ubuntu-2gb-nbg1-1:/var/discourse/shared/standalone/backups/default# tar vtfz forumname-2023-08-03-HHMMSS-v2023mmddhhmmss.tar.gz dump.sql.gz | head
-rw-r--r-- discourse/www-data 16336925 2023-08-03 05:31 dump.sql.gz
1 Like

Thanks Ed, That file didn’t exist. Sorry for the delay, i’ve been off-grid for a bit.

There isn’t a file with the correct name there so I just tried creating an empty file there manually:

sudo mkdir -p /var/www/discourse/tmp/restores/default/2023-08-22-121010/

but every time i hit restore it looks for a slightly different file (the last 6 digits). I assume it is looking for a folder generated by a timestamp, so each time i hit the restore button the folder it’s looking for has changed.

I suspect something is going wrong in your step 10 where you create the tar file. Can you see it? Can you use file to describe it? Can you list the contents with tar tvfz?

1 Like