Restoring a corrupt file does not roll back the database

I found another backup/restore issue.

When I try to restore a corrupt database dump, the restore (obviously) fails, but it leaves the system in an unusable state: no rollback is attempted, and all tables are left in the backup schema.

(EDIT: this doesn’t only happen with corrupted files. It also happens when restoring a dump that was made with a pg_dump that is newer than the installed psql.)
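
A pre-flight check along these lines might catch the version mismatch case before any tables are moved. This is only a sketch: `dumped_by_version` and `safe_to_restore?` are hypothetical helpers, not part of Discourse, and the policy of refusing any dump made by a newer pg_dump is an assumption. It relies on the `-- Dumped by pg_dump version` header that also appears in the repro below.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: returns the pg_dump version from the dump header
# as an array of integers, e.g. [9, 5, 12], or nil if no header is found.
def dumped_by_version(io)
  io.each_line do |line|
    match = line.match(/\A-- Dumped by pg_dump version (\d+(?:\.\d+)*)/)
    return match[1].split(".").map(&:to_i) if match
  end
  nil
end

# Hypothetical policy (an assumption): refuse dumps that have no header
# or that were made by a pg_dump newer than the given psql version.
def safe_to_restore?(dump_version, psql_version)
  return false if dump_version.nil?
  (dump_version <=> psql_version) <= 0
end

# Build the same kind of gzipped dump as in the repro, but in memory.
gzipped = Zlib.gzip("-- Dumped by pg_dump version 9.5.12\nbroken\n")
reader = Zlib::GzipReader.new(StringIO.new(gzipped))
version = dumped_by_version(reader)

puts version.inspect                        # prints [9, 5, 12]
puts safe_to_restore?(version, [9, 5, 10])  # prints false (dump is newer)
puts safe_to_restore?(version, [10, 3])     # prints true
```

A check like this only covers the version mismatch. A dump with a valid header but corrupt contents would still pass it and fail later in psql, so the rollback still has to work.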

Repro:

First let’s create a corrupt dump file for test purposes:

discourse@test16:/var/www/discourse$ echo "-- Dumped by pg_dump version 9.5.12"  >> dump.sql
discourse@test16:/var/www/discourse$ echo "broken" >> dump.sql 
discourse@test16:/var/www/discourse$ gzip dump.sql 
discourse@test16:/var/www/discourse$ mv dump.sql.gz public/backups/db5634/discourse-2018-04-05-174724-v20180308071922.sql.gz

Then restore:

discourse@test16:/var/www/discourse$ RAILS_DB=db5634 RAILS_ENV=production script/discourse restore discourse-2018-04-05-174724-v20180308071922.sql.gz
Starting restore: discourse-2018-04-05-174724-v20180308071922.sql.gz
[STARTED]
'system' has started the restore!
Marking restore as running...
Making sure /var/www/discourse/tmp/restores/db5634/2018-04-05-175722 exists...
Copying archive to tmp directory...
tar: /var/www/discourse/tmp/restores/db5634/2018-04-05-175722/discourse-2018-04-05-174724-v20180308071922.sql: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
No metadata file to extract.
Validating metadata...
  Current version: 20180308071922
  Restored version: 20180308071922
Cannot restore into different schema, restoring in-place
Enabling readonly mode...
Pausing sidekiq...
Waiting for sidekiq to finish running jobs...
Restoring dump file... (can be quite long)
ERROR:  syntax error at or near "broken"
LINE 1: broken
^
EXCEPTION: psql failed
/var/www/discourse/lib/backup_restore/restorer.rb:326:in `restore_dump'
/var/www/discourse/lib/backup_restore/restorer.rb:67:in `run'
script/discourse:111:in `restore'
/home/discourse/.rvm/gems/ruby-2.3.1/gems/thor-0.19.4/lib/thor/command.rb:27:in `run'
/home/discourse/.rvm/gems/ruby-2.3.1/gems/thor-0.19.4/lib/thor/invocation.rb:126:in `invoke_command'
/home/discourse/.rvm/gems/ruby-2.3.1/gems/thor-0.19.4/lib/thor.rb:369:in `dispatch'
/home/discourse/.rvm/gems/ruby-2.3.1/gems/thor-0.19.4/lib/thor/base.rb:444:in `start'
script/discourse:273:in `<main>'
Trying to rollback...
There was no need to rollback
[FAILED]
Restore done.

Note the line near the end stating ‘There was no need to rollback’. Well, there is, since all the tables have been moved to the backup schema.

The culprit is here

https://github.com/discourse/discourse/blob/master/lib/backup_restore/restorer.rb#L439-L447

The problem is that @db_was_changed is never set, because it is set by switch_schema! and that is never called when can_restore_into_different_schema? is false.

I think the fix is pretty simple: @db_was_changed = true should be inserted below this line?

EDIT: yes that fixes it.
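
To make the failure mode concrete, here is a minimal model of the in-place restore path. It is a sketch under assumptions, not the actual restorer.rb: the class `RestoreSketch`, the `apply_fix` and `corrupt` parameters, and the `tables_in` attribute are all made up for illustration. Only the flag name `@db_was_changed` and the log messages come from the output above.

```ruby
# Hypothetical, simplified model of the restore flow. Names mirror the ones
# mentioned in this post, but this is not the actual Restorer code.
class RestoreSketch
  attr_reader :log, :tables_in

  def initialize(apply_fix:, corrupt: true)
    @apply_fix = apply_fix
    @corrupt = corrupt
    @db_was_changed = false
    @tables_in = "public" # schema that currently holds the live tables
    @log = []
  end

  def run
    restore_in_place
    @log << "[SUCCESS]"
  rescue StandardError => e
    @log << "EXCEPTION: #{e.message}"
    rollback
    @log << "[FAILED]"
  end

  private

  # Path taken when can_restore_into_different_schema? is false,
  # so switch_schema! (which would set the flag) never runs.
  def restore_in_place
    @log << "Cannot restore into different schema, restoring in-place"
    @tables_in = "backup"                # live tables are moved out of the way
    @db_was_changed = true if @apply_fix # the proposed one-line fix
    restore_dump
  end

  # Stands in for the psql call; fails for a corrupt dump.
  def restore_dump
    raise "psql failed" if @corrupt
    @tables_in = "public"
  end

  def rollback
    @log << "Trying to rollback..."
    if @db_was_changed
      @log << "Rolling back..."
      @tables_in = "public" # move the tables back
    else
      @log << "There was no need to rollback"
    end
  end
end

broken = RestoreSketch.new(apply_fix: false)
broken.run
puts broken.tables_in # prints "backup": tables are stranded

fixed = RestoreSketch.new(apply_fix: true)
fixed.run
puts fixed.tables_in  # prints "public": rollback restored them
```

Without the flag, the rollback branch is skipped even though the tables have already been moved, which matches the log above.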

Sure, maybe @tgxworld can take a look?

@RGJ Thanks for investigating! I’ve applied the fix and backported it to stable.
