Error Restoring Backup on Migration

Could you please tell us which file in the backup tar we have to edit?

There is a dump.sql packed inside the archive. You need to modify it and then repack the archive with the modified version. I've solved my other problems by editing it too: I removed some rogue custom fields that were causing crashes after login.
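
A minimal sketch of that unpack, edit, repack cycle, assuming the backup is a gzip-compressed tar with dump.sql at its top level (some backups may carry a compressed dump.sql.gz instead, which would need an extra gunzip/gzip step). The function name edit_backup is hypothetical and not part of Discourse:

```shell
# Hypothetical helper, not a Discourse tool: unpack a backup, run an edit
# command against its contents, and write a new archive.
# Assumes dump.sql sits at the top level of a .tar.gz archive.
edit_backup() {
    archive=$1    # existing backup, e.g. site-backup.tar.gz
    edit_cmd=$2   # shell command run inside the unpacked directory
    output=$3     # path of the repacked archive to create

    workdir=$(mktemp -d) || return 1
    tar -xzf "$archive" -C "$workdir" || return 1

    # Run the edit in a subshell so the caller's working directory is untouched.
    (cd "$workdir" && sh -c "$edit_cmd") || return 1

    # Repack every top-level entry (dump.sql plus uploads, if present).
    (cd "$workdir" && tar -czf - *) > "$output" || return 1
    rm -rf "$workdir"
}
```

For example, edit_backup old.tar.gz "grep -v rogue_field dump.sql > dump.tmp && mv dump.tmp dump.sql" new.tar.gz would drop every line mentioning rogue_field. That pattern is purely illustrative, so inspect the dump before deleting anything, and keep the original archive.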

Thank you.

I will try to download the backup, unpack it, and change that file following your instructions.

It is quite scary to have to do all that in order to restore a backup.

I suppose it is a bug in the new release.

But backup and restore are keystones of a disaster recovery plan.
They should be as robust as possible, and a bug in those processes has a great impact.

Well, I was able to do the restore without changing anything in the backup file.

I just tried several times and, oddly enough, one of the attempts restored with no error.

I was kicked out of Discourse, and it did not work until I ran ./launcher rebuild app.

But now it is working correctly.

A strange issue.

This is still giving me trouble restoring my forum from backup. It has been several weeks and the restore from backup functionality appears to still be broken.

Any fix for this?

As far as I can tell, you alternate between updating, checking the formatting of the tables, making sure everything matches between source and host, and watching it fail multiple times; that might or might not work without some minor database edits.

I have successfully migrated 2 of 3 sites, and I limit myself to less than one hour a day on it for my sanity. I have begun talking to the clients about the issues this could cause in the future in any similar situation. shrug

I simply kept retrying the restore and eventually got it working.

The error complains about a column that does not exist in the user profile table.

But it has to be a timeout error or something similar on the database side, maybe a bug on the Postgres side. If the column really were missing, it would not be created on its own just because you keep retrying the restore.

Jaromir says that changing the script solves the issue.

None of the Discourse developers here seem to have been concerned about this issue, but it is a strange and very disturbing error, as it affects your disaster recovery plan.

Maybe the topic has gone unnoticed among the others.

It hasn't gone unnoticed. It will be the first thing I'll be looking into tomorrow.

And I'm starting to work on improving backups and restores, because nobody should need to worry about those things in case of a disaster or when you simply want to migrate to a new server.

Great. Thank you.
Glad to hear that.

Thanks, Gerhard. I don't know if you care now, but I'm also having trouble with a site that's using PG 11 with GCP. It might be worth checking on that as it might affect the future move to PG12 that I understand should happen later this fall.

I just upgraded two instances that share an S3 backup bucket. I ran a backup on one, tried to restore on the other, and got:

No migration with version number 20191007140446.

PostgreSQL 11 and 12 are currently not supported.

Okay, I installed the latest version of Discourse (tests-passed) on a droplet and restoring of backups (uploads included, not using S3 for uploads) worked without problems.

If you are still encountering problems during a restore, please do the following:

  • Rebuild the container:

    cd /var/discourse
    git pull
    ./launcher rebuild app
    
  • Restore the backup either via web interface or command line:

    cd /var/discourse
    ./launcher enter app
    discourse enable_restore
    discourse restore <filename>
    

If it doesn't work, please post the version number of the backup file you are trying to restore as well as the error message you see during the restore.
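
The version number is embedded in the backup's file name, as in the wonderful-community-2019-10-10-184822-v20191007140446.tar.gz file quoted further down in this topic. A small sketch for pulling it out, assuming the name always ends in -v followed by 14 digits and an extension; the function name backup_version is hypothetical:

```shell
# Hypothetical helper: print the 14-digit version embedded in a backup
# file name such as site-2019-10-10-184822-v20191007140446.tar.gz.
# Prints nothing if the name does not follow that pattern.
backup_version() {
    basename "$1" | sed -n 's/.*-v\([0-9]\{14\}\)\..*/\1/p'
}
```

Running backup_version on the file name above prints 20191007140446, which is the same number that appears in the "No migration with version number" error.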

Both sites are 2.4.0.beta6 (8fc0cc9aaa). The backups (but not uploads) are on S3.

discourse restore returns

Starting restore: wonderful-community-2019-10-10-184822-v20191007140446.tar.gz
[STARTED]                                                                              
'system' has started the restore!                               
Marking restore as running...                                                                  
Making sure /var/www/discourse/tmp/restores/default/2019-10-10-211121 exists...             
Downloading archive to tmp directory...                                               
Unzipping archive, this may take a while...
EXCEPTION: Compression::Strategy::ExtractFailed
/var/www/discourse/lib/compression/gzip.rb:49:in `block in extract_file'
/var/www/discourse/lib/compression/gzip.rb:45:in `open'
/var/www/discourse/lib/compression/gzip.rb:45:in `extract_file'

Of course, and I think that site will be satisfied with direct database backups on GCP anyway, but at some point Sam said that he was running PG 11 on his dev site and that he'd be interested to know of problems with PG11.

@pfaffman Please increase the decompressed_file_max_size_mb site setting (it's hidden). The default is currently set at 1GB.

I have a PR ready to bump the default to 100GB but it hasn't been merged yet:

https://github.com/discourse/discourse/pull/8179
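
To pick a sensible value for that setting, it can help to know how large the archive is once decompressed. A rough sketch, assuming the limit is counted in units of 1024 * 1024 bytes (an assumption, not checked against the Discourse source); decompressed_size_mb is a hypothetical name:

```shell
# Hypothetical helper: print the decompressed size of a gzip file in
# whole megabytes (1024 * 1024 bytes), rounded up.
decompressed_size_mb() {
    # Stream the decompressed data through wc rather than relying on
    # `gzip -l`, whose stored size field can wrap around above 4 GiB.
    bytes=$(gzip -dc "$1" | wc -c | tr -d ' ')
    echo $(( (bytes + 1048575) / 1048576 ))
}
```

If the printed number is larger than the current setting, extraction would be expected to stop with an error like the ExtractFailed one above.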

Thanks, @Roman_Rizzi. Well, that solved that problem.

But now I've got a bunch of invalid command \N messages (and they filled the buffer before I could see what came before them), but maybe

ERROR:  syntax error at or near "Shiny"        
LINE 1: Shiny contest submission 2019-01-07 20:00:05.570573 2019-01-...
^       
EXCEPTION: psql failed
/var/www/discourse/lib/backup_restore/restorer.rb:324:in `restore_dump'
/var/www/discourse/lib/backup_restore/restorer.rb:75:in `run'

is what you need to know.

Yes, I believe that's caused by PG11.

If it were the pg11 instance I'd agree! But this is a standard 2 container install.

Wait! There is a version mismatch.

root@community:/var/discourse# ./launcher enter data
root@staging-data:/# psql --version
psql (PostgreSQL) 10.7 (Ubuntu 10.7-1.pgdg16.04+1) 

The one I'm restoring on is 10.9! I bet that's it. (I think the pg11 fails similarly but there I'm trying to restore on the same instance).

I'll upgrade the data containers tomorrow and let you know. Thanks for your help.
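
For anyone comparing two data containers the same way, here is a small sketch that reduces the psql --version output to just the version number so the two sides are easy to compare. It assumes the output format shown above; pg_version is a hypothetical name:

```shell
# Hypothetical helper: read `psql --version` output on stdin and print
# only the version number, e.g. "10.7".
pg_version() {
    sed -n 's/^psql (PostgreSQL) \([0-9][0-9.]*\).*/\1/p'
}
```

Inside each data container, psql --version | pg_version prints one short string per side; if the two differ, upgrading both to the same release removes that variable.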

Well, I upgraded both to 10.10 (using the standard data templates) but still got the invalid command stuff.

When the invalid command errors started I force-quit the restore script. Further attempts to restore (to get the first error before the invalid command messages) resulted in:

ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR:  relation "theme_fields" does not exist

I then did a rake db:migrate on both instances, backed up again, and the restore succeeded. Maybe a migration got missed somewhere along the way?

(That was after changing the setting mentioned above. Here are complete instructions for those who might need them in the tiny amount of time before it's unnecessary.)

./launcher enter app
rails c
SiteSetting.decompressed_file_max_size_mb=1000000

I just had another one fail. This time both sites are 2.4.0.beta6 (one is 2c011252f1, the other may be a bit more recent).

I'm restoring via S3. I've tried both with and without uploads. Both seemed to be working and then failed like this:

...
COPY 11871
COPY 3689
COPY 0
COPY 36550
COPY 0 
COPY 14736
/usr/local/bin/discourse: line 2:  3232 Killed                  RAILS_ENV=production sudo -H -E -u discourse bundle exec script/discourse "$@"

Is this the only message you're getting?

What if you try to remove any S3 dependency and copy the backup file to local storage first?
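
One way to take S3 out of the picture, sketched under assumptions: download the archive by whatever means you prefer, verify that it is an intact gzip file, and copy it into the local backups directory before restoring by file name. This assumes the site is set to read backups from local storage rather than S3, and stage_backup is a hypothetical name:

```shell
# Hypothetical helper: verify a downloaded backup and copy it into the
# directory that local restores read from. Prints the staged path.
stage_backup() {
    archive=$1       # downloaded backup file
    backups_dir=$2   # local backups directory (an assumption, see below)

    # gzip -t reads the whole file and fails on truncated or corrupt data.
    gzip -t "$archive" || { echo "corrupt archive: $archive" >&2; return 1; }
    mkdir -p "$backups_dir" || return 1
    cp "$archive" "$backups_dir/" || return 1
    echo "$backups_dir/$(basename "$archive")"
}
```

On a standard single-container install the backups directory is usually /var/discourse/shared/standalone/backups/default, but treat that path as an assumption and check your own setup; a two-container install keeps it elsewhere. After staging, discourse restore <filename> can be run inside the container as described earlier in this topic.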

@pfaffman it might be good to know that the two (or three) restore issues you have posted in this topic are not occurrences of the bug that this topic was originally about (the PG::UndefinedColumn: ERROR issue). You might consider opening new topics for these since they are clearly different issues.