Database restore broken


#1

I am currently trying to move a Discourse Instance from one server to another, but the import of the backup fails. Both instances are Docker installs of the same Version (20180227161818).

The restore log starts with

[2018-03-07 10:13:52] 'florian.admin' has started the restore!
[2018-03-07 10:13:52] Marking restore as running...
[2018-03-07 10:13:52] Making sure /var/www/discourse/tmp/restores/default/2018-03-07-101352 exists...
[2018-03-07 10:13:52] Copying archive to tmp directory...
[2018-03-07 10:13:52] Unzipping archive, this may take a while...
[2018-03-07 10:13:52] Extracting metadata file...
[2018-03-07 10:13:52] Validating metadata...
[2018-03-07 10:13:52]   Current version: 20180227161818
[2018-03-07 10:13:52]   Restored version: 20180227161818
[2018-03-07 10:13:52] Extracting dump file...
[2018-03-07 10:13:52] Restoring dump file... (can be quite long)
[2018-03-07 10:13:52] SET
[2018-03-07 10:13:52] SET
[2018-03-07 10:13:52] SET
[2018-03-07 10:13:52] SET
[2018-03-07 10:13:52] set_config
[2018-03-07 10:13:52] ------------
[2018-03-07 10:13:52] (1 row)
[2018-03-07 10:13:52] SET
[2018-03-07 10:13:52] SET
[2018-03-07 10:13:52] SET
[2018-03-07 10:13:52] ERROR:  function "raise_read_only" already exists with same argument types
[2018-03-07 10:13:52] ERROR:  current transaction is aborted, commands ignored until end of transaction block

From there on, everything goes south. Tons of invalid command \N lines and other errors that look like psql is trying to interpret post contents as commands (ERROR: syntax error at or near "aber" / LINE 1: aber wenn das Tee ist, bringe mir bitte Kaffee. Eine Vorteil...).

According to ./launcher enter app / psql --version, both instances run the same psql version.

I guess the offending line is

[2018-03-07 10:13:52] ERROR:  function "raise_read_only" already exists with same argument types

and everything else are follow-up errors.

I get the same result if I try to restore via the web interface or with ./launcher enter app / discourse restore

Any hints?


Move your Discourse Instance to a Different Server
(Jay Pfaffman) #2

Strange. I’d do an update on the new server (even though you say they’re the same version).


(Matt Palmer) #3

There are known incompatibilities with changes to the way pg_dump works in the latest version of PostgreSQL. We’re working on it.


#4

@pfaffman The new instance was just git-pulled and launcher-rebuilt minutes before the attempted restore, so it should be up to date.

The admin interface of the old server also said “You are running the latest version of Discourse”.
However, when manually browsing to /admin/upgrade, I got a message “You are running an old version of the Discourse image. Upgrades via the web UI are disabled until you run the latest image.”

So I git pull && ./launcher rebuild app'ed on the old server and created a new backup (which interestingly was 2 MB larger than the previous, even though no new posts were written in the meantime).

Restoring that new backup on the new server (via console) now worked like a charm!


#6

Making sure both servers are same version and preferably the current version is surely good practice in any case? Glad you got it sorted! :slight_smile:


#7

Yes, but making sure to only display “You’re up to date” in the backend if that is true should also be good practice!


#8

Yeah, I’ve noticed that discrepancy, assumed it was to allow for a few versions difference without demanding an upgrade … was that not intentional to avoid nagging on every minor commit?


(Jay Pfaffman) #9

Hey, @mpalmer, does this mean you’ve fixed it if both the backup and restore instances are updated?

Edit: Or is there a version that I can go back to that will restore the older backup?


(Richard - DiscourseHosting.com) #10

No and no, unfortunately.

Instances backed up with pg_dump version 10.3 cannot be restored from within Discourse right now.

This has to do with some changes in pg_dump that are incompatible with the way Discourse does its restore.

The only way to restore a backup that was made from a Discourse instance with pg_dump 10.3 at this moment (!) is to create an empty database and zcat dumpfile.sql.gz | psql -d databasename from a command prompt.


(Jay Pfaffman) #11

Ouch. That’s a nasty bug.

Thanks, @RGJ!

I can’t tell anyone else how to do the restore (something like: enter the container, kill some stuff, run psql as postgres, drop database, create database, exit container, rebuild?), but this was enough for me to manage to do it.

Hopefully this’ll get fixed soon.

./launcher enter app
sv stop unicorn
sv stop redis
su postgres -c 'psql'
drop database discourse;
create database discourse;
\q
cd public/backups/default/
BACKUP=`ls |head`
zcat $BACKUP | su discourse  -c 'psql discourse'
sv start unicorn
sv start redis
rake db:migrate

Used to work. . .


(Alan Tan) #12

Can I confirm if this is a self hosted install and the version of Postgres you’re running on?


(Jay Pfaffman) #13

Yes, it’s self-hosted. The site I’m restoring to is a fairly vanilla install (except that I moved a couple volumes around).

Inside the container:

psql (PostgreSQL) 9.5.10

BUT

I’m running an import in a discourse_dev container, and it has

psql (PostgreSQL) 9.5.12

I just the docker_dev container today, when I couldn’t restore to the production instance. I don’t know what version it was before I updated discourse_dev.

I’m not sure why I did a rebuild of the production container. This was all working fine yesterday.

EDIT: But neither of these is the 10.x version of Postgres that Richard was talking about.


(Richard - DiscourseHosting.com) #14

It’s about the version of pg_dump, on the site where the export is being made.

If you have an blabla-20180306123456.sql.gz you can take a look inside it and one of the first lines will contain the version as well. If you have a .tar.gz file there will be a dump.sql.gz file inside and you can look at the first lines of that one.


(Jay Pfaffman) #15

Pg-dump is the same version, 9.5.12, so it doesn’t seem like the same problem that you’re referring to.


(Richard - DiscourseHosting.com) #16

Are you getting the exact same error as described in the first post?

ERROR:  function "raise_read_only" already exists with same argument types

(Jay Pfaffman) #17

I’m guessing not. Enough other stuff seemed the same that I thought it was. I won’t know until I try again tomorrow.

I’m using only containers supplied by discourse.org, so it doesn’t seem like it should be the error that you’re referring to.


(Richard - DiscourseHosting.com) #18

I’m not so sure whether that statement holds.


(Alan Tan) #19

I managed to get a repro locally and nailed it down to the TableMigrationHelper not cleaning up the function it created. Having a function that is used once lying around in the database is problematic during restore because functions are not transferred between schemas and the dump file would always try to create it.



(Jay Pfaffman) #20

That’s awesome, @tgxworld! Many thanks! This is great relief.


(Jay Pfaffman) #21

Well, I may have spoken too soon. I’m pretty sure that I pulled in the new code on the development instance before making the database, though I still lose fights with git more often than I care to admin.

I’m doing a database-only restore and it starts like this:

[STARTED]
'system' has started the restore!
Marking restore as running...
Making sure /var/www/discourse/tmp/restores/default/2018-03-08-175839 exists...
Copying archive to tmp directory...
Extracting metadata file...
Validating metadata...
  Current version: 20180308071922
  Restored version: 20180308071922
Restoring dump file... (can be quite long)
SET
SET
SET
SET
set_config
------------
(1 row)
SET
SET
SET
SET
SET
ERROR:  relation "api_keys" already exists
n block
n block
n block
n block

Here’s the whole log: https://gist.github.com/pfaffman/86223c0ae0a4d55d23d24d8cf0d0df07