Lost all production data after upgrading (using Docker)


(Gustavo Scanferla) #1


We’ve lost all Discourse data after upgrading. We changed the version in app.yml to tests-passed and then ran ./launcher rebuild app.
(Although I would have preferred the admin panel method at /admin/upgrade)

At first, it didn’t build; see “After Update, bootstrap failed due to missing Celluloid” (that is fixed now and I’m using the fix).

Then I tried with the stable version and it did build successfully. But, to my surprise, it lost all the data.
It’s a production Discourse installation, with thousands of daily visitors.

Running ls -la in /var/discourse/shared/standalone/postgres_data shows that all files and directories were created today.
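For anyone who wants to repeat that check without reading the listing by eye, here is a minimal Python sketch. The function name `entries_modified_on` is made up for illustration. It compares modification times (the timestamp `ls -la` prints), because Linux filesystems do not reliably expose a creation time.

```python
import datetime
import os


def entries_modified_on(directory, day):
    """Split the top-level entries of `directory` by modification date.

    Returns (matching, other): names last modified on `day`, and the rest.
    """
    matching, other = [], []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        # st_mtime is the timestamp that `ls -la` shows by default.
        mtime = entry.stat(follow_symlinks=False).st_mtime
        modified = datetime.date.fromtimestamp(mtime)
        (matching if modified == day else other).append(entry.name)
    return matching, other


if __name__ == "__main__":
    # Path taken from this post; adjust it if your volume lives elsewhere.
    path = "/var/discourse/shared/standalone/postgres_data"
    if os.path.isdir(path):
        today, older = entries_modified_on(path, datetime.date.today())
        print(f"{len(today)} entries modified today, {len(older)} older")
        for name in older:
            print("older:", name)
```

A running PostgreSQL touches many of its files constantly, so recent dates on their own are not conclusive. Files that are normally written only when the cluster is initialised, such as PG_VERSION, are probably the more telling ones to look at.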

We’ve upgraded Discourse before and it didn’t lose any data. Now we have no data at all.

What do I do now?

Thanks in advance!

(Sander Datema) #2

Although I don’t know what caused this, I hope you do have backups?

(Stephanie Daugherty) #3

Check http://yoursite.example.com/admin/backups or /var/discourse/shared/standalone/backups for any automatically created backups. Hopefully they were enabled.

(Gustavo Scanferla) #4

I didn’t have access to the production admin panel or servers before this incident.

@sdaugherty Although they were pretty sure they had configured a daily backup, there are no backups in /var/discourse/shared/standalone/backups or at /admin/backups.

@Sander78 We have a two-month-old backup that we can use, and then we’ll make sure the backup system is working properly.

But I’d like to know if we can “find” the lost data somewhere. Maybe in another folder or Postgres instance? It must be somewhere, since we didn’t do anything or run any command to erase the data :frowning:
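One way to look for a surviving copy is to search the host for other PostgreSQL data directories. Every PostgreSQL data directory has a PG_VERSION file at its top level, so a minimal Python sketch like this can list candidates along with their size on disk. The function names and the default search root are assumptions for illustration; other roots, such as /var/lib/docker (Docker’s default data directory), can be passed on the command line.

```python
import os
import sys


def find_postgres_data_dirs(root):
    """Yield directories under `root` that contain a PG_VERSION file."""
    # os.walk silently skips directories it cannot read (onerror=None).
    for dirpath, dirnames, filenames in os.walk(root):
        if "PG_VERSION" in filenames:
            # Per-database subdirectories also hold a PG_VERSION file,
            # so stop descending once the top of a cluster is found.
            dirnames[:] = []
            yield dirpath


def tree_size_bytes(path):
    """Sum the sizes of all regular files below `path`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


if __name__ == "__main__":
    # Default root is the install path mentioned in this thread.
    roots = sys.argv[1:] or ["/var/discourse"]
    for root in roots:
        for data_dir in find_postgres_data_dirs(root):
            size_mib = tree_size_bytes(data_dir) / (1024 * 1024)
            print(f"{data_dir}\t{size_mib:.1f} MiB")
```

It most likely needs to run as root, since PostgreSQL data directories are normally readable only by their owner. A directory much larger than the freshly seeded one would be worth a closer look.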

(Mittineague) #5

Are you certain that changing branches isn’t “anything”? :wink:

If you changed them before they had time to complete, maybe the data got corrupted and then auto-discarded?

(Gustavo Scanferla) #6

Hi @Mittineague, thanks for your input :smile:

I’ve added a new user to the current installation, changed branch back to stable and the user/data was not lost.

Based on that experiment, I don’t think that changing branches in app.yml and running ./launcher rebuild app erases Discourse data.

And what do you mean by “before they had time to complete”?

PS: Maybe Discourse thought it was a new installation and then erased/seeded the Database?

(Gustavo Scanferla) #7

Is there a way to access Postgres from the command line? Maybe the database is still there, somehow.

(Mittineague) #8

AFAIK different branches run different Rake tasks (esp. database) and have enough differences that the same database can’t be used without being modified - if possible at all.

Ah, that’s great to know.

Maybe some kind of “changing branches” guide is in order?
I imagine switching branches is not that common, but just the same.

(Joe Seyfried) #9

Yes, if you SSH into your Docker container, you can do a su postgres and then use the psql command line client, if you feel brave. Good hunting!
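Once you are in as the postgres user, a quick way to see what the server holds is to list every database with its size on disk. The sketch below wraps that query in Python. The helper names are made up for illustration, and it assumes Python 3.7+ is available wherever psql runs; if it is not, the query in `QUERY` can be pasted straight into psql.

```python
import subprocess

# Lists non-template databases with their on-disk size in bytes.
QUERY = (
    "SELECT datname, pg_database_size(datname) "
    "FROM pg_database WHERE NOT datistemplate ORDER BY datname"
)


def parse_database_sizes(psql_output):
    """Turn `name|bytes` lines (psql -At output) into a dict."""
    sizes = {}
    for line in psql_output.splitlines():
        if not line.strip():
            continue
        # Split on the last "|" so the size is always the final field.
        name, _, size = line.rpartition("|")
        sizes[name] = int(size)
    return sizes


def database_sizes():
    """Run psql as the current user and return {database: size_in_bytes}."""
    result = subprocess.run(
        # -A: unaligned output, -t: rows only, -c: run a single command
        ["psql", "-At", "-c", QUERY],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_database_sizes(result.stdout)
```

Calling `database_sizes()` as the postgres user inside the container should return something like a `discourse` entry plus the built-in `postgres` database. A very small `discourse` database would suggest it was freshly seeded rather than being the old one.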

(Sam Saffron) #10

The data is most likely still on the box; you just have your container pointed at the wrong spot.

What dirs are in /var/discourse/shared?
What dirs are in /var/docker/shared?

(Gustavo Scanferla) #12

Hey @JSey, thanks for your help!

Sadly, I can see just one discourse database and it’s the new one. That makes sense, since it’s a new container and /var/discourse/shared/standalone/postgres_data (where the permanent data was supposed to be) somehow got erased and then replaced with new, empty data.

(Gustavo Scanferla) #13

Hi @sam,
/var/discourse/shared has standalone
/var/discourse/shared/standalone/backups was empty. But, since yesterday, we have some new backups there.

And /var/docker/shared gives “No such file or directory” :frowning:

How can I find and point my container to the correct spot?

Thanks for your help!

(Wes Osborn) #14

Just throwing this out there as a possibility. Once when we did an upgrade, we updated the underlying OS (Ubuntu) at the same time and lost support for AUFS. Though I don’t totally understand the relationship (being a Docker noob), AUFS was being used to provide access for our Discourse docker container to the underlying storage. So basically Discourse acted like a new install when the OS didn’t have support for AUFS because it couldn’t reconnect with the storage it had been using via AUFS. When we added support back into the host OS for AUFS, Discourse went back to normal because it was able to reconnect with the original storage source.

If you think this might have happened in your case, you can search for AUFS on meta for more troubleshooting resources.
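A quick first check on the host is whether the running kernel currently has AUFS registered at all. On Linux the registered filesystem types are listed in /proc/filesystems, which this minimal Python sketch reads; the function name is made up for illustration.

```python
import os


def registered_filesystems(proc_file="/proc/filesystems"):
    """Return the set of filesystem types the running kernel has registered."""
    names = set()
    with open(proc_file) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                # Lines look like "nodev<TAB>proc" or "<TAB>ext4";
                # the filesystem type is always the last field.
                names.add(fields[-1])
    return names


if __name__ == "__main__":
    if os.path.exists("/proc/filesystems"):
        print("aufs registered:", "aufs" in registered_filesystems())
```

A missing entry is not conclusive by itself, since it may only mean that the kernel module has not been loaded yet.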

(Gustavo Scanferla) #15

Hi @wesochuck, I talked to my teammate who did the upgrade and he didn’t update the underlying OS.

But thanks for your input, anyway. I’ll keep it as a reference in case something happens in the future (I hope not!).