How to recover my broken (not web-accessible) Discourse installation after failed upgrade

I just attempted to upgrade my Discourse installation to the latest beta (2.4.0 beta 11) via the web update and now it is inaccessible (try for yourself, returns HTTP 500 errors).

Context

My installed version was still 2.4.0 beta 6, I was trying to upgrade to 2.4.0 beta 11.
I think I had a fairly vanilla install, don’t remember whether I had any custom plugins (and can’t look it up now) but at most 1-2. I kept most of the default Discourse settings.

What I did

I clicked on the one-click browser update link in the update notification mail I got, which brought me to /admin/upgrade.

According to the web updater, I first had to update docker_manager before I could update the actual Discourse version, so I picked that. Some stuff ran through, and at about half of the progress a message appeared above the web log output saying that something went wrong during the upgrade and that I should check the logs. I scrolled through the logs but nothing jumped out at me. I sadly failed to save the log and instead simply reloaded the page, hoping that I could just retry the install and assuming that the failure had caused the upgrade to be rolled back.

Interestingly, the admin/upgrade/ page then told me that the docker_manager was up-to-date and no longer needed to be upgraded, so I (most likely mistakenly) assumed that the upgrade had indeed worked and the error message was a bug. I also checked whether my forum was still running; it was, and it didn’t seem to have any issues.

I was now presented with the option to upgrade the Discourse install itself (which was previously grayed out), so I did. Again stuff was logged to the log output, which I didn’t pay much attention to and after about half the progress bar was filled, a similar error message was displayed above the log output about something having gone wrong.

I figured I’d do the same as before and reload the page (again, I sadly did not save the log). As expected, Discourse was now listed as “up-to-date”.

This is what I currently see under admin/upgrade/:

But the actual forum is not working anymore and only returns 500 errors.

What still works

  • /admin/upgrade/ still works and displays that Discourse is up-to-date.
  • I can also click on the “Processes” tab and get a list of running processes.

But even the Backups tab returns a 500 error already and so does /admin. I haven’t found any part of the forum that works except for the two tabs under /admin/upgrade/.

How to Recover?

I don’t have a good idea what went wrong and also don’t know where to start. I don’t even know how to access the logs to figure out what the error was without the web interface. The Discourse installation is hosted in Digital Ocean and I can ssh into the machine and probably the container, but I don’t know where to look for the logs.

A pointer where to look for the logs would be greatly appreciated.

For now, my best idea is to go back to a backup and lose whatever was posted after the last backup (luckily there is not a lot of traffic, so losing a day of content is acceptable).

My current plan and what I’m lacking

I’ve set up Digital Ocean to make weekly backups of the Droplet, and I believe my Discourse installation was set up for daily backups. I never configured S3, so those should still be saved locally. The Digital Ocean Droplet backup is 5 days old, however, and I’d prefer not to lose the content from the last couple of days.

My rough plan is basically to go back via backups to a known working state for now by doing the following:

  1. Download the Discourse backup from today/yesterday.
  2. Roll back the whole Droplet via Digital Ocean to the last backup from 5 days ago, so I have a working Discourse installation again.
  3. Import the downloaded backup to get the content back (minus whatever happened after the last Discourse backup).

I can do the rollback via DO (step 2) and will figure out how to import the existing backup when I get there (step 3), but I don’t know how to get to the backup without the web interface and with /admin/backups/ returning HTTP 500 errors.

Where do I have to look to find the backup via SSH/what do I need to be able to restore it after rolling back to the old Droplet backup via DO?

Searching through the forums I’ve only found topics about hosting the backups on S3, but not where they are when they are stored locally.


With help from this topic I found that backups are stored in
/var/discourse/shared/standalone/backups/default in my case.

My current assumption is that I can simply copy out one of the .tar.gz files in there, roll back the Droplet, and put the backup file back at the same place so Discourse will find it and allow me to restore to it via the web interface.
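
Copying the newest backup out before touching the Droplet could look roughly like the sketch below. It is only an illustration: the helper names `latest_backup` and `save_latest_backup` are made up for this post, and the default directory is the path from my install, which may differ on yours.

```shell
#!/bin/sh
# Sketch only: find the newest local Discourse backup and copy it somewhere safe.
# The default below is the backup path from this post; other installs may differ.
BACKUP_DIR="${BACKUP_DIR:-/var/discourse/shared/standalone/backups/default}"

# Print the most recently modified .tar.gz in a directory (hypothetical helper).
latest_backup() {
    dir="${1:-$BACKUP_DIR}"
    # ls -t sorts by modification time, newest first; errors are silenced
    # so an empty directory simply yields no output.
    ls -t "$dir"/*.tar.gz 2>/dev/null | head -n 1
}

# Copy the newest backup from src_dir into dest_dir (hypothetical helper).
save_latest_backup() {
    src_dir="$1"
    dest_dir="$2"
    newest="$(latest_backup "$src_dir")"
    if [ -z "$newest" ]; then
        echo "no .tar.gz backups found in $src_dir" >&2
        return 1
    fi
    # -p keeps the original timestamp so the backup date stays recognizable.
    mkdir -p "$dest_dir" && cp -p "$newest" "$dest_dir"/
}
```

On the Droplet this would be called as, for example, `save_latest_backup "$BACKUP_DIR" /root/discourse-backup-copy` (the destination is a placeholder). Since the rollback replaces the whole Droplet, the file still has to be pulled to another machine afterwards, for instance with `scp`.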

cd /var/discourse
./launcher rebuild app
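
If the site still returns 500 errors after the rebuild, the Rails log on the host is a reasonable place to start looking. The sketch below is an assumption-laden illustration, not an official tool: `recent_errors` is a hypothetical helper, and the default log path assumes a standard standalone install laid out like the backup path mentioned above.

```shell
#!/bin/sh
# Assumed log location for a standard standalone install; adjust if yours differs.
RAILS_LOG="${RAILS_LOG:-/var/discourse/shared/standalone/log/rails/production.log}"

# Print the last N lines of a log file that mention an error (hypothetical helper).
# Usage: recent_errors [logfile] [count]
recent_errors() {
    log="${1:-$RAILS_LOG}"
    count="${2:-20}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # Case-insensitive match on "error" or "fatal", then keep only the tail.
    grep -i -E 'error|fatal' "$log" | tail -n "$count"
}
```

Running `recent_errors` with no arguments reads the assumed default path. The container's own output can also be viewed from `/var/discourse` with `./launcher logs app`.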

Thanks a lot, that worked! Much easier than going through all the backups-restore steps! :slight_smile:
