Fix Discourse installation on DigitalOcean that broke during Docker update

What would you like done?
My Discourse installation has been working swimmingly for 3 years. I did a manual update to Docker and everything is now broken. I can only access the server in Recovery Mode. I cannot get Docker to start. When I launch the server from the hard drive (instead of Recovery Mode), I am unable to connect via SSH.

When do you need it done?
Get Discourse working again!

I have a snapshot from 2 months ago, though would prefer to NOT lose the data from the last 2 months. I also have a snapshot from immediately after Docker broke.

I hired a dev a few hours ago and he was unable to complete the project as it’s late for him, yet I need this fixed ASAP. This is a production site.

He stated:

All of the standard checks have been made: SSH is working, traffic is not blocked, and we updated the SSH config to use password auth. What we need to do now is investigate what steps were taken before this breakage and review the related logs.

What is your budget, in $ USD that you can offer for this task?
I will pay hourly for the task, at market rate. Just share your rate.

I sent a PM. As I said there, it might be quickest to spin up a new Droplet.

A big THANK YOU to @jericson for your help!

Here’s what we ended up doing:

  1. Gain recovery access to the old server
  2. Attach a network drive (DigitalOcean Volumes Block Storage)
  3. Make a tarball of the /var/discourse files
  4. Copy the tarball to the network drive
  5. Turn off the old server and detach the network drive
  6. Build a new Discourse install on a new server
  7. Attach the network drive to the new server
  8. Extract the files
  9. Find a backup in the 7-day backup set
  10. Restore to that point
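
For anyone who wants to script the archive and extract steps from the list above, here is a rough sketch using Python's standard `tarfile` module. It is not what we actually ran. The function names and the `/mnt/volume` mount point in the usage comment are made up for illustration, and you would need to run it as root so the files under /var/discourse are readable.

```python
import tarfile
from pathlib import Path


def make_tarball(source_dir: str, archive_path: str) -> int:
    """Pack source_dir into a gzip-compressed tarball and return the entry count."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths relative, so the archive unpacks as "discourse/..."
        tar.add(source, arcname=source.name)
    # Re-open the archive to confirm it is readable before trusting it.
    with tarfile.open(archive_path, "r:gz") as tar:
        return len(tar.getnames())


def extract_tarball(archive_path: str, dest_dir: str) -> None:
    """Unpack the tarball under dest_dir, creating dest_dir if needed."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)


# Hypothetical usage (paths are assumptions, adjust to your own mount point):
#   make_tarball("/var/discourse", "/mnt/volume/discourse.tar.gz")   # old server
#   extract_tarball("/mnt/volume/discourse.tar.gz", "/root/rescue")  # new server
```

Extracting into a scratch directory rather than straight over /var/discourse on the new server leaves the fresh install untouched while you look for the backup file to restore.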

We had attempted to just move the entire /var/discourse folder over to the new server but ran into Redis issues (not sure those were the root cause, but that is what was flagged).

Grateful again for your help, Jon. Thank you!

Glad you were able to get your install fixed. Nice work @jericson :clap: :slight_smile: :discourse:

Just curious. There have been a couple of recent reports of broken upgrades involving Docker and DigitalOcean. Is this just a coincidence, or is there a common cause that other Discourse admins on DigitalOcean will hit if they upgrade? I’m asking because I’m one of those. :sweat_smile:

@waffleslop @jericson good work and thanks for posting the info on what you did to fix it - always good to have a resource like that as a self-hoster, in case I ever hit issues!

@icaria36 I am on DigitalOcean for most of my instances and have upgraded both the OS and the Discourse codebase very recently without any issues at all. (Hopefully posting this will not jinx me!)

I can confirm that all upgrades have been easy-peasy until yesterday. I updated Docker from the GUI and it worked. Then I went to update the next 3 items and pressed one of them (I forget which) first. Nothing happened. I waited a few seconds and pressed the other… then the other. Maybe I acted too quickly and jammed things up? I ended up logging into the console and found a message that a restart was suggested, so I rebooted the machine. It never came back online fully after that.

It’s possible I rebooted during an upgrade/update which, as I write this, feels pretty dumb to do!

We didn’t spend any time looking into what caused the problem because I wanted to get @waffleslop up and running as soon as possible. I’ve upgraded my own Discourse servers (hosted on DigitalOcean) without a problem. However, I do use the command line rather than the GUI since I have a non-standard install.

I can recommend a few things to minimize the risk of extended downtime:

  1. Make a backup before doing anything! I wonder if there should be a warning in the interface strongly recommending a backup before you can do an update. A recent backup gives me comfort that at least we can spin up a new Droplet and restore things in the worst case.
  2. Make sure you can get to your backup! @waffleslop and I spent a considerable part of our time figuring out how to get a copy of /var/discourse to the new Droplet. Something very weird was going on with the original Droplet and we weren’t able to just scp the files to the new Droplet. For my own servers, I put backups on S3 and I copy them to my local machine every night. Is that excessive? Probably. But it does give me plenty of options when things stop working for some reason.
  3. Test your backups from time to time. When your production servers are down, you want to have confidence that you know what you are doing. Ideally you’d test a backup just before doing an update so that you have a place to fall back on if anything goes wrong with production. But it’s usually enough to try restoring a backup as often as needed to keep the process fresh in your mind.
  4. Two heads are better than one. Maybe this is self-interest talking, but it can be a lot easier to get through an emergency if you can share your screen on a call with someone who has experience with this sort of situation. Ideally you want someone who knows how to use the command line.
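
To make the first two recommendations a bit more concrete, here is a minimal sketch of a pre-upgrade check: find the newest backup file, confirm it is recent, and copy it somewhere else with a checksum comparison. This is an illustration, not the script I use. The function names are hypothetical, and the backup directory in the usage comment assumes a standard install, so check where your own backups live. It does not replace an actual test restore.

```python
import hashlib
import shutil
import time
from pathlib import Path
from typing import Optional


def newest_backup(backup_dir: str) -> Optional[Path]:
    """Return the most recently modified *.tar.gz in backup_dir, or None."""
    candidates = list(Path(backup_dir).glob("*.tar.gz"))
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def is_fresh(backup: Path, max_age_hours: float = 24.0) -> bool:
    """True if the backup file was modified within the last max_age_hours."""
    age_seconds = time.time() - backup.stat().st_mtime
    return age_seconds <= max_age_hours * 3600


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(backup: Path, dest_dir: str) -> Path:
    """Copy the backup into dest_dir and fail loudly if the checksums differ."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = Path(shutil.copy2(backup, dest))
    if sha256_of(copied) != sha256_of(backup):
        raise OSError(f"Checksum mismatch after copying {backup.name}")
    return copied


# Hypothetical usage (both paths are assumptions):
#   latest = newest_backup("/var/discourse/shared/standalone/backups/default")
#   if latest is None or not is_fresh(latest):
#       raise SystemExit("No recent backup found, make one before upgrading")
#   copy_and_verify(latest, "/mnt/offsite/discourse-backups")
```

The destination only helps if it is on different storage from the Droplet itself, such as a mounted volume or a synced directory.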

As long as you make a backup, you should be pretty safe to upgrade.
