Docker install fails to return after power outage, lxc-network does not start

(J Steiner) #1

Official Docker install on Ubuntu here. Everything was working very well until our electricity went down early yesterday morning and stayed out long enough that the UPS ran out of juice. When the power came back on and everything came back up, all I’d get when trying to visit my forum is the “Welcome to nginx” message.

I tried various things, too many to remember, but have made no headway (and am still getting the same error message). Here’s what I know, and where it currently stands:

  1. When my system is booting, it shows Fail when starting lxc-network.

  2. No matter what I do with ./launcher (rebuild app, ssh app, whatever), the output is this:
    sudo ./launcher start app
    WARNING: No swap limit support
    Unable to find image 'samsaffron/discourse:1.0.3' locally
    Pulling repository samsaffron/discourse

    2014/09/22 12:50:40 unexpected EOF
    Your Docker installation is not working correctly

    See: Docker error on Bootstrap

Yes, before anybody asks… I did check that link but wasn’t sure the info there applied to this situation. I have a feeling that Fail message on startup has something to do with this… how do I diagnose that? Of course my ‘feeling’ may be way wrong.

(Jens Maier) #2

I’d be far more interested in why lxc-network fails to start. Do you have any logs concerning that?

(J Steiner) #3

Actually, sorry… got it slightly wrong. It’s hard to see what’s up for sure when the startup log is whizzing by… It’s actually “Stopping lxc network… [fail].” And I just noticed, by doing some quick skimming after a reboot command, that while shutting down it starts lxc and returns OK as a result. Whatever?? Weird things can have a way of happening.

I wasn’t able to find detailed lxc logs… :blush: where do I look?

(Jens Maier) #4

Good question, I have no idea how Ubuntu manages its logs by default. It’ll certainly be in some file under /var/log :wink:
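On Ubuntu 14.04, boot-time services like this one are Upstart jobs, and Upstart normally writes each job's console output to /var/log/upstart/<job>.log. Below is a minimal sketch for showing the tail of such a log. It assumes the LXC bridge job is named lxc-net (ls /etc/init/lxc* will confirm the actual name), and UPSTART_LOG_DIR is a hypothetical override so the function can be tried against another directory.

```shell
#!/usr/bin/env bash
# Show the last lines an Upstart job wrote to its log.
# Assumption: Upstart logs job output to /var/log/upstart/<job>.log.
# UPSTART_LOG_DIR is a hypothetical override (e.g. for testing).
UPSTART_LOG_DIR="${UPSTART_LOG_DIR:-/var/log/upstart}"

show_job_log() {
  local job="$1" lines="${2:-50}"
  local log="$UPSTART_LOG_DIR/$job.log"
  if [ -r "$log" ]; then
    tail -n "$lines" -- "$log"
  else
    echo "no readable log at $log" >&2
    return 1
  fi
}

# Usage (as root): show_job_log lxc-net
# Messages that never reached the job log may still be in syslog:
#   grep -i lxc /var/log/syslog | tail -n 50
```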

Is sudo service --status-all showing anything interesting?

(J Steiner) #5
 [ + ]  acpid
 [ + ]  apache2
 [ + ]  apparmor
 [ ? ]  apport
 [ + ]  atd
 [ ? ]  console-setup
 [ + ]  cron
 [ - ]  dbus
 [ ? ]  dns-clean
 [ - ]  docker
 [ - ]  exim4
 [ + ]  friendly-recovery
 [ - ]  grub-common
 [ ? ]  irqbalance
 [ ? ]  killprocs
 [ ? ]  kmod
 [ ? ]  mysql
 [ ? ]  networking
 [ + ]  nginx
 [ + ]  ntp
 [ ? ]  ondemand
 [ ? ]  openerp-server
 [ + ]  postfix
 [ - ]  postgresql
 [ ? ]  pppd-dns
 [ - ]  procps
 [ ? ]  rc.local
 [ + ]  redis-server
 [ + ]  resolvconf
 [ - ]  rsync
 [ + ]  rsyslog
 [ + ]  screen-cleanup
 [ ? ]  screen-cleanup.dpkg-new
 [ ? ]  sendsigs
 [ - ]  ssh
 [ - ]  sudo
 [ + ]  udev
 [ ? ]  umountfs
 [ ? ]
 [ ? ]  umountroot
 [ - ]  unattended-upgrades
 [ - ]  urandom
 [ - ]  x11-common
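For reading that list: service --status-all runs each script in /etc/init.d, showing [ + ] when the script's status check succeeds, [ - ] when that check fails (usually meaning not running), and [ ? ] when the script has no status action at all. That makes it a rough overview at best, so individual services are better checked with service NAME status. A small sketch for picking out the services in one state; services_with_status is a hypothetical helper name, and the line format is taken from the output above.

```shell
#!/usr/bin/env bash
# Print the names of services that `service --status-all` reports in a given state.
# $1 is the marker to match: "+" (running), "-" (check failed) or "?" (unknown).
services_with_status() {
  local want="$1"
  # Lines look like " [ - ]  docker"; entries with no name (NF < 4) are skipped.
  awk -v want="$want" '$1 == "[" && $2 == want && $3 == "]" && NF >= 4 { print $4 }'
}

# Usage: the "[ ? ]" lines are written to stderr, so merge it into the pipe:
#   sudo service --status-all 2>&1 | services_with_status -
```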

(Jens Maier) #6

Uh… so, is Docker running or not? I’d instinctively interpret the - as “no”, but I have to assume that the two errors regarding the missing repository originated from Docker, so… maybe some parts of the docker client can run without the manager daemon?

I’m a bit out of my depth here, as I have no first-hand experience with Docker, so it’s hard to say more without having a look for myself. Still, as a first, exploratory attempt, I’d stop the docker daemon, rename /var/lib/docker, reinstall and restart docker, and then run ./launcher bootstrap app. If that works, tar up the renamed directory (better to have a backup of the backup), then delete it.

Obviously, this amounts to a full wipe of all existing Docker containers. Discourse’s data is stored on the host, so its container can be rebuilt at any time, but if you have any other containers running, this approach won’t work for you.
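The clean-slate procedure above could be sketched as a script along these lines. It assumes it is run as root, that Docker keeps its data in /var/lib/docker, that Discourse lives in /var/discourse, and that nothing else in Docker is worth keeping. The package names are the ones the official install script pulled in later in this thread; the .old-<timestamp> suffix and the --reset switch are made up for this sketch. The backup is only archived and deleted when you call archive_and_remove yourself after a successful bootstrap.

```shell
#!/usr/bin/env bash
# Sketch: move Docker's data aside, reinstall and restart Docker, then
# bootstrap Discourse. Assumes root, data in /var/lib/docker, Discourse in
# /var/discourse, and no other containers worth keeping.

# Rename a directory out of the way and print its new name.
move_aside() {
  local dir="$1"
  local backup
  backup="${dir}.old-$(date +%Y%m%d-%H%M%S)"
  mv -- "$dir" "$backup" || return 1
  printf '%s\n' "$backup"
}

# Tar up a moved-aside directory next to itself; delete it only if tar succeeded.
archive_and_remove() {
  local backup="$1"
  tar -czf "${backup}.tar.gz" -C "$(dirname -- "$backup")" "$(basename -- "$backup")" &&
    rm -rf -- "$backup"
}

reset_docker() {
  service docker stop || true               # may already be stopped
  local backup
  backup="$(move_aside /var/lib/docker)" || return 1
  # Package names as installed by the official script (assumption).
  apt-get install --reinstall -y lxc-docker lxc-docker-1.2.0 || return 1
  # Restart cleanly whether or not the package scripts already started it.
  service docker stop || true
  service docker start || return 1
  (cd /var/discourse && ./launcher bootstrap app) || return 1
  echo "Bootstrap worked. When satisfied, run: archive_and_remove $backup"
}

# Do nothing unless explicitly asked, so the functions can be loaded safely.
if [ "${1:-}" = "--reset" ]; then
  reset_docker
fi
```

Saved as, say, reset-docker.sh, it would be run with sudo bash reset-docker.sh --reset; sourcing it instead only loads the functions.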

(J Steiner) #7

service docker status showed it was running. sudo service docker stop resulted in docker stop/waiting, and sudo service docker start resulted in docker start/running, process 2812.

Still, I followed your advice, and during the re-install of docker I got this error:

Cannot connect to the Docker daemon. Is 'docker -d' running on this host?

I have backups of my site, so as long as restore works properly I’m not that worried about blowing things up; I just want to get my site back up! All I really want to know now is: how can I get rid of Docker and all of its (presumably) messed-up settings, and start over? I have tried renaming my /var/discourse folder and making a fresh container, etc.; but of course Docker is not behaving, so I cannot bootstrap.

I have tried sudo apt-get remove docker and other variants such as sudo apt-get remove lxc-docker, then reinstalling… all with the same results. I know this is not Windows, but I have even restarted between uninstall and reinstall. :wink: No joy.
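On Ubuntu 14.04, the package named plain docker is an unrelated system-tray utility, not Docker itself; the official script installs lxc-docker plus a versioned lxc-docker-1.2.0 package, as the install log further down shows. Also, apt-get remove leaves configuration files in place while apt-get purge deletes them, and neither touches the images and containers under /var/lib/docker. A minimal sketch of a more thorough uninstall follows, assuming those two package names (check dpkg -l | grep docker on your own machine first). The run helper, the DRY_RUN switch and the --purge flag are hypothetical conveniences; by default the script only prints what it would do.

```shell
#!/usr/bin/env bash
# Sketch: uninstall Docker completely so it can be reinstalled from scratch.
# Package names are assumptions taken from the install log in this thread.
# Dry run by default: commands are only printed unless DRY_RUN=0.

# Print the command (dry run) or execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

purge_docker() {
  run service docker stop || true    # fine if it was already stopped
  # purge, unlike remove, also deletes the packages' configuration files
  run apt-get purge -y lxc-docker lxc-docker-1.2.0
  run apt-get autoremove -y          # removes dependencies nothing needs any more
  # apt never touches images and containers; move them aside instead of deleting
  run mv /var/lib/docker /var/lib/docker.bak
}

if [ "${1:-}" = "--purge" ]; then
  purge_docker
fi
```

Saved under any name (e.g. purge-docker.sh), bash purge-docker.sh --purge previews the commands and sudo env DRY_RUN=0 bash purge-docker.sh --purge runs them; after that, reinstall with the official install script.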

(Jens Maier) #8

Hmmm… the fact that docker -d wasn’t running during the installation bothers me. Try again, but after stopping and attempting to uninstall docker, check the output of ps axu to confirm the docker daemon is really gone, then install docker with curl -sSL | sudo sh. Then restart the docker daemon and check ps axu again to see whether it’s actually there.

And to be honest, a reboot before the final start of docker might not hurt… just to make sure any of docker’s network devices and firewall rules are flushed out. :wink:
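The ps axu check can be scripted so it is easy to repeat before and after each step. A minimal sketch, assuming the Docker 1.x daemon shows up in ps as docker -d (the form named in the "Is 'docker -d' running on this host?" error above); docker_daemon_lines is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Filter `ps axu` output (on stdin) down to lines that look like a Docker 1.x
# daemon, i.e. a "docker" executable started with "-d". Prints nothing if none.
docker_daemon_lines() {
  # "docker" must start a word (line start, whitespace or "/") and be followed
  # by "-d"; the grep process itself therefore does not match its own pattern.
  grep -E '(^|[[:space:]/])docker[[:space:]]+-d([[:space:]]|$)' || true
}

# Usage:
#   if [ -n "$(ps axu | docker_daemon_lines)" ]; then
#     echo "docker daemon is running"
#   else
#     echo "no docker daemon found"
#   fi
```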

(J Steiner) #9

I didn’t find anything Docker-related in the output of ps aux. I uninstalled again, rebooted, and reinstalled using the command you specified. Here’s the output; note especially the last line:

The following extra packages will be installed:
  aufs-tools lxc-docker-1.2.0
The following NEW packages will be installed:
  aufs-tools lxc-docker lxc-docker-1.2.0
0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/4,198 kB of archives.
After this operation, 13.8 MB of additional disk space will be used.
Selecting previously unselected package aufs-tools.
(Reading database ... 76056 files and directories currently installed.)
Preparing to unpack .../aufs-tools_1%3a3.2+20130722-1.1_amd64.deb ...
Unpacking aufs-tools (1:3.2+20130722-1.1) ...
Selecting previously unselected package lxc-docker-1.2.0.
Preparing to unpack .../lxc-docker-1.2.0_1.2.0_amd64.deb ...
Unpacking lxc-docker-1.2.0 (1.2.0) ...
Selecting previously unselected package lxc-docker.
Preparing to unpack .../lxc-docker_1.2.0_amd64.deb ...
Unpacking lxc-docker (1.2.0) ...
Processing triggers for man-db ( ...
Processing triggers for ureadahead (0.100.0-16) ...
ureadahead will be reprofiled on next reboot
Setting up aufs-tools (1:3.2+20130722-1.1) ...
Setting up lxc-docker-1.2.0 (1.2.0) ...
docker start/running, process 3328
Setting up lxc-docker (1.2.0) ...
Processing triggers for libc-bin (2.19-0ubuntu6.3) ...
+ sh -c docker run hello-world
Unable to find image 'hello-world' locally
Pulling repository hello-world
565a9d68a73f: Pulling dependent layers
511136ea3c5a: Pulling fs layer
2014/09/23 08:23:44 unexpected EOF

Anxious to get the site back online, I set up another VM with a brand-new Ubuntu 14.04 LTS install and tried the installation from scratch; the docker install ended with exactly the same message on the new machine. I then tried bootstrapping my app to see what would happen, and got the same error message as before. Are the instructions here still any good? I had followed them originally, the only difference being that my server is local instead of hosted by Digital Ocean.

docker --version returns Docker version 1.2.0, build fa7b24f.

(Jens Maier) #10

Hmm, I tried to reproduce this, but curl -sSL | sudo sh works flawlessly on an Ubuntu 14.04 amd64 server I just installed in a fresh VMware virtual machine… and bootstrapping Discourse works as well, without any changes on my part. It survived a dist-upgrade and reboot, too… :confused:

(J Steiner) #11

All right, I’m going to try again. I’m running in Hyper-V here, for whatever that’s worth. But here goes: I’ll blow away my second (experimental) VM and start fresh. I’ll let you know what happens.

(J Steiner) #12

Same. Old. Thing. Brand new install of 14.04 LTS Server. I tried curl -sSL | sudo sh with the same results shown above; immediately after “Pulling fs layer” I get the error message “unexpected EOF.”

I then did sudo apt-get remove lxc-docker-1.2.0, then reinstalled using wget -qO- | sudo sh, with the same results again.

Thankfully my site is not high traffic. Still, this is getting annoying.

EDIT: I think I may have found the issue. We’re behind a proxy/firewall here; I put my VM into firewall bypass with an iptables command, and that seems to have solved it. But I’m curious: why did it work through the proxy before if it fails now? Oh well, I guess that puts the issue in my lap to diagnose. Thanks for your help!
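If bypassing the firewall isn't a long-term option, the usual alternative is to tell the Docker daemon about the proxy, since it is the daemon (not the docker client) that downloads images. A sketch, assuming the Upstart-managed Docker on Ubuntu 14.04, whose job sources /etc/default/docker before starting the daemon; proxy.example.com:3128 is a placeholder for your site's proxy, and whether the proxy is what cut the pulls short is an assumption this thread did not confirm.

```shell
# /etc/default/docker (excerpt): sourced by the docker Upstart job before the
# daemon starts, so exported variables reach the daemon.
# proxy.example.com:3128 is a placeholder, not a real proxy.
export http_proxy="http://proxy.example.com:3128/"
# The registry is served over HTTPS; set this too in case your Docker
# version consults it for HTTPS requests.
export https_proxy="http://proxy.example.com:3128/"
```

After editing the file, restart the daemon with sudo service docker restart and retry the pull.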