Bootstrap hangs forever

So in trying to troubleshoot a weird issue that happened after updating to beta11 today (CORS + CDN, some other issue I’ll describe another time), I decided it was best to rebuild the site, but the rebuild just hung with no response. I tried a full bootstrap and got the same thing:

root@host:/var/discourse# ./launcher bootstrap web_only

WARNING: We are about to start downloading the Discourse base image
This process may take anywhere between a few minutes to an hour, depending on your network speed

Please be patient

Unable to find image 'discourse/base:2.0.20191219-2109' locally
2.0.20191219-2109: Pulling from discourse/base
Digest: sha256:d6dff261a474d5556b134e38ccfb2afecc1478827d3841b70e4804004eca2a03
Status: Downloaded newer image for discourse/base:2.0.20191219-2109

And then it just hangs, and hangs, and hangs. I’m fairly patient, but shouldn’t there be something by now? I thought the “be patient” was only about the download bit.

Any chance this could be related to a new version of Docker installed today? (Ubuntu 18.04.4 LTS)

root@host:~# docker version
Client: Docker Engine - Community
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.16
 Git commit:        369ce74a3c
 Built:             Thu Feb 13 01:27:49 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.16
  Git commit:       369ce74a3c
  Built:            Thu Feb 13 01:26:21 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.12
  GitCommit:        35bd7a5f69c13e1563af8a93431411cd9ecf5021
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

So like what’s happening to me? I’ve tried twice tonight without success.

Maybe, I saw that but wasn’t sure it was the same thing. You seem to be getting further along than I do.

Containers look OK, although it’s unclear if the discourse image is actually doing anything. ./launcher logs web_only just hangs too with no output.

CONTAINER ID        IMAGE                              COMMAND             CREATED             STATUS              PORTS                NAMES
26ec161726cd        discourse/base:2.0.20191219-2109   "echo working"      16 minutes ago      Up 16 minutes                            fervent_beaver
594aa5ef66b6        local_discourse/mail-receiver      "/sbin/boot"        2 weeks ago         Up 17 minutes>25/tcp   mail-receiver
ae14d4d32fbb        local_discourse/data               "/sbin/boot"        3 weeks ago         Up 17 minutes                            data

Nothing useful from ps after the bootstrap, either:

root      1768  0.0  0.1  13444  3544 pts/0    S+   04:10   0:00 bash ./launcher bootstrap web_only
root      1846  0.0  3.1 710040 64376 pts/0    Sl+  04:10   0:00 /usr/bin/docker run -i --rm -a stdout -a stderr discourse/base:2.0.20191219-2109 echo working
root      1869  0.0  0.2 109108  5188 ?        Sl   04:10   0:00 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/26ec16
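
One way to tell whether that "echo working" container is truly stuck, rather than just slow, might be to run the same command by hand with a time limit. This is only a rough sketch: it assumes GNU coreutils timeout is installed, and run_with_timeout is a made-up helper name, not part of the launcher.

```shell
# Hypothetical helper (not part of ./launcher): run a command with a time
# limit in seconds and report whether it finished or had to be cut off.
run_with_timeout() {
  limit="$1"
  shift
  timeout "$limit" "$@"
  status=$?
  # GNU timeout exits with 124 when the time limit was reached.
  if [ "$status" -eq 124 ]; then
    echo "HUNG after ${limit}s: $*"
  else
    echo "exited with status $status: $*"
  fi
  return "$status"
}

# Example (assumption: same image tag as in the ps listing above):
# run_with_timeout 60 docker run -i --rm -a stdout -a stderr \
#   discourse/base:2.0.20191219-2109 echo working
```

If the helper reports HUNG for a command that only prints one word, the problem is likely in Docker itself rather than in the Discourse bootstrap.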

Update: I tried to downgrade to Docker version 19.03.5 to see if things worked again, and I did get further than before. The bootstrap for the webapp got as far as checking whether the data container existed. Of course, it didn’t, since I had to remove all of the old containers to downgrade Docker. I then bootstrapped the container for the database, which worked, but it hung when I tried ./launcher start.

Tried it a few times, but nothing was successful.

Since this is a production system, I needed to get back up and running, so I restored from Sunday’s full system backup, which has the old Docker version 19.03.5. (Since my /var/discourse is mounted on external storage, I didn’t lose any data.) Now off to figure out this weird CORS issue.

Meanwhile, this might be something folks want to look into. I know I won’t be re-bootstrapping anything else until I make sure I have a good clean full backup just before. :slight_smile:

I just did two installations, one with 2 GB and one with 1 GB of RAM. They both worked. The 1 GB instance finished in under 10 minutes.

One guess I had, which seems like a long shot, is that you have some networking issue that’s interfering with SSH after some period of time.

But that doesn’t explain why @smrtey is also having trouble.

So I thought this too at one point, because when I was rm’ing or killing the unused Docker containers, it would disconnect all SSH connections (and kill the entire Docker service too?!)… But while the bootstrap was hanging, I was watching top or doing ps listings, and they eventually just hung too, while the system idled with other random little processes popping up from time to time.

Still baffled on this one.