Install problem: `getaddrinfo: Name or service not known`

(Jay Pfaffman) #1

I’ve got three new servers running Ubuntu 16.04 & apparently have some Docker problem. Though I suspect it’s a trivial and “obvious” Docker problem, I’ve searched more widely for answers, but have come up empty. I’d like to think that it’s some new Docker bug (and not something stupid that I’ve done), but clean installs on Digital Ocean work fine.

It seems that the container can’t download gems.

Docker version 18.04.0-ce, build 3d479c0
I, [2018-05-04T00:42:53.689239 #15]  INFO -- : > echo "done configuring web"
I, [2018-05-04T00:42:53.690557 #15]  INFO -- : done configuring web

I, [2018-05-04T00:42:53.690718 #15]  INFO -- : > cd /var/www/discourse && gem update bundler
ERROR:  While executing gem ... (SocketError)
    getaddrinfo: Name or service not known
I, [2018-05-04T00:42:53.829426 #15]  INFO -- : Updating installed gems

I, [2018-05-04T00:42:53.829592 #15]  INFO -- : Terminating async processes
I, [2018-05-04T00:42:53.829620 #15]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec

It looks like it can’t resolve something, but I’ve tried putting DOCKER_OPTS="--dns --dns" in /etc/default/docker (as suggested here) and no joy. There is another container I can get into and ping from (that container is a VPN, which worked fine for a while but which I can’t connect to now; I have no idea whether that’s related).
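
Worth noting: on Ubuntu 16.04 Docker runs under systemd, which ignores /etc/default/docker unless the unit file is explicitly wired to read it. The supported place for a DNS override on modern Docker is /etc/docker/daemon.json. A sketch (the nameserver addresses here are placeholders, not taken from the original post):

```json
{
  "dns": ["8.8.8.8", "8.8.4.4"]
}
```

Restart the daemon afterwards with systemctl restart docker for the change to take effect.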

I tried this (as suggested here):

docker run -it --rm ubuntu /bin/bash
apt update
apt install iputils-ping

and it works.
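
For completeness, a check that exercises DNS specifically (ping tests ICMP reachability as much as name resolution) can be done without installing anything — a sketch, assuming the stock ubuntu image:

```shell
# getent resolves through the container's own resolver
# (glibc + /etc/resolv.conf), no extra packages needed.
docker run --rm ubuntu getent hosts rubygems.org
```

If this prints nothing and exits non-zero, resolution inside the container is broken even though ICMP may work.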

The host has several static IPs. Could Docker not know how to bind to the right one? No. I changed to a single IP and that didn’t fix it.

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:e0:4c:68:02:54 brd ff:ff:ff:ff:ff:ff
    inet brd scope global enp1s0
       valid_lft forever preferred_lft forever
    inet6 fe80::2e0:4cff:fe68:254/64 scope link 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:6f:3a:ac:26 brd ff:ff:ff:ff:ff:ff
    inet brd scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:6fff:fe3a:ac26/64 scope link 
       valid_lft forever preferred_lft forever
6: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:e3:f5:bd brd ff:ff:ff:ff:ff:ff
    inet brd scope global virbr0
       valid_lft forever preferred_lft forever
7: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:e3:f5:bd brd ff:ff:ff:ff:ff:ff

(Matt Palmer) #2

Check if it’s a recently-introduced Docker bug by downgrading to 18.03. Can’t say it’s likely to fix it, but it’s easy enough to try it first just in case.

Modify the app.yml to run the ping test before anything else; that’ll rule in (or out) problems with the app container itself versus network issues.
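
One way to do that — a sketch only; the hook name and target host are assumptions about the standard Discourse launcher/pups template layout, not taken from this thread — is an early exec in app.yml:

```yaml
## Hypothetical app.yml fragment: run a connectivity check before the
## rest of the bootstrap, so a DNS/network failure surfaces immediately
## instead of midway through gem installation.
hooks:
  before_code:
    - exec: ping -c 2 rubygems.org
```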

After that, you’re in tcpdump territory. Capture DNS packets going over docker0 with something like tcpdump -i docker0 -n port 53, and make sure the packets are at least getting onto the network. Then make sure they’re getting out by capturing on enp1s0 as well – and make sure they’re going out with the machine’s public IP, rather than some internal IP. If the firewall’s too restrictive, the packets won’t show up at all on one of those tcpdumps, and if they’re on enp1s0 but with a private IP, then the NAT config’s busted.
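
The two captures described above, side by side — a sketch, assuming the interface names from the ip output earlier in the thread:

```shell
# Terminal 1: watch DNS queries crossing the Docker bridge
tcpdump -i docker0 -n port 53

# Terminal 2: the same traffic on the physical NIC; after NAT the
# source address should be the host's public IP, not a 172.17.x.x one
tcpdump -i enp1s0 -n port 53
```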

If the packets are going out but not coming back, there’s an upstream network problem (or an OUTPUT rule that’s eating them at the last minute), but given you’ve got some DNS resolution on the machine, the problem’s unlikely to be there. If the DNS response packets are coming back part of the way, then incoming firewall rules are the problem. If they appear to be making it all the way back to docker0, dump the veth<whatever> interface for the container to double-check; if they’re there too, something’s setting up iptables rules inside the container, and you’re back to giving Docker some serious side-eye.
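
A quick way to inspect the NAT leg specifically — a sketch; the rule described is the one Docker normally installs, not copied from this machine:

```shell
# Docker's outbound NAT is a MASQUERADE rule in the nat table's
# POSTROUTING chain, covering the bridge subnet (172.17.0.0/16 by
# default). If it's missing, container traffic leaves enp1s0 with a
# private source IP and the replies never find their way back.
iptables -t nat -L POSTROUTING -n -v
```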

(Jay Pfaffman) #3

Thanks, Matt. I am somewhat relieved that you didn’t say something like “Uh, duh, you just need to fix the permissions,” so I don’t look foolish. But, really, I’m disheartened that there’s not an obvious and quick solution.

I’m 90% sure that I did an install that worked when these machines were on my home network.

I guess I’ll dust off my discourse-pin-docker script and work down the list. Well, that wasn’t it.

But, the tcpdump appears to have shed the necessary light! And now I see the typo in /etc/network/interfaces (perfectly replicated on all three servers) that was the root of the problem.

Hooray for simple (that I still don’t quite understand) networking errors!

And from there I found that I do know how to get pups to bind to a single IP.

And soon, a few people can use a server that, if it catches on fire, will be down until I wake up and look at my phone.

Thanks, Matt. My day is saved.


(Jeff Atwood) #4

All this for… a typo? How is this going to help any future readers?

(Jay Pfaffman) #5

Yeah. A typo. I’m afraid my tale of woe is unlikely to help future readers, except to let them know that you can use tcpdump to watch port 53. That was the saving grace.

The actual typo: somehow I copy-pasted dns-nameservers in twice, so I had

auto enp1s0
iface enp1s0 inet static
  dns-nameservers dns-nameservers

rather than a single dns-nameservers line followed by the actual nameserver addresses.
And DNS worked all the time, except inside of Docker. (Does Docker use only the first entry, rather than falling back to the next one when the first fails?)
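
That guess can be checked directly. Docker builds a container’s /etc/resolv.conf from the host’s at container-creation time (filtering out loopback entries it can’t reach from inside the bridge network), so a broken nameserver line on the host propagates straight into every container — a sketch:

```shell
# Compare the host's resolver config with what a fresh container sees;
# the container copy is derived from the host's /etc/resolv.conf,
# with 127.x.x.x entries filtered out by Docker.
cat /etc/resolv.conf
docker run --rm ubuntu cat /etc/resolv.conf
```

The host’s full resolver stack (resolvconf, local caches) can paper over one bad entry in ways the container’s bare glibc resolver can’t, which would explain DNS working everywhere except inside Docker.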