nginx.http.sock bind failed after reboot

Hello all,

I have had a Discourse forum up and running for 2 months now.
I installed it via Docker following the official instructions, and I run it behind nginx, which proxies requests to Discourse.

Everything was working fine until today, when my hosting company had to reboot my machine. Now I’m getting a 502 Bad Gateway error when trying to access my forum.

I found out that the problem seems to be the following:

 [emerg] 781#0: bind() to unix:/shared/nginx.http.sock failed (98: Address already in use)

So it seems the container can’t bind to the socket?

As expected, nginx on the host logs the following error:

[error] 2298#0: *1 connect() to unix:/var/discourse/shared/standalone/nginx.http.sock failed (111: Connection refused) while connecting to upstream, client: xx.yy.zzz.aaa, server: forum.example.com, request: "GET / HTTP/1.1", upstream: "http ://unix:/var/discourse/shared/standalone/nginx.http.sock:/", host: "forum.example.com"
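The two log lines are two sides of the same stale file. A minimal Python sketch (Linux semantics assumed; the helper names are hypothetical and nothing here is Discourse code) reproduces both: a listening Unix socket is closed without unlinking its path, after which a new bind() on that path fails with EADDRINUSE (errno 98, as in the container log) and a connect() to it fails with ECONNREFUSED (errno 111, as in the host nginx log).

```python
import errno
import os
import socket
import tempfile


def simulate_unclean_shutdown(path: str) -> None:
    """Bind a listening Unix socket at `path`, then close it WITHOUT unlinking.

    Closing the descriptor stops the listener, but the socket file stays on
    disk, roughly what a process killed before its cleanup leaves behind.
    """
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    srv.close()  # no os.unlink(path): the file is left behind


def try_bind(path: str) -> int:
    """Return 0 if bind() succeeds, otherwise the errno it failed with."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.bind(path)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        s.close()


def try_connect(path: str) -> int:
    """Return 0 if connect() succeeds, otherwise the errno it failed with."""
    c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        c.connect(path)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        c.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "nginx.http.sock")
        simulate_unclean_shutdown(sock_path)
        print("bind:", errno.errorcode[try_bind(sock_path)])        # EADDRINUSE
        print("connect:", errno.errorcode[try_connect(sock_path)])  # ECONNREFUSED
```

Deleting the leftover file (os.unlink) is enough for the next bind() to succeed.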

My app.yml looks like this:

##
## After making changes to this file, you MUST rebuild for any changes
## to take effect in your live Discourse instance:
##
## /var/discourse/launcher rebuild app
##
## Make sure to obey YAML syntax! You can use this site to help check:
## http ://www.yamllint.com/

## this is the all-in-one, standalone Discourse Docker container template

# You may add rate limiting by uncommenting the web.ratelimited template.
# Out of the box it allows 12 reqs a second per ip, and 100 per minute per ip
# This is configurable by amending the params in this file

templates:
  - "templates/postgres.template.yml"
  - "templates/redis.template.yml"
  - "templates/web.template.yml"
  - "templates/sshd.template.yml"
  - "templates/web.ratelimited.template.yml"
  - "templates/web.socketed.template.yml"

## which TCP/IP ports should this container expose?
expose:
  - "2222:22" # fwd host port 2222 to container port 22 (ssh)

params:
  db_default_text_search_config: "pg_catalog.english"

  ## Set db_shared_buffers to a max of 25% of the total memory.
  ##
  ## On 1GB installs set to 128MB (to leave room for other processes)
  ## on a 4GB instance you may raise to 1GB
  #db_shared_buffers: "256MB"
  #
  ## Set higher on large instances it defaults to 10MB, for a 3GB install 40MB is a good default
  ## this improves sorting performance, but adds memory usage per-connection
  #db_work_mem: "40MB"
  #
  ## Which Git revision should this container use? (default: tests-passed)
  #version: tests-passed

env:
  LANG: en_US.UTF-8
  # DISCOURSE_DEFAULT_LOCALE: en

  ## TODO: How many concurrent web requests are supported?
  ## With 2GB we recommend 3-4 workers, with 1GB only 2
  #UNICORN_WORKERS: 3

  ## TODO: List of comma delimited emails that will be made admin and developer
  ## on initial signup example 'user1@example.com,user2@example.com'
  DISCOURSE_DEVELOPER_EMAILS: 'mail@example.com'

  ## TODO: The domain name this Discourse instance will respond to
  DISCOURSE_HOSTNAME: 'forum.example.com'

  ## TODO: The mailserver this Discourse instance will use
  DISCOURSE_SMTP_ADDRESS: ***         # (mandatory)
  DISCOURSE_SMTP_PORT: 587                        # (optional)
  DISCOURSE_SMTP_USER_NAME: ***      # (optional)
  DISCOURSE_SMTP_PASSWORD: ***              # (optional)
  #DISCOURSE_SMTP_ENABLE_START_TLS: true           # (optional, default true)

  ## The CDN address for this Discourse instance (configured to pull)
  #DISCOURSE_CDN_URL: //discourse-cdn.example.com

## These containers are stateless, all data is stored in /shared
volumes:
  - volume:
      host: /var/discourse/shared/standalone
      guest: /shared
  - volume:
      host: /var/discourse/shared/standalone/log/var-log
      guest: /var/log

## The docker manager plugin allows you to one-click upgrade Discourse
## http ://discourse.example.com/admin/docker
hooks:
  after_code:
    - exec:
        cd: $home/plugins
        cmd:
          - mkdir -p plugins
          - git clone https ://github.com/discourse/docker_manager.git
          - git clone https ://github.com/discourse/discourse-tagging.git
          - git clone https ://github.com/discourse/discourse-spoiler-alert.git

## Remember, this is YAML syntax - you can only have one block with a name
run:
  - exec: echo "Beginning of custom commands"

  ## If you want to set the 'From' email address for your first registration, uncomment and change:
  #- exec: rails r "SiteSetting.notification_email='info@unconfigured.discourse.org'"
  ## After getting the first signup email, re-comment the line. It only needs to run once.

  ## If you want to configure password login for root, uncomment and change:
  ## Use only one of the following lines:
  #- exec: /usr/sbin/usermod -p 'PASSWORD_HASH' root
  #- exec: /usr/sbin/usermod -p "$(mkpasswd -m sha-256 'RAW_PASSWORD')" root

  ## If you want to authorize additional users, uncomment and change:
  #- exec: ssh-import-id username
  #- exec: ssh-import-id anotherusername

  - exec: echo "End of custom commands"
  - exec: awk -F\# '{print $1;}' ~/.ssh/authorized_keys | awk 'BEGIN { print "Authorized SSH keys for this container:"; } NF>=2 {print $NF;}'

My nginx config looks like this:

server {
    listen 80;
    # change this
    server_name forum.example.com;

    location / {
        proxy_pass http ://unix:/var/discourse/shared/standalone/nginx.http.sock:;
        proxy_set_header Host $http_host;
        proxy_http_version 1.1;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        access_log /var/log/nginx/discourse-access.log;
        error_log /var/log/nginx/discourse-error.log;
    }
}
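To check whether anything is answering on the socket that proxy_pass points at, independently of the host nginx, here is a rough Python sketch. Assumptions: Python 3, the socket path and server name from the config above, and a hypothetical helper name probe_unix_http. It sends a bare HTTP/1.1 request over the Unix socket and returns the status line.

```python
import socket


def probe_unix_http(path: str, host: str, timeout: float = 3.0) -> str:
    """Send a minimal HTTP/1.1 GET / over a Unix socket; return the status line.

    Raises OSError (e.g. ConnectionRefusedError, FileNotFoundError) when nothing
    is listening, the same situation nginx logs as "connect() ... failed".
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(path)
        request = (
            "GET / HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        s.sendall(request.encode("ascii"))
        data = b""
        # Read until the end of the status line (or until the peer closes).
        while b"\r\n" not in data:
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n", 1)[0].decode("latin-1")


if __name__ == "__main__":
    sock = "/var/discourse/shared/standalone/nginx.http.sock"
    try:
        print(probe_unix_http(sock, "forum.example.com"))
    except OSError as exc:
        print("upstream not reachable:", exc)
```

If this raises ConnectionRefusedError while the socket file exists, the file is most likely left over from a process that is no longer listening.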

I tried the following things:

  • restart the app
  • rebuild the app
  • reboot the host
  • stop app, stop nginx, start app, start nginx

All still result in the above error message.
When I attach to the app via docker attach, the above error message is printed repeatedly.

As stated above, this has worked for the past 2 months.

I have no clue why the socket cannot be bound anymore… The old socket can’t still be up and blocking, because I rebooted the host machine…
Any help would be appreciated.

Cheers, Christopher

PS: I had to put spaces behind all http and https occurrences because otherwise it would not let me create the topic.

Ok, I had to delete

/var/discourse/shared/standalone/nginx.http.sock

It seems the socket file was not deleted when the hosting company restarted my machine.
Maybe they did not give everything enough time to shut down gracefully.

Sometimes it helps to talk or write about something.
I had been looking for this for several hours already… and in the end it was that simple…

Glad it works again! :smiley:
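Why an unclean shutdown leaves the file behind can be illustrated with a small Python sketch. It is an assumption-laden illustration with a hypothetical helper name (listening_unix_socket), not how nginx or the container's init actually handle it: the process unlinks its socket path in a finally block, which runs on a normal exit or when a SIGTERM handler raises SystemExit, but never on SIGKILL or a hard power-off. docker stop sends SIGTERM first and SIGKILL only after a grace period, so a reboot that does not wait for a clean stop can skip this cleanup.

```python
import contextlib
import os
import signal
import socket
import sys
import tempfile
from typing import Iterator


@contextlib.contextmanager
def listening_unix_socket(path: str) -> Iterator[socket.socket]:
    """Bind and listen on `path`; unlink the socket file on the way out.

    The finally block does not run if the process is killed with SIGKILL or
    the machine loses power, which is when a stale socket file survives.
    """
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    bound = False
    try:
        srv.bind(path)  # fails with EADDRINUSE if a stale file is still there
        bound = True
        srv.listen(16)
        yield srv
    finally:
        srv.close()
        if bound:  # only remove a file this process actually created
            with contextlib.suppress(FileNotFoundError):
                os.unlink(path)


if __name__ == "__main__":
    # Turn SIGTERM (what `docker stop` sends first) into SystemExit so the
    # finally block above gets a chance to unlink the socket.
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "nginx.http.sock")
        with listening_unix_socket(sock_path):
            print("exists while serving:", os.path.exists(sock_path))  # True
        print("exists after clean exit:", os.path.exists(sock_path))   # False
```

The bound flag matters: without it, a failed bind() would make the cleanup delete a file that belongs to someone else.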

We should have the rebuild delete that file.

Or have stop delete that file.

Or have start delete the file if it already exists.

I like both approaches. Rebuild should in any case delete that file.

But in my opinion it would do no harm to also delete the file when the app starts (only if no instance is already running).
The file is inside /var/discourse, so it is very unlikely that anything other than Discourse is using it (and if something were, that would be the result of an error somewhere else).
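The "delete on start, but only if no instance is already running" idea can be sketched in Python. Assumptions: the helper name remove_if_stale is hypothetical, and "running" is approximated by whether connect() on the socket succeeds. There is a small race window between the probe and the unlink, so this is a heuristic rather than a lock.

```python
import os
import socket
import stat


def remove_if_stale(path: str) -> bool:
    """Delete `path` if it is a Unix socket that nobody is listening on.

    Returns True if the file was removed, False if it was missing or live.
    Refuses (ValueError) to touch anything at `path` that is not a socket.
    """
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return False
    if not stat.S_ISSOCK(st.st_mode):
        raise ValueError(f"{path} exists but is not a socket")
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except ConnectionRefusedError:
        os.unlink(path)  # no listener behind the file: treat it as stale
        return True
    finally:
        probe.close()
    return False  # connect() succeeded: another instance is still running
```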

I have the same issue. At first I thought my own nginx had to be started some time after the Docker containers (yet to be confirmed), but then I found out that the issue is related to the sock file.

From what I found on Stack Overflow :wink:, this seems to be an nginx issue: nginx is supposed to delete the sock file on termination. The Discourse container has nginx v1.7.x. Maybe an update to v1.8.x plus a change to the init.d script would do the trick?

Have a look:

I had this issue today, and this saved us!
Thanks so much!

Bumping this.

When the docker service is stopped, the socket file isn’t removed – it turns into a directory.
I had to rmdir /var/discourse/shared/standalone/nginx.http.sock to be able to start my container.

This happens after a reboot or an upgrade of docker-ce.
Note that in my setup the socket is also bind-mounted into another container (the front nginx).
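One possible mechanism, not confirmed anywhere in this thread: when docker run -v (or --volume) bind-mounts a host path that does not exist yet, Docker creates that path as a directory (the --mount syntax reports an error instead). If the front nginx container starts while nginx.http.sock is absent, for example after cleanup removed it and before Discourse recreated it, the socket path could come back as a directory. A quick Python check (hypothetical helper describe_path) shows what is actually at the path:

```python
import os
import stat
import sys


def describe_path(path: str) -> str:
    """Report what is at `path`: 'missing', 'socket', 'directory' or 'other'."""
    try:
        # lstat so that a symlink is reported as 'other', not followed
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"


if __name__ == "__main__":
    # e.g. pass /var/discourse/shared/standalone/nginx.http.sock as an argument
    for arg in sys.argv[1:]:
        print(arg, "->", describe_path(arg))
```

Seeing "directory" here would fit the rmdir workaround described above.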

You know there is a reason we have this line in our template :grimacing:

https://github.com/discourse/discourse_docker/blob/master/templates/web.socketed.template.yml#L8-L13

I get that, but is it enough?

I don’t think everyone rebuilds every container after a reboot. Well at least not the people in this topic :smiley:

That is enough, because cleanup runs every time you stop or start the container.

Then there is a problem, because as I said:

When I start my Discourse container, I get errors in the logs because it won’t bind to the socket. At that point Discourse neither deletes nor recreates it.

I’ll investigate when this happens again and provide full logs. I’m well aware that the problem may be related to my own Docker configuration.

It happened again today on my setup :frowning:

Could you maybe consider adding a failsafe for the above scenario?

rmdir /shared/nginx.http*.sock

I’m still not following why the existing failsafe that runs on boot is not fixing this for you. Is there anything custom about your template?

Because rm (without -r) won’t remove a directory.

I still don’t understand; the sock files are not directories…

I’m not sure either. As I said earlier, I suppose nginx.http.sock turns into a directory after a reboot because it is mounted in another Docker container.

This is very strange. I guess do a PR that rms those files regardless of whether they are files or directories; I am fine with amending the boot and shutdown scripts with that.
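A sketch of such a cleanup in Python, assuming the glob pattern from the rmdir suggestion above (the function name is hypothetical and this is not the code from the actual PR): it removes matching paths whether they are sockets, regular files or empty directories.

```python
import glob
import os
from typing import List


def cleanup_socket_paths(pattern: str = "/shared/nginx.http*.sock") -> List[str]:
    """Remove every path matching `pattern`, whether it is a socket, a regular
    file or an empty directory. Returns the paths that were removed.
    """
    removed: List[str] = []
    for path in glob.glob(pattern):
        if os.path.isdir(path) and not os.path.islink(path):
            os.rmdir(path)   # like `rmdir`: only succeeds on an empty directory
        else:
            os.unlink(path)  # like `rm`: sockets, regular files, symlinks
        removed.append(path)
    return removed
```

os.rmdir deliberately refuses non-empty directories, so an unexpected directory with content fails loudly instead of being wiped the way rm -r would.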

Done!

Thanks for considering it :slight_smile:

I am also experiencing this issue: I have Discourse in one Docker container and Caddy as a proxy in another. It may be related to the order in which the app and Caddy are started, but sometimes I end up with a directory instead of a socket.