Web_only container needs to be restarted after DB restart?


(Tomek) #1

Is it just me, or is Discourse in a multi-container Docker setup (data container on a different physical server) unable to recover from the error 500 state after the DB server/container is reset?

When I shut down the DB server, my web_only container returns error 500 to the browser (obviously). When I start the data container up again, the web_only container needs to be restarted before Discourse is up and running again. Can you fix this?


(Matt Palmer) #2

That sounds like the data container is getting a different IP address and, because the containers are on different servers, Docker’s usual linking support isn’t able to deal with it. You’ll need to sort out some sort of service discovery mechanism such that the web container can be reliably informed of changes to the DB container’s IP address. Done correctly, Discourse is definitely able to handle this situation, because that’s how linked containers (and the discourse.org hosted service) operate, and it works Just Fine. In other words, there’s nothing to fix in Discourse; rather, your hosting environment needs additional features to support containerised services (in general; there’s nothing Discourse-specific here, and the same problem will occur with most any other service that needs to communicate with containers on other machines).


(Tomek) #3

Why would you assume this? This is not the case: the DB server has a fixed IP, and after a restart the DB is available on the same IP. The web_only container simply seems to have trouble reconnecting to the DB after a disconnect. This should not happen and is obviously something that needs a fix :slight_smile:


(Tomek) #4

What’s the correct way to do this?


(Matt Palmer) #5

I didn’t assume, I guessed, because there is insufficient information to provide a definitive diagnosis. The web app container is quite capable of reconnecting to the database, it works quite fine and is tested several times a day in the discourse.org hosted environment. If you provide detailed information on what needs fixing, I’d be happy to do so.


(Tomek) #6

I have a web_only container at IP 10.0.100.1 and a data container at IP 10.0.100.2.

My data container exposes two ports for postgres and redis:

expose:
  - "5432:5432" # psql
  - "6379:6379" # redis

My web_only container has the following DB config:

env:
  DISCOURSE_DB_USERNAME: discourse
  DISCOURSE_DB_PASSWORD: ------------
  DISCOURSE_DB_HOST: 10.0.100.2
  DISCOURSE_REDIS_HOST: 10.0.100.2
  DISCOURSE_DB_POOL: 20

This is a simple setup and it works well. The problem is that web_only cannot reconnect to the DB after a DB restart and needs to be restarted as well.
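To separate a plain network problem from an application-level reconnect problem, it may help to probe both ports from the web_only host right after the DB restart. The following is only a minimal sketch, assuming Python 3 is available on that host; the helper name `port_is_open` is made up for illustration, and the address and ports are the ones from the config above.

```python
import socket

# Address and ports taken from the configuration quoted above.
DB_HOST = "10.0.100.2"
SERVICES = {"postgres": 5432, "redis": 6379}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "reachable" if port_is_open(DB_HOST, port) else "NOT reachable"
        print(f"{name} on {DB_HOST}:{port} is {state}")
```

If both ports report as reachable while the forum still returns 500, the network path is probably fine and the problem is more likely in how the web container re-establishes its connections.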


(Alan Tan) #7

Can you try updating to latest?

I believe this will fix your issue

To be 100% sure that is causing the problem, can you check your logs to look for the exact error that is causing the server to throw 500?
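As a rough aid for that log check, something like the sketch below could filter a log file for connection-related errors. It is an assumption-heavy example: the marker strings are exception class names from the Ruby pg, redis and ActiveRecord libraries, which may or may not be what shows up in this particular setup, and the log path is deliberately left as a command-line argument because the thread does not say where the logs live. The function name `connection_errors` is hypothetical.

```python
import re
import sys

# Assumed markers: exception class names from the Ruby pg, redis and
# ActiveRecord libraries. The real log content may look different.
ERROR_MARKERS = re.compile(
    r"PG::ConnectionBad"
    r"|Redis::CannotConnectError"
    r"|ActiveRecord::ConnectionNotEstablished"
)


def connection_errors(lines):
    """Return the lines that mention one of the assumed error markers."""
    return [line.rstrip("\n") for line in lines if ERROR_MARKERS.search(line)]


if __name__ == "__main__":
    # The log path is supplied by the user; no default location is assumed.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for hit in connection_errors(log):
            print(hit)
```

Run it with the path to the log file as the only argument; an empty result would suggest the 500 has a different cause than a lost DB or Redis connection.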


(Tomek) #8

I would have to stop my forum to check this now, will do that when needed and then I’ll let you know. Thanks!