Discourse not serving pages

I am having some issues running Discourse after installation.

Per the following page, once I have run the discourse-setup script I should be able to browse to the configured URL. Apparently I also have to run the following:
./launcher start app

Not sure if this is a documentation error or me doing something wrong. Once I ran the above command, I observed the following:

  1. I am able to load a web page when browsed to the URL configured.
  2. The ‘Welcome to Nginx’ page is loaded instead of the ‘Congratulations, you installed Discourse!’ page.
  3. After a short while (e.g. half a minute) the ‘Welcome to Nginx’ page stops loading.
  4. If I run ./launcher stop app followed by ./launcher start app, I can load the ‘Welcome to Nginx’ page again, but it stops loading after a short while, as before.

This is installed on a new dedicated VM without Nginx running on the machine so the ‘Welcome to Nginx’ page is coming from the container running Discourse.

None of those things are expected. It sounds like you have some other nginx running, perhaps? Is anything else running on the vm?

It isn’t possible because it is a bare-bones, minimal CentOS install. I checked by running the following command: nothing is listening on port 80 or 443 when the container is not running.
lsof -i TCP
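
A narrower version of that check, as a rough sketch: it assumes ss from iproute2 is installed on the host, and the helper name web_listeners is made up for illustration. It reads ss -ltn output on stdin and prints only the sockets listening on port 80 or 443, so empty output would suggest nothing on the host holds those ports.

```shell
# Hypothetical helper: filter `ss -ltn` output for listeners on ports 80/443.
# Reads from stdin, so it can also be tried on saved output.
web_listeners() {
    # Column 1 is the socket state, column 4 the local address:port.
    # Only a trailing :80 or :443 counts (so :8080 is ignored).
    awk '$1 == "LISTEN" && $4 ~ /:(80|443)$/ { print $4 }'
}

# Usage on the host (uncomment to run):
# ss -ltn | web_listeners
```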

My best guess is that there is some issue with CentOS and discourse-setup. The easiest thing would be to try Ubuntu. Otherwise, you’ll need to pay attention to what the script does and try to debug the issue.

If you have followed the following install guide, recommended by someone not to be mentioned :smiley:

You should not have any issues.

https://github.com/discourse/discourse/blob/master/docs/INSTALL-cloud.md

Sigh, I actually tried, but both the 19 and 20 LTS installers crash halfway through. Both say they are unable to autoconfigure the interface. If I leave it disabled the installer runs fine, but if I set the IP manually it crashes.

If I disable the interface so I can continue with the install, I can’t install the network tools to get ifconfig, even with the ISO mounted and used as a source. So I am a bit stuck.

Hi @titusc

What do you see when you do a:

docker ps 

?

You’re saying that the Ubuntu installer doesn’t work?

Or are you trying to use Discourse with an IP address rather than a domain name?

I’m not using IP address for the Discourse installation. I’m using a proper domain name that is resolvable on public DNS.

I’m saying the Ubuntu installer always crashes when I run it by booting off the Ubuntu ISO disk image during machine creation. This happens with both Ubuntu Server 19.10 and 20.04 LTS, and in both cases it reports that it can’t set up the network interface. If I leave the interface as is, both install fine, but I have no means of bringing the interface up to do anything. I did run the following successfully.
ip address add <ip>/<mask> dev <interface>

But then I am unable to bring it up by running the following.
ifup <interface>

I then mounted the ISO as a loopback device, set it up as a repo, and tried to run the following, but it says the package can’t be found on the ISO.
apt install net-tools

If I try to set the interface manually with network details during the install, both versions crash.

For the record I’m doing this on ESXi 7.0 and am using the following ISOs.
ubuntu-20.04-live-server-amd64.iso
ubuntu-19.10-live-server-amd64.iso

It shows the Discourse container as running. Each time, I do the following:

  1. ./launcher start app
  2. Check on browser and see ‘Welcome to Nginx’ page but then not anymore after about half a minute.
  3. docker ps to confirm the Discourse container is running.
  4. ./launcher stop app and ./launcher start app
  5. Check on browser and see ‘Welcome to Nginx’ page but then not anymore after about half a minute.
  6. docker ps to confirm the Discourse container is running.

What I also notice is when I run the following commands I can’t see Nginx running but it could be due to it already having been stopped by the time I run it.
./launcher enter app
systemctl status nginx
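
One possible reason nothing shows up: the ps output from inside the container later in this thread lists a runsv nginx process, which suggests the services there are supervised by runit rather than systemd. A minimal sketch under that assumption, using runit's sv status command. The helper name sv_state is hypothetical, and it only classifies the first word of the status line.

```shell
# Hypothetical helper: classify one line of `sv status <service>` output.
# Assumes the runit convention that the line starts with "run:" or "down:".
sv_state() {
    case "$1" in
        run:*)  echo "running" ;;
        down:*) echo "stopped" ;;
        *)      echo "unknown" ;;
    esac
}

# Usage inside the container, after ./launcher enter app (uncomment to run):
# sv_state "$(sv status nginx)"
```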

@titusc

Have you tried rebuilding the app?

/var/discourse/launcher rebuild app

Normally, depending on when your container spins up completely, it can take time (based on your system specs) to be fully operational.

Even on our 64GB Linux box with 8 CPU cores, we wait around one solid minute before switching over to the new container.

Did you try 18.04?

Sounds like an environmental issue, we can’t really offer support for hypervisors and Linux distributions here though. One of the reasons DigitalOcean is recommended is the consistency of the operating system being installed on top of. If you want to build a VM from scratch you’re going to need to figure that out beforehand.

Once you have a working OS we can help install the Discourse portion.

Hello @DBHacker yes I did do the following in sequence.
./launcher rebuild app
./launcher start app

That was after I realized that all the following does is run up to the rebuild step of the launcher.
./discourse-setup

@Stephen yes, I need to work out the issues with the Ubuntu install first. To be honest I’m not a big fan of Ubuntu and have been using CentOS / RH for the last 15 years. What I do want to ask, though, is whether there is anything specific you’d expect us to have to set up for CentOS / RH?

The discourse-setup script is tested and proven on Ubuntu.

You may have to do some or all of the steps by hand on a different distribution. Take a look at the contents of the file to get an idea of what it’s doing.

Hi @titusc

Sorry you are having issues.

FWIW, you do not need to run:

./launcher start app

after you run:

./launcher rebuild app

because when you rebuild the app with the launcher script (see below), that script restarts the container before it exits.

Here is the relevant part of the code from launcher:

  rebuild)
      if [ "$(git symbolic-ref --short HEAD)" == "master" ]; then
        echo "Ensuring launcher is up to date"

        git remote update

        LOCAL=$(git rev-parse HEAD)
        REMOTE=$(git rev-parse @{u})
        BASE=$(git merge-base HEAD @{u})

        if [ $LOCAL = $REMOTE ]; then
          echo "Launcher is up-to-date"

        elif [ $LOCAL = $BASE ]; then
          echo "Updating Launcher..."
          git pull || (echo 'failed to update' && exit 1)

          echo "Launcher updated, restarting..."
          exec "$0" "${SAVED_ARGV[@]}"

        elif [ $REMOTE = $BASE ]; then
          echo "Your version of Launcher is ahead of origin"

        else
          echo "Launcher has diverged source, this is only expected in Dev mode"
        fi

      fi

      set_existing_container

      if [ ! -z $existing ]
        then
          echo "Stopping old container"
          (
            set -x
            $docker_path stop -t 60 $config
          )
      fi

      run_bootstrap

      if [ ! -z $existing ]
        then
          echo "Removing old container"
          (
            set -x
            $docker_path rm $config
          )
      fi

      run_start
      exit 0
      ;;

You can see from the script that the rebuild method attempts to start the container before it exits.
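
The three-way comparison at the top of that branch can be tried in isolation. This is a sketch, not code from launcher: the function name sync_state and its labels are invented for illustration, and it only reproduces the LOCAL / REMOTE / BASE tests.

```shell
# Hypothetical helper mirroring the comparison in the rebuild branch.
# Arguments: $1 = local commit, $2 = upstream commit, $3 = merge base.
sync_state() {
    if [ "$1" = "$2" ]; then
        echo "up-to-date"
    elif [ "$1" = "$3" ]; then
        echo "behind"      # upstream has new commits; launcher would pull
    elif [ "$2" = "$3" ]; then
        echo "ahead"       # local has commits that upstream lacks
    else
        echo "diverged"
    fi
}

# Usage in a git checkout with an upstream configured (uncomment to run):
# sync_state "$(git rev-parse HEAD)" "$(git rev-parse '@{u}')" "$(git merge-base HEAD '@{u}')"
```

Quoting the arguments keeps the tests well-formed even if one of the git commands returns nothing.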

HTH

@neounix you are right. Will test again and check on this. Hopefully this was the cause. Not sure what the behavior is if it starts the container and I then start it again by running ./launcher start app.

Dear @titusc

Sorry not to reply in more detail, but I have to run off on a long road trip.

Many people who make errors with building, rebuilding docker images and containers tend to get into trouble.

You might consider cleaning up (pruning) your “accumulated” docker images and unused containers at some point.

For example (off the top of my head quickly)

docker ps -a
docker images
docker system prune -a

You should stop all “rogue” containers, remove them, and delete all the old Docker images.
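
Before pruning, it may help to see which containers are not running. A small sketch, assuming docker ps -a is called with a custom --format that joins name and status with a pipe character. The helper name not_up_containers is hypothetical.

```shell
# Hypothetical helper: read "name|status" lines on stdin and print the names
# of containers whose status does not start with "Up".
not_up_containers() {
    awk -F '|' '$2 !~ /^Up / { print $1 }'
}

# Usage on the host (uncomment to run):
# docker ps -a --format '{{.Names}}|{{.Status}}' | not_up_containers
```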

Gotta run…

HTH

@neounix agreed on checking the docker images. I have actually done that, but let me show you this in detail. As you can see below, Discourse was successfully built and started running in a docker container. Towards the end you can see Nginx was running in the container but quit after only 5 seconds or so.

Before Everything Starts

[root@uat discourse]# pwd
/var/discourse
[root@uat discourse]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[root@uat discourse]# docker container ls -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[root@uat discourse]# docker image ls
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
discourse/base      2.0.20200724-1815   6ba1506bf822        9 days ago          2.38GB
centos              latest              831691599b88        6 weeks ago         215MB
alpine              latest              a24bb4013296        2 months ago        5.57MB

Confirm No HTTP Server Is Running On Host

[root@uat discourse]# systemctl status nginx
Unit nginx.service could not be found.
[root@uat discourse]# lsof -i TCP
COMMAND  PID    USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
systemd    1    root  181u  IPv4   19283      0t0  TCP *:sunrpc (LISTEN)
systemd    1    root  183u  IPv6   19285      0t0  TCP *:sunrpc (LISTEN)
rpcbind 1070     rpc    4u  IPv4   19283      0t0  TCP *:sunrpc (LISTEN)
rpcbind 1070     rpc    6u  IPv6   19285      0t0  TCP *:sunrpc (LISTEN)
sshd    1234    root    5u  IPv4   26444      0t0  TCP *:ssh (LISTEN)
sshd    1234    root    7u  IPv6   26446      0t0  TCP *:ssh (LISTEN)
cupsd   1240    root    9u  IPv6   27746      0t0  TCP localhost:ipp (LISTEN)
cupsd   1240    root   10u  IPv4   27747      0t0  TCP localhost:ipp (LISTEN)
dnsmasq 2094 dnsmasq    6u  IPv4   37419      0t0  TCP uat:domain (LISTEN)
sshd    7102    root    5u  IPv4 2072827      0t0  TCP uat:ssh->10.1.136.4:52229 (ESTABLISHED)
sshd    7156    tech    5u  IPv4 2072827      0t0  TCP uat:ssh->10.1.136.4:52229 (ESTABLISHED)

Rebuild Discourse

[root@uat discourse]# ./launcher rebuild app
Ensuring launcher is up to date
Fetching origin
Launcher is up-to-date
cd /pups && git pull && /pups/bin/pups --stdin
Already up to date.
......................................................................
I, [2020-08-03T06:54:22.114365 #1]  INFO -- : > echo "Beginning of custom commands"
I, [2020-08-03T06:54:22.116739 #1]  INFO -- : Beginning of custom commands

I, [2020-08-03T06:54:22.116996 #1]  INFO -- : > echo "End of custom commands"
I, [2020-08-03T06:54:22.119862 #1]  INFO -- : End of custom commands

I, [2020-08-03T06:54:22.119983 #1]  INFO -- : Terminating async processes
I, [2020-08-03T06:54:22.120021 #1]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/12/bin/postmaster -D /etc/postgresql/12/main pid: 49
I, [2020-08-03T06:54:22.120086 #1]  INFO -- : Sending TERM to exec chpst -u redis -U redis /usr/bin/redis-server /etc/redis/redis.conf pid: 166
166:signal-handler (1596437662) Received SIGTERM scheduling shutdown...
2020-08-03 06:54:22.120 UTC [49] LOG:  received fast shutdown request
2020-08-03 06:54:22.121 UTC [49] LOG:  aborting any active transactions
2020-08-03 06:54:22.128 UTC [49] LOG:  background worker "logical replication launcher" (PID 58) exited with exit code 1
2020-08-03 06:54:22.128 UTC [53] LOG:  shutting down
2020-08-03 06:54:22.154 UTC [49] LOG:  database system is shut down
166:M 03 Aug 2020 06:54:22.176 # User requested shutdown...
166:M 03 Aug 2020 06:54:22.176 * Saving the final RDB snapshot before exiting.
166:M 03 Aug 2020 06:54:22.184 * DB saved on disk
166:M 03 Aug 2020 06:54:22.184 # Redis is now ready to exit, bye bye...
sha256:7b8e9281c49ba3dc37e0743a765cddc13ab73aae5486bd30722c696c2e1443b1
ce327c6e37246e63331f03b07d64f4882efa68e88cb1516c6343a9dddbbd59df

+ /usr/bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=4 -e UNICORN_SIDEKIQS=1 -e RUBY_GLOBAL_METHOD_CACHE_SIZE=131072 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e LETSENCRYPT_DIR=/shared/letsencrypt -e DISCOURSE_HOSTNAME=uat.xxxxxx.com -e DISCOURSE_DEVELOPER_EMAILS=support@xxxxxx.com -e DISCOURSE_SMTP_ADDRESS=smtp-relay.xxxxxx.com -e DISCOURSE_SMTP_PORT=587 -e DISCOURSE_SMTP_USER_NAME=support@xxxxxx.com -e DISCOURSE_SMTP_PASSWORD=support@xxxxxx.com -e LETSENCRYPT_ACCOUNT_EMAIL=me@example.com -h uat-app -e DOCKER_HOST_IP=172.17.0.1 --name app -t -p 80:80 -p 443:443 -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:fe:39:ba:65:e1 local_discourse/app /sbin/boot
44c604ccbda4bfb4d48722e1cbbf70e3b067531acda41175f6bdaaa013cc6d18

Confirm Image Is Built And Docker Is Running

[root@uat discourse]# docker container ls -a
CONTAINER ID        IMAGE                 COMMAND             CREATED             STATUS              PORTS                                      NAMES
44c604ccbda4        local_discourse/app   "/sbin/boot"        7 minutes ago       Up 7 minutes        0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   app
[root@uat discourse]# docker ps
CONTAINER ID        IMAGE                 COMMAND             CREATED             STATUS              PORTS                                      NAMES
44c604ccbda4        local_discourse/app   "/sbin/boot"        7 minutes ago       Up 7 minutes        0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   app
[root@uat discourse]# docker image ls
REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE
local_discourse/app   latest              7b8e9281c49b        8 minutes ago       2.66GB
discourse/base        2.0.20200724-1815   6ba1506bf822        9 days ago          2.38GB
centos                latest              831691599b88        6 weeks ago         215MB
alpine                latest              a24bb4013296        2 months ago        5.57MB

Confirm Nothing Is Still Running On Host And Docker Is Listening On Ports 80 and 443

[root@uat discourse]# systemctl status nginx
Unit nginx.service could not be found.
[root@uat discourse]#
[root@uat discourse]# lsof -i TCP
COMMAND     PID    USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
systemd       1    root  181u  IPv4   19283      0t0  TCP *:sunrpc (LISTEN)
systemd       1    root  183u  IPv6   19285      0t0  TCP *:sunrpc (LISTEN)
rpcbind    1070     rpc    4u  IPv4   19283      0t0  TCP *:sunrpc (LISTEN)
rpcbind    1070     rpc    6u  IPv6   19285      0t0  TCP *:sunrpc (LISTEN)
sshd       1234    root    5u  IPv4   26444      0t0  TCP *:ssh (LISTEN)
sshd       1234    root    7u  IPv6   26446      0t0  TCP *:ssh (LISTEN)
cupsd      1240    root    9u  IPv6   27746      0t0  TCP localhost:ipp (LISTEN)
cupsd      1240    root   10u  IPv4   27747      0t0  TCP localhost:ipp (LISTEN)
dnsmasq    2094 dnsmasq    6u  IPv4   37419      0t0  TCP uat:domain (LISTEN)
sshd       7102    root    5u  IPv4 2072827      0t0  TCP uat:ssh->10.1.136.4:52229 (ESTABLISHED)
sshd       7156    tech    5u  IPv4 2072827      0t0  TCP uat:ssh->10.1.136.4:52229 (ESTABLISHED)
docker-pr 12991    root    4u  IPv6 2242261      0t0  TCP *:https (LISTEN)
docker-pr 13003    root    4u  IPv6 2242288      0t0  TCP *:http (LISTEN)

Restart Docker And Check Nginx After Nginx Has Stopped

[root@uat discourse]# ./launcher stop app
+ /usr/bin/docker stop -t 10 app
app
[root@uat discourse]# ./launcher start app; ./launcher enter app

starting up existing container
+ /usr/bin/docker start app
app
root@uat-app:/var/www/discourse# date; ps -ef | grep nginx
Mon 03 Aug 2020 07:29:47 AM UTC
root        34     1  0 07:29 ?        00:00:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/letsencrypt.conf
www-data    36    34  0 07:29 ?        00:00:00 nginx: worker process
www-data    37    34  0 07:29 ?        00:00:00 nginx: worker process
root      1091   398  0 07:29 pts/1    00:00:00 grep nginx
root@uat-app:/var/www/discourse# date; ps -ef | grep nginx
Mon 03 Aug 2020 07:29:50 AM UTC
root        34     1  0 07:29 ?        00:00:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/letsencrypt.conf
www-data    36    34  0 07:29 ?        00:00:00 nginx: worker process
www-data    37    34  0 07:29 ?        00:00:00 nginx: worker process
root      1854   398  0 07:29 pts/1    00:00:00 grep nginx
root@uat-app:/var/www/discourse# date; ps -ef | grep nginx
Mon 03 Aug 2020 07:29:52 AM UTC
root      2043  2038  0 07:29 ?        00:00:00 runsv nginx
root      2080   398  0 07:29 pts/1    00:00:00 grep nginx
root@uat-app:/var/www/discourse#
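
To make the same observation less manual, the config file passed to the nginx master process can be pulled out of the ps output. A sketch only; nginx_master_conf is a made-up helper name. On the first two samples above it would print /etc/nginx/letsencrypt.conf, and on the third nothing, since only runsv nginx is left.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the argument
# that follows -c on any nginx master process line.
nginx_master_conf() {
    # [p]rocess keeps this awk command from matching its own ps entry.
    awk '/nginx: master [p]rocess/ {
        for (i = 1; i < NF; i++) if ($i == "-c") print $(i + 1)
    }'
}

# Usage inside the container (uncomment to run):
# date; ps -ef | nginx_master_conf
```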

Hi @titusc

Thanks for the great post and comprehensive troubleshooting info. Very well done.

Did you check the nginx log files in the app for clues?
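
One place to look, sketched under two assumptions: that nginx inside the container writes to the default /var/log/nginx, and that the -v mapping shown in the docker run line earlier in the thread (/var/discourse/shared/standalone/log/var-log to /var/log) is in effect, so the same files are visible on the host. The helper name last_nginx_errors is made up.

```shell
# Hypothetical helper: print the last lines of nginx's error.log.
# $1 = log directory (default assumes the volume mapping from the thread),
# $2 = number of lines to show (default 20).
last_nginx_errors() {
    log_dir=${1:-/var/discourse/shared/standalone/log/var-log/nginx}
    lines=${2:-20}
    if [ -r "$log_dir/error.log" ]; then
        tail -n "$lines" "$log_dir/error.log"
    else
        echo "no readable error.log in $log_dir" >&2
        return 1
    fi
}

# Usage on the host (uncomment to run):
# last_nginx_errors
```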

Something is broken in your setup. My guess is that whatever is keeping Ubuntu from installing is also interfering with CentOS.

You need to fix that before you install Discourse (or, likely, anything else).