Installing on Kubernetes

I’d like to install Discourse on our Kubernetes cluster. Kubernetes does not use the docker command to deploy images, so I’d need to build a Discourse image locally, push it to our private Docker registry, and then have Kubernetes pull the image from there. This is how we deploy other Docker-based software.

Can I do this by using ./launcher bootstrap locally, then pushing the built image to our registry? Are there likely to be any problems from this approach? How would upgrading work in this case? (Kubernetes container instances can be destroyed at any time, so to upgrade, we’d want to build a new image and re-deploy, which Kubernetes lets us do without downtime.)

I understand I’ll need to provide settings for database (via environment variables), volume mounts for storage, etc. We plan to use our existing redundant Redis, PostgreSQL and GlusterFS services for this, like we do with other Docker applications.

10 Likes

Yes, that is what we do in our infrastructure, build image, re-tag it, push to our private repo, then pull from our private repo to deploy.

Note on GlusterFS, we used to use it but had some extreme fail cases that lead us down a windy road of trying out Ceph (which also had issues) and finally moving to NetApp which has been rock solid.

4 Likes

Thanks, Sam. How do you handle upgrades? I noticed the Discourse version is hardcoded in the launcher script; manually updating it there feels like the wrong way?

Re Gluster: Yes, I’m not terribly happy with it; I’m thinking of moving back to NFS with DRBD and Pacemaker. We aren’t at the scale we can justify the expensive of dedicated storage hardware yet.

There are a bunch of tricks, you can always specify a custom base image in you yaml file eg:

Eg: https://github.com/SamSaffron/message_bus/blob/master/examples/chat/docker_container/chat.yml

But you should not really need to do that, for upgrades simply bootstrap a new image and push that our, you can add hooks to apt get upgrade and so on.

Okay, I think I’m missing something about how the launcher script (or pups) works here.

I have one YAML file for each instance, which looks like this:

discourse@hawthorn:~/discourse$ cat containers/clicsargent.yml 
templates:
  - "templates/torchbox-web.template.yml"
  - "private/clicsargent.template.yml"

env:
  DISCOURSE_HOSTNAME: 'clicsargent-stage-discourse.torchboxapps.com'
  DISCOURSE_DB_USERNAME: clicsargent_discourse
  DISCOURSE_DB_NAME: clicsargent_discourse

torchbox-web.template.yml sets the global database configuration:

discourse@hawthorn:~/discourse$ cat templates/torchbox-web.template.yml 
# This is a Discourse build for Torchbox servers on Kubernetes.

templates:
  - "templates/web.template.yml"
  - "templates/web.ratelimited.template.yml"

env:
  LANG: en_US.UTF-8
  UNICORN_WORKERS: 1
  DISCOURSE_DEVELOPER_EMAILS: 'sysadmin@torchbox.com'
  DISCOURSE_DB_SOCKET: ''
  DISCOURSE_DB_HOST: postgres-2.itl.rslon.torchbox.net
  DISCOURSE_REDIS_HOST: redis-1-cache.itl.rslon.torchbox.net
  DISCOURSE_REDIS_DB: '9'
  DISCOURSE_REDIS_PORT: '6381'
  DISCOURSE_SMTP_ADDRESS: mailer.itl.svc.torchbox.net

Then I set a database password in a separate file, so it doesn’t need to go in the Git repository:

discourse@hawthorn:~/discourse$ cat private/clicsargent.template.yml 
env:
  DISCOURSE_DB_PASSWORD: <password>

However, running the build doesn’t seem to pick up the correct database settings from the template

discourse@hawthorn:~/discourse$ ./launcher bootstrap clicsargent
[...]
 Failed to initialize site default
rake aborted!
PG::ConnectionBad: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

If I move all the database settings except password to containers/<instance>.yml, then it does work, and it correctly picks up the password from private/<instance>.yml.

What am I missing here? My understanding was that all the env settings would be merged from templates.

Sounds like a pups bug / problem, so that’s where you’ll want to focus your debug efforts.

Yeah @tgxworld mentioned this to me in the past… it is a launcher bug, it does not do 2 levels of inheritance, only 1.

I want to get it fixed, maybe try to patch that bash file up to support it

Have you considered using Docker Hub ? It seems with that container you can just supply env vars and it’s not required to run a launcher or your own images… I think… :slight_smile:

How amazing would a Kubernetes Helm Chart, or even an Operator for Discourse be… :open_mouth:

2 Likes

Sure, please see

There are a lot of details there.

Tried bitnami/discourse:latest (1.8.3-dirty as of date) and having a few issues with sidekiq
https://github.com/bitnami/bitnami-docker-discourse/issues/25

And there’s probably no way for me to fix it myself as bitnami’s packaging is all a black-box.

I bootstrap with the following in containers/app.yml:

templates:
  - "templates/web.template.yml"
  - "templates/redis.template.yml"

env:
  LANG: en_US.UTF-8
  DISCOURSE_HOSTNAME: 'discourse.mysite.org'
  DISCOURSE_DB_USERNAME: discourseuser
  DISCOURSE_DB_NAME: discourse
  DISCOURSE_DEVELOPER_EMAILS: 'jasmin.hassan@mysite.org'
  DISCOURSE_DB_SOCKET: ''
  DISCOURSE_DB_HOST: 'x.x.x.rds.amazonaws.com'
  DISCOURSE_SMTP_ADDRESS: 'email-smtp.x.amazonaws.com'
  DISCOURSE_SMTP_PORT: 587
  DISCOURSE_SMTP_USER_NAME: randomuser
  DISCOURSE_SMTP_PASSWORD: randompassword
  DISCOURSE_DB_PASSWORD: securepassword

Then I push to private docker hub repo, and write a yaml file for kubernetes to pull my newly pushed private image, and apply it. However, without a “command” and/or “args” set in the kubernetes yaml file for the deployment, the container/pod starts up but immediately errors with:

I, [2017-07-15T12:52:21.697829 #13] INFO – : Loading --stdin
/pups/lib/pups/config.rb:23:in initialize': undefined method ’ for nil:NilClass (NoMethodError)
from /pups/lib/pups/cli.rb:27:in new' from /pups/lib/pups/cli.rb:27:in run’
from /pups/bin/pups:8:in `’

After some research and digging in, I realize I have to set a custom command in kubernetes yaml file, so a part of it might look like:

- image: myorg/discourse:latest
  name: discourse
  command: ["/bin/bash"]
  args: ["-c", "cd /var/www/discourse && bin/bundle exec rails server && bin/bundle exec sidekiq -q critical,low,default -d -l log/sidekiq.log && nginx"]
  imagePullPolicy: Always
  ports:
  - containerPort: 80
  resources:
    requests:
      memory: "2Gi"

Then the ENV vars (postgres, redis, smtp, etc.) and volume mounts.

However, puma server (tcp/3000) dies silently after daemonizing according to the logs.
Fix (from containers/app.yml):
sed -i 's#/home/discourse/discourse#/var/www/discourse#' config/puma.rb

Site then loads, but all assets are not loaded (css, js, etc).
Fix:
sed -i 's/GlobalSetting.serve_static_assets/true/' config/environments/production.rb

so basically I ended up with the additional section in containers/app.yml:

run:
  - exec:
      cd: /var/www/discourse
      cmd:
        - sed -i 's#/home/discourse/discourse#/var/www/discourse#' config/puma.rb
        - sed -i 's/GlobalSetting.serve_static_assets/true/' config/environments/production.rb
        - bash -c "touch -a /shared/log/rails/{sidekiq,puma.err,puma}.log"
        - bash -c "ln -s /shared/log/rails/{sidekiq,puma.err,puma}.log log/"

additionally, because of SSL termination outside, for actions like trigger/delete/retry/etc at https://discourse/sidekiq one gets 403 forbidden errors and in puma.err.log it complains about HttpOrigin.
So I just fix that by adding:

    - sed -i 's/default \$scheme;/default https;/' /etc/nginx/conf.d/discourse.conf

and rebuilding.

latest build as of today
v1.9.0.beta4 +61

We usually use NGINX+Unicorn as our minimal deployment unit and boot to runit. This should run find with minimal hacking. I there a reason why you are trying to decompose this further?

Also I would definitely decompose redis in this kind of setup. Mixing it with the app makes it very hard to scale.

2 Likes

Hi Sam,

Thanks for replying. No, I have no particular reason to decompose, except that Kubernetes expects the entrypoint process to not exit and monitors that process and if it dies it restarts the container. Ideally I would like to run supervisord to spawn and monitor the main processes.
So you’re saying I can just supply /sbin/boot as the command for the container to run?

Redis runs in Kubernetes and is only accessible as a service, so because I cannot connect to it during manual bootstrapping process as I’m not yet inside that kube cluster, the bootstrap fails unless I give it a valid redis server. Therefore, I temporarily bootstrap with redis as well, but later override the redis env vars when running container in kubernetes. A mere workaround.

Also is there a reason against linking $home/plugins is to /shared/plugins?

1 Like

Anyone done this in Azure Container Service (AKS)?

We deploy on AWS using the AWS registry, I see no reason why you could not deploy from AKS.

@Jasmin_Hassan / @felicity what ended up happening here?

2 Likes

In whichever environment we need discourse (example Staging or Production), we have to bootstrap discourse on one of the cluster’s nodes (AWS cluster created with KOPS), after having upgraded docker on the nodes to satisfy requirement for bootstrapping discourse image.
We bootstrap on one of the cluster’s nodes so that we’re in the VPC with access to pre-existing RDS Database (Postgres) and Elasticache (Redis).

I start with copying samples/web_only.yml to containers/discourse.yml

And I have to add SSL + Letsencrypt because if I disable force_https and make ELB http only origin policy, Google oauth in discourse will load over http, and discourse will mix loading assets over http/https which causes problems (ex: preview post stops working with errors in browser console about assets loaded over insecure http while connection is over https), having configured Cloudfront to redirect HTTP to HTTPS as the viewer policy.

templates:
  - "templates/web.template.yml"
  - "templates/web.ratelimited.template.yml"
## Uncomment these two lines if you wish to add Lets Encrypt (https)
  - "templates/web.ssl.template.yml"
  - "templates/web.letsencrypt.ssl.template.yml"

Then define all vars needed in the “env” section, such as:
DISCOURSE_SMTP_*
DISCOURSE_DB_* (only leave _SOCKET blank since we’re using remote RDS DB)
DISCOURSE_REDIS_HOST

Also if your RDS database is newer version, say 9.6.x, discourse backups will not work because of older pg_dump version in the image, so I also edit the template to include (at the end):

run:
  - exec: apt-get -y purge postgresql-client-9.5 postgresql-9.5 postgresql-contrib-9.5
  - exec: apt-get -y install postgresql-client-9.6

Then the procedure to bootstrap with the above template (discourse.yml) and push to docker hub is easy:

# rm -rf /var/discourse

# git clone https://github.com/discourse/discourse_docker.git /var/discourse

# cd /var/discourse

# cp /home/admin/discourse/discourse.yml containers/

# ./launcher destroy discourse

# ./launcher bootstrap discourse

# docker tag $(docker images local_discourse/discourse -q) myorg/discourse:PROD-20171228

# docker login

# docker push myorg/discourse:PROD-20171228

If using EBS for the /shared volume, you’ll need to first make PV & PVC for a ebs volume:

kubectl apply -f discourse_ebs.yml

apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: ebs-discourse
  spec:
    capacity:
      storage: 100Gi
    accessModes:
      - ReadWriteOnce
    storageClassName: ebs-discourse
    awsElasticBlockStore:
      volumeID: vol-xxxxxxxxxxxxxxx
      fsType: ext4
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: ebs-discourse
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Gi
    storageClassName: ebs-discourse
kind: List
metadata: {}

Finally we can deploy, using a yaml similar to the following:

apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    name: discourse
  spec:
    ports:
    - name: discourse-http
      port: 80
      targetPort: 80
    - name: discourse-https
      port: 443
      targetPort: 443
    selector:
      app: discourse
    type: LoadBalancer
    selector:
      app: discourse
    type: LoadBalancer
- apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    name: discourse
    labels:
      app: discourse
  spec:
    replicas: 1
    strategy:
      type: Recreate
    template:
      metadata:
        labels:
          app: discourse
      spec:
        containers:
        - image: myorg/discourse:PROD-20171228
          name: discourse
          command: ["/sbin/boot"]
          imagePullPolicy: Always
          ports:
          - containerPort: 80
          resources:
            requests:
              memory: "1Gi"
          env:
            - name: DISCOURSE_HOSTNAME
              value: forum.myorg.ngo
            - name: DISCOURSE_SITE_NAME
              value: 'MyOrg Forum'
            - name: DISCOURSE_DEVELOPER_EMAILS
              value: 'bla@myorg.ngo,bla2@myorg.ngo'
            - name: DISCOURSE_DB_HOST
              value: production.xxxxxxxxx.eu-central-1.rds.amazonaws.com
            - name: DISCOURSE_DB_SOCKET
              value: ''
            - name: DISCOURSE_DB_NAME
              value: 'discourse'
            - name: DISCOURSE_DB_USERNAME
              value: discoursedbuser
            - name: DISCOURSE_DB_PASSWORD
              value: LongSecurePasswordHere
            - name: DISCOURSE_REDIS_HOST
              value: redis-production.xxxxxx.xx.0001.euc1.cache.amazonaws.com
            - name: DISCOURSE_REDIS_PORT
              value: '6379'
            - name: DISCOURSE_SMTP_ADDRESS
              value: 'email-smtp.eu-west-1.amazonaws.com'
            - name: DISCOURSE_SMTP_PORT
              value: '587'
            - name: DISCOURSE_SMTP_USER_NAME
              value: xxxxxxxxxxxxxx
            - name: DISCOURSE_SMTP_PASSWORD
              value: xxxxxxxxxxxxxx
          volumeMounts:
          - mountPath: /shared
            name: discourse-discourse-data
        restartPolicy: Always
        volumes:
        - name: discourse-discourse-data
          persistentVolumeClaim:
            claimName: ebs-discourse
        imagePullSecrets:
        - name: services-secret
kind: List
metadata: {}

Finally go into the pod (kubectl exec -ti discourse-xxxx bash), run rails c, and at the prompt run SiteSetting.force_https = true and finally exit

Now you have HTTPS-ready ELB , so you can create a CloudFront distribution with origin as that ELB using HTTPS as the origin policy, and “Redirect HTTP to HTTPS” for the viewer policy.

Important notes:
You cannot build in one environment (ex: staging) and deploy in another (ex: production). Letsencrypt template builds into the image the discourse_hostname & letsencrypt_email used at build time (which you defined in containers/discourse.yml)

Cheers

8 Likes

I just came across this post on medium, in case anyone finds it helpful

17 Likes

I tried to follow the steps in the post above, but I hit this error when I ran ./launcher bootstrap web_only

I, [2018-03-01T04:21:53.653880 #15]  INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate'
Failed to report error: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) 2 Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) subscribe failed, reconnecting in 1 second. Call stack ["/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis/client.rb:345:in `rescue in establish_connection'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis/client.rb:331:in `establish_connection'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis/client.rb:101:in `block in connect'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis/client.rb:293:in `with_reconnect'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis/client.rb:100:in `connect'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis/client.rb:276:in `with_socket_timeout'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis/client.rb:133:in `call_loop'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis/subscribe.rb:43:in `subscription'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis/subscribe.rb:12:in `subscribe'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis.rb:2775:in `_subscription'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis.rb:2143:in `block in subscribe'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis.rb:58:in `block in synchronize'", "/usr/local/lib/ruby/2.4.0/monitor.rb:214:in `mon_synchronize'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis.rb:58:in `synchronize'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis.rb:2142:in `subscribe'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/message_bus-2.1.2/lib/message_bus/backends/redis.rb:336:in `global_subscribe'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/message_bus-2.1.2/lib/message_bus.rb:525:in `global_subscribe_thread'", "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/message_bus-2.1.2/lib/message_bus.rb:473:in `block in new_subscriber_thread'"] 
rake aborted!
PG::ConnectionBad: could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?
could not connect to server: Cannot assign requested address
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5432?

web_only.yaml env:

LANG: en_US.UTF-8
UNICORN_WORKERS: 2
DISCOURSE_DB_USERNAME: dev
DISCOURSE_DB_PASSWORD: discourse
DISCOURSE_DB_HOST: localhost
DISCOURSE_DB_NAME: discourse_development
DISCOURSE_DEVELOPER_EMAILS: 'foobar@gmail.com'
DISCOURSE_HOSTNAME: 'localhost'
DISCOURSE_REDIS_HOST: localhost

Did I miss some configuration? The redis and postgress are running locally and I have tested the connection manually.

tcp        0      0 0.0.0.0:6379            0.0.0.0:*               LISTEN      4086/redis-server *
tcp        0      0 127.0.0.1:5432          0.0.0.0:*               LISTEN      1763/postgres   
1 Like

I thought bitnami is all open source. No?

It would be nice if some one has success install on Kubernetes can answer my questions.

  1. I can bootstrap a web_only image. During the process, I see db:migrate and assets:precompile. When I bootstrap it, it run against a temporary postgreSQL and Redis , does this mean I have to run it again on the production database?

  2. Is DISCOURSE_DB_HOST and DISCOURSE_REDIS_HOST written to the image? If not, how can I change it on the image?

I am trying to use a CI service to build the image and push it to a private docker registry. It would make no sense if the CI service run against the production database.

The free code camp tutorial does not answer my question and I cannot find a answer anywhere.

2 Likes