Urgent: Site down after upgrade to 1.6.4 due to Cloudflare template

Just ran into a bit of a problem.

I did routine updates to my site Tappara.co:

  • 1.6.2 -> 1.6.4 via git pull and ./launcher rebuild app
    • It went cleanly.
  • Installed a few pending updates on the Ubuntu 16.04 server
  • Rebooted

Now the site refuses to come back online, and this stuff goes beyond my Linux sysadmin skillz.

From what I can see, Docker is running and so is the app. I see the relevant processes (ruby etc.) starting, but they are basically idling. The site is reporting a 521 error. I tried rebuilding a couple of times, but no luck. The installation was originally made with Discourse 1.5b in December, and switched to the stable branch when it was released in April.

I would guess that HTTP requests are not being routed correctly, but I have not touched any settings or parameters in that area. This is basically a default installation. I do not have Let’s Encrypt or any other certificate enabled.

  • Anyone seen similar recently?
  • Newbie-friendly tips on how to start diagnosing and fixing this?

There seems to be a flood of errors in nginx/error.log:

2016/09/28 12:32:09 [emerg] 977#977: invalid number of arguments in "set_real_ip_from" directi$

That’s odd. You might try rebuilding again.

Did that a number of times already.

What is your container config?

What do you mean? It’s a standard Discourse setup, following your install guide to the letter.

@sam

/# nginx -t
nginx: [emerg] invalid number of arguments in "set_real_ip_from" directive in /etc/nginx/conf.d/discourse.conf:56
nginx: configuration file /etc/nginx/nginx.conf test failed

Do you have an nginx running outside the container?


No, apparently not. (And some filler here, thanks to the minimum post length.)

Since our nginx version is HARD pinned, I don’t know how you would get a broken version :thinking:

Which ports are you exposing in your app.yml?

## which TCP/IP ports should this container expose?
expose:
  - "80:80"   # fwd host port 80   to container port 80 (http)

The .yml has not been modified in ages.

You have an app.yml file… what is the text in it? :slight_smile:

My discourse.conf, from as default an install as possible, doesn’t even have a set_real_ip_from line in it :thinking:

Are those the templates at the start of your app.yml?

templates:
  - "templates/postgres.template.yml"
  - "templates/redis.template.yml"
  - "templates/web.template.yml"
  - "templates/web.ratelimited.template.yml"
## Uncomment these two lines if you wish to add Lets Encrypt (https)
  - "templates/web.ssl.template.yml"
  - "templates/web.letsencrypt.ssl.template.yml"

Aha, so you are probably using this: https://github.com/discourse/discourse_docker/blob/master/templates/cloudflare.template.yml

And it’s probably broken, since it’s a year old, we have updated nginx in the meantime, and we don’t really use Cloudflare ourselves.


I would disable this Cloudflare template, get the site back online, and then try it again when you have time.

You will need to disable the Cloudflare magic in their console too.

templates:
  - "templates/postgres.template.yml"
  - "templates/redis.template.yml"
  - "templates/web.template.yml"
  - "templates/sshd.template.yml"
  - "templates/web.ratelimited.template.yml"
  - "templates/cloudflare.template.yml"

Okay, now we are talking. Yes, indeed I use CloudFlare DNS. The .yml dates from 1.5b.


But @falco, what the heck actually changed, and why? This was a minor security update in the stable branch. CloudFlare is awesome in many ways: it saves a ton of bandwidth, blocks bad bots, etc.

Cloudflare changed the URL at which they publish their IP ranges. I’ve pushed out a fix; could you update your discourse_docker repo and try the rebuild again? If it doesn’t work, dump the full output of the rebuild command into a gist so I can take a look at what’s going wrong.


I use Cloudflare with the default templates for my forum and it works pretty well; you don’t need the Cloudflare template for the newer version :slight_smile:

EDIT: you do need the template, though, to avoid running into the per-IP registration limits.
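Why per-IP limits misbehave without the template: behind Cloudflare, every request reaches the origin from a Cloudflare edge address, so anything counted "per IP" lumps all visitors together unless the real client address is restored. The following Python sketch illustrates the idea (roughly what nginx's real_ip module does once `set_real_ip_from` lists the trusted proxy ranges); it is not Discourse or nginx code. The proxy ranges are placeholder documentation prefixes, not Cloudflare's list, and trusting the `CF-Connecting-IP` header is an assumption about how the client address is forwarded in this setup.

```python
import ipaddress
from collections import Counter

# PLACEHOLDER proxy ranges (IETF documentation prefixes), NOT Cloudflare's list.
# In the real setup these would come from Cloudflare's published IP ranges.
TRUSTED_PROXIES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

# ASSUMPTION: the proxy reports the visitor's address in this header.
CLIENT_IP_HEADER = "CF-Connecting-IP"


def client_ip(peer: str, headers: dict[str, str]) -> str:
    """Return the address that per-IP limits should be keyed on.

    Same idea as nginx's real_ip module: the forwarded header is trusted
    only when the TCP peer itself is inside a trusted proxy range, so a
    visitor connecting directly cannot spoof it.
    """
    peer_addr = ipaddress.ip_address(peer)
    # Membership is simply False when the IPv4/IPv6 versions differ.
    if any(peer_addr in net for net in TRUSTED_PROXIES):
        forwarded = headers.get(CLIENT_IP_HEADER)  # plain dict: case-sensitive
        if forwarded:
            # Validate before trusting; raises ValueError on garbage.
            return str(ipaddress.ip_address(forwarded.strip()))
    return peer


# Three different visitors signing up, all relayed through one proxy address.
signups = [
    ("203.0.113.10", {CLIENT_IP_HEADER: visitor})
    for visitor in ("198.51.100.1", "198.51.100.2", "198.51.100.3")
]
per_peer = Counter(peer for peer, _ in signups)                   # without real IP
per_client = Counter(client_ip(peer, h) for peer, h in signups)  # with real IP
print(per_peer)    # a single key carrying all three signups
print(per_client)  # three keys, one signup each
```

Keyed on the raw peer address, three signups count against one IP; keyed on the restored client address, each visitor gets their own budget.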

Git pulled and rebuilding.

So the failure was actually triggered by the rebuild, because the URL had changed, and not by the 1.6.4 update itself?

Are your user’s true IPs resolved, or do you just see CloudFlare’s IPs?


The failure was triggered by the rebuild, while the root cause is Cloudflare changing the URL we need to fetch to get the list of Cloudflare’s own IP ranges, which we use to set the set_real_ip_from config parameter.
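The mechanism can be illustrated with a small Python sketch. It is not the template's actual code; it simply assumes the list is downloaded as one range per line and each line is pasted into a `set_real_ip_from` directive. If the URL moves and the download returns something else, a blank line becomes a directive with no arguments and a line of text becomes one with several, which is the kind of `invalid number of arguments` failure nginx logged. The response bodies and the `arg_count` helper are hypothetical; `arg_count` only approximates how nginx splits a directive into arguments.

```python
def render_real_ip_conf(body: str) -> list[str]:
    """Naively turn a downloaded IP list into nginx directives.

    Assumes the response is one CIDR range per line and trusts it blindly.
    """
    return [f"set_real_ip_from {line.strip()};" for line in body.splitlines()]


def arg_count(directive: str) -> int:
    """Rough count of a simple directive's arguments (ignores nginx quoting)."""
    return len(directive.rstrip(";").split()) - 1


# HYPOTHETICAL response bodies, for illustration only.
expected_body = "203.0.113.0/24\n2001:db8::/32\n"
unexpected_body = "\n301 Moved Permanently\n"  # e.g. an error page after a URL move

for directive in render_real_ip_conf(expected_body) + render_real_ip_conf(unexpected_body):
    # set_real_ip_from takes exactly one address or range.
    status = "ok" if arg_count(directive) == 1 else "invalid number of arguments"
    print(f"{directive:45} -> {status}")
```

Nothing in the naive version checks what it is pasting, so the first sign of trouble is nginx refusing to start.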

And we are back online: fix verified. Good job! :heart:

Improvement suggestion: perhaps the rebuild should hard-fail with a (human-readable) error if it is unable to fetch CloudFlare’s IP list?
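A minimal sketch of that hard fail, in Python and under stated assumptions: validate every line of the downloaded list with the standard `ipaddress` module, and refuse to emit any config (exiting non-zero with a readable message) if the download fails, contains something that is not an IP range, or is empty. `IP_LIST_URL`, `IpListError`, `parse_ranges`, `fetch_ranges` and `main` are hypothetical names, not part of discourse_docker.

```python
import ipaddress
import sys
import urllib.request

# HYPOTHETICAL URL: a stand-in for wherever the IP list is fetched from.
IP_LIST_URL = "https://example.invalid/ips-v4"


class IpListError(RuntimeError):
    """Carries a human-readable reason why the IP list cannot be used."""


def parse_ranges(body: str, source: str) -> list[str]:
    """Validate one-range-per-line text; raise instead of guessing."""
    ranges = []
    for lineno, raw in enumerate(body.splitlines(), start=1):
        entry = raw.strip()
        if not entry:
            continue  # blank lines are harmless here, skip them
        try:
            # Strict parsing: a range with host bits set is rejected too.
            ranges.append(str(ipaddress.ip_network(entry)))
        except ValueError:
            raise IpListError(
                f"{source}, line {lineno}: {entry!r} is not an IP range"
            ) from None
    if not ranges:
        raise IpListError(f"{source} returned no IP ranges")
    return ranges


def fetch_ranges(url: str = IP_LIST_URL, timeout: float = 10.0) -> list[str]:
    """Download and validate the list; network errors become IpListError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        raise IpListError(f"could not fetch {url}: {exc}") from exc
    return parse_ranges(body, url)


def main(url: str = IP_LIST_URL) -> None:
    """Print nginx directives, or exit non-zero so the rebuild stops."""
    try:
        ranges = fetch_ranges(url)
    except IpListError as exc:
        sys.exit(f"ERROR: Cloudflare IP list unusable, not writing nginx config: {exc}")
    for cidr in ranges:
        print(f"set_real_ip_from {cidr};")
```

Calling `main()` from a rebuild step would then either print the directives or stop the step with an `ERROR:` line on stderr and exit status 1. Rejecting the whole list on the first bad line is deliberate: a partial list would leave some Cloudflare addresses untrusted without any visible error.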
