Troubleshooting an instance that appears to be running but is inaccessible


(Stephen) #1

I rebuilt our app today to incorporate the new OAuth plugin. Since then Discourse appears to be running, but the site remains inaccessible.

Commenting out or removing the plugin's line doesn't help. Checking syslog, I see:

Oct 30 14:24:06 discourse redis[49]: 10 changes in 300 seconds. Saving...
Oct 30 14:24:06 discourse redis[49]: Background saving started by pid 1050
Oct 30 14:24:21 discourse redis[1050]: DB saved on disk
Oct 30 14:24:21 discourse redis[1050]: RDB: 19 MB of memory used by copy-on-write
Oct 30 14:24:21 discourse redis[49]: Background saving terminated with success
Oct 30 14:29:22 discourse redis[49]: 10 changes in 300 seconds. Saving...
Oct 30 14:29:22 discourse redis[49]: Background saving started by pid 1986
Oct 30 14:29:36 discourse redis[1986]: DB saved on disk
Oct 30 14:29:36 discourse redis[1986]: RDB: 15 MB of memory used by copy-on-write
Oct 30 14:29:36 discourse redis[49]: Background saving terminated with success

This suggests that Discourse is up, just not responding at its usual FQDN.

Is there something up with the current release that would prevent rebuilds, or that could manifest in this way?

It’s a DigitalOcean install, set up as per the install guide, and it has been running for ~12 months without significant issue.


(Stephen) #2

After further investigation, I can see that there are connection issues with redis in my production.log:

Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) subscribe failed, reconnecting in 1 second. Call stack:
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:331:in `rescue in establish_connection'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:317:in `establish_connection'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:94:in `block in connect'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:279:in `with_reconnect'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:93:in `connect'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:350:in `ensure_connected'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:207:in `block in process'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:292:in `logging'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:206:in `process'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:126:in `block in call_loop'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:266:in `with_socket_timeout'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:273:in `without_socket_timeout'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/client.rb:125:in `call_loop'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/subscribe.rb:35:in `subscription'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis/subscribe.rb:12:in `subscribe'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis.rb:2622:in `_subscription'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis.rb:2035:in `block in subscribe'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis.rb:37:in `block in synchronize'
/usr/local/lib/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis.rb:37:in `synchronize'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/redis-3.2.1/lib/redis.rb:2034:in `subscribe'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/message_bus-1.0.16/lib/message_bus/redis/reliable_pub_sub.rb:302:in `global_subscribe'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/message_bus-1.0.16/lib/message_bus.rb:457:in `global_subscribe_thread'
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/message_bus-1.0.16/lib/message_bus.rb:409:in `block in new_subscriber_thread'

Job exception: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED)

The launcher logs also show:

33:M 30 Oct 15:40:54.117 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
33:M 30 Oct 15:40:54.117 # Server started, Redis version 3.0.4
33:M 30 Oct 15:40:54.117 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
33:M 30 Oct 15:40:54.117 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
rsyslogd: invalid or yet-unknown config file command 'KLogPermitNonKernelFacility' - have you forgotten to load a module? [try http://www.rsyslog.com/e/3003 ]
rsyslogd: warning: ~ action is deprecated, consider using the 'stop' statement instead [try http://www.rsyslog.com/e/2307 ]
rsyslogd: Could no open output pipe '/dev/xconsole': No such file or directory [try http://www.rsyslog.com/e/2039 ]
2015-10-30 15:40:54 UTC [49-1] LOG:  database system was interrupted; last known up at 2015-10-30 15:38:56 UTC
2015-10-30 15:40:54 UTC [49-2] LOG:  database system was not properly shut down; automatic recovery in progress
2015-10-30 15:40:54 UTC [49-3] LOG:  record with zero length at B/42BD2288
2015-10-30 15:40:54 UTC [49-4] LOG:  redo is not required
2015-10-30 15:40:54 UTC [49-5] LOG:  MultiXact member wraparound protections are now enabled
2015-10-30 15:40:54 UTC [31-1] LOG:  database system is ready to accept connections
2015-10-30 15:40:54 UTC [53-1] LOG:  autovacuum launcher started
supervisor pid: 34 unicorn pid: 55
nginx: [emerg] invalid number of arguments in "set_real_ip_from" directive in /etc/nginx/conf.d/discourse.conf:100
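The nginx [emerg] at the end is the fatal entry: nginx refuses to start when any directive is missing its argument, which takes the whole site down even though redis and postgres come up fine. A grep can locate such a line quickly; the sample file below is illustrative, the real target on a standard install being /etc/nginx/conf.d/discourse.conf inside the container.

```shell
# Build a small sample config containing the kind of broken line nginx
# reports above (illustrative contents, not the full generated file).
cat > /tmp/discourse-sample.conf <<'EOF'
set_real_ip_from 199.27.128.0/21;
set_real_ip_from ;
set_real_ip_from 2400:cb00::/32;
EOF

# Report any set_real_ip_from directive that lost its argument, with its
# line number, mirroring the ":100" in the nginx error message.
grep -nE '^[[:space:]]*set_real_ip_from[[:space:]]*;' /tmp/discourse-sample.conf
# -> 2:set_real_ip_from ;
```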

The config is now identical to the pre-rebuild state; the only change was the reference to the plugin.

Any ideas?

I’m uncomfortable rolling back from tests-passed to the latest beta without a nod from @sam that this won’t cause additional complications; in the past it has been a factor.


(Stephen) #3

OK, I’ve finally gotten to the bottom of this; responding for completeness, as I assume I won’t be alone in hitting this issue.

The culprit was the Cloudflare template: it was adding an invalid line, set_real_ip_from ;, as shown below:

set_real_ip_from 103.21.244.0/22;
set_real_ip_from 103.22.200.0/22;
set_real_ip_from 103.31.4.0/22;
set_real_ip_from 104.16.0.0/12;
set_real_ip_from 108.162.192.0/18;
set_real_ip_from 141.101.64.0/18;
set_real_ip_from 162.158.0.0/15;
set_real_ip_from 172.64.0.0/13;
set_real_ip_from 173.245.48.0/20;
set_real_ip_from 188.114.96.0/20;
set_real_ip_from 190.93.240.0/20;
set_real_ip_from 197.234.240.0/22;
set_real_ip_from 198.41.128.0/17;
set_real_ip_from 199.27.128.0/21;
set_real_ip_from ;
set_real_ip_from 2400:cb00::/32;
set_real_ip_from 2405:8100::/32;
set_real_ip_from 2405:b500::/32;
set_real_ip_from 2606:4700::/32;
set_real_ip_from 2803:f800::/32;

to /etc/nginx/conf.d/discourse.conf. Presumably something has changed in their data. Commenting this line out has resolved our problems. cc @riking

In the interest of science I repeated the rebuild and once again the erroneous line snuck in, so it can be considered reproducible.


(Jeff Atwood) #4

Hmm, @riking, what’s the issue here? Is that invalid IPv6 syntax?


(Kane York) #5

I think the file used to not have a trailing newline in it, but now it does.

We need to remove the echo >> /tmp/cloudflare-ips line (or whatever it is).
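If that theory holds, the failure looks like this (the file name comes from the post above; the append and the per-line expansion are assumptions about what the template does, not its actual code):

```shell
# The downloaded list now ends with its own trailing newline...
printf '199.27.128.0/21\n2400:cb00::/32\n' > /tmp/cloudflare-ips

# ...so an unconditional "echo >>" no longer just terminates the last entry;
# it appends an empty one instead.
echo >> /tmp/cloudflare-ips

# Per-line expansion then emits the invalid directive nginx chokes on:
while IFS= read -r ip; do echo "set_real_ip_from $ip;"; done < /tmp/cloudflare-ips
```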