Redis connection issue with OpenVZ


(Raymond) #1

Hi, I am having a Redis connection issue. I had Discourse fully configured and running for about 30 minutes when, while making changes in the admin panel (adding users and making them moderators), I suddenly began getting 404s.

In the access log I see all 200s and then 404s. At about the same time, production.log shows the server complaining that it cannot connect to Redis.

> Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) subscribe failed, reconnecting in 1 second. Call stack ["/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis/client.rb:331:in `rescue in establish_connection'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis/client.rb:317:in `establish_connection'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis/client.rb:94:in `block in connect'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis/client.rb:279:in `with_reconnect'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis/client.rb:93:in `connect'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis/client.rb:350:in `ensure_connected'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis/client.rb:207:in `block in process'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis/client.rb:292:in `logging'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis/client.rb:206:in `process'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis/client.rb:112:in `call'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis.rb:789:in `block in get'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis.rb:37:in `block in synchronize'", "/home/discourse/.rvm/rubies/ruby-2.0.0-p643/lib/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis.rb:37:in `synchronize'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/redis-3.2.1/lib/redis.rb:788:in `get'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/message_bus-1.0.10/lib/message_bus/reliable_pub_sub.rb:241:in `process_global_backlog'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/message_bus-1.0.10/lib/message_bus/reliable_pub_sub.rb:276:in `block in global_subscribe'", 
"/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/message_bus-1.0.10/lib/message_bus/reliable_pub_sub.rb:290:in `call'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/message_bus-1.0.10/lib/message_bus/reliable_pub_sub.rb:290:in `global_subscribe'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/message_bus-1.0.10/lib/message_bus.rb:375:in `global_subscribe_thread'", "/home/discourse/.rvm/gems/ruby-2.0.0-p643/gems/message_bus-1.0.10/lib/message_bus.rb:367:in `block in new_subscriber_thread'"]

Looking in /var/log/redis/redis.log, I can see the Redis server is fully functional and has 18 active connections, which lsof confirms are coming from the discourse user. I am also able to telnet to the Redis port and get a connection. There is nothing else in any log I can find that may indicate an issue. All processes seem to be running as expected.

The logs above give very little to follow here… Any help is always appreciated. Even an idea of how to proceed with troubleshooting would be great.

[2nd edit]
I also get the expected PONG reply from the redis-cli ping command, which confirms to me that Redis is working…
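For anyone who wants to script the same check rather than run telnet and redis-cli ping by hand, here is a minimal sketch in Python. It assumes Redis is listening on localhost:6379, as in the error above; the function name redis_ping is made up for illustration. It sends an inline PING over a plain TCP socket and checks for a +PONG reply. As this thread shows, a passing check at one moment does not rule out intermittent failures.

```python
import socket


def redis_ping(host="localhost", port=6379, timeout=2.0):
    """Return True if the server at host:port answers PING with +PONG.

    Hypothetical helper for illustration only. It uses Redis inline
    commands over a raw TCP socket, so no Redis client library is needed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Covers connection refused (ECONNREFUSED), timeouts, and resets.
        return False
    return reply.startswith(b"+PONG")
```

Calling redis_ping() in a loop with a short sleep, and logging every False, may help catch a failure that only shows up under load.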

[RESOLUTION AFTER THOUGHTS:]
Originally, I used the default minimal OpenVZ CentOS template with 500 MB. The installer failed during the database population step, after the createdb command. This was painful because the process was slain halfway through: when I saw content in the database afterwards and no errors, I naturally assumed everything had worked. Not the case, the install was mangled. I had to DROP the DB and start that step over once I figured out what was happening. After that, I increased to 1 GB to make it move forward. At 1 GB it caused the Redis failure this thread describes, apparently slaying a Redis thread but not the daemon. Another murky issue to resolve. The host is at 2048 MB of RAM now and works fine, but I suspect I will need :unlimited when there is content and users on the site, because random slayings would not be very fun.

Thanks,
Ray


(Raymond) #2

I am answering my own question here. I assumed a Redis issue because of the nonsense in the logs. To resolve the problem, I reran the bundle install --deployment command from the git directory.

After a reboot, everything appears to be working, although I have a few concerns after reading about issues with the 2.6 kernel in OpenVZ. At the moment my OpenVZ Discourse appears fully functional.


#3

I don’t expect Discourse to ever work properly on OpenVZ, or at least not without random processes being slayed for using vswap or cache.


(Raymond) #4

For now I will keep it until its randomness becomes unexplainable, after the PITA it was to install correctly.
As a comment on that, /proc/user_beancounters is a file that shows you why things are ‘slayed’. To correct this I added a few ‘:unlimited’ arguments to my /vz/conf/containerid.conf file. This appears to have stopped the slayings, and it looks like it may be stable. I will update in a few days if it turns out otherwise.
Ray



#5

Ah, you own the host node. I meant that OpenVZ as an end user (like buying a VPS) is pretty much horrendous to work with: nothing ever works, you need to request loading of kernel modules and veth (and basically no commercial provider will give veth access on shared OVZ host nodes, since you can do some pretty terrible stuff with it), and then you have to try spoofing OVZ kernel names or something.


(Raymond) #6

Yes, this appears to be the key to the OpenVZ old kernel issue for those who wish to use that configuration. I’ve seen a lot of people stuck there. As an update, I hammered the server while configuring it until early this morning, and although there are no users and minimal content, the VZ image has held steady and not grown past 1100 MB. It has 1000 CPU units, and sometimes I see the processor spin a bit, but it is mostly idle. I am mentioning this because with :unlimited you can obviously use the entire host, so renting a VPS is probably not an option for OpenVZ unless you rent a dedicated box. But there’s the answer anyway.

Thanks for the help. Very much appreciated.

Ray


(Raymond) #7

Another comment about this that may still allow VPS hosting: if you are failing on the TCP receive buffer size (tcprcvbuf in the beancounters), your hosting provider would probably increase that for you. If you are failing on overall memory, probably not. This is a bit of a guess, but given the timing of my issue, the port remaining open and the daemon not crashing suggest that it just had nowhere to put the reply. I made a bunch of limits unlimited because I don’t care what it uses, so I am not sure which :unlimited fixed the problem. Not everyone has the resources, I guess, so you’ll know for sure when you check your failures.

Check /proc/user_beancounters as your situation may be different from my own.
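To make that check quicker, the following is a rough Python sketch that lists every resource with a non-zero failcnt. It assumes the usual column layout of /proc/user_beancounters (resource, held, maxheld, barrier, limit, failcnt, with the container ID prefixed to the first row); the function names parse_beancounters and failed_resources are made up for illustration, and the sketch does not separate rows by container when run on a host node.

```python
def parse_beancounters(text):
    """Parse the text of /proc/user_beancounters into a list of dicts.

    Assumes six columns per resource row: resource, held, maxheld,
    barrier, limit, failcnt. Illustrative helper, not an official tool.
    """
    fields = ("held", "maxheld", "barrier", "limit", "failcnt")
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # The first resource row of a container starts with "<ctid>:".
        if tokens and tokens[0].endswith(":"):
            tokens = tokens[1:]
        if len(tokens) != 6:
            continue  # "Version:" line, column header, or blank line
        try:
            values = [int(t) for t in tokens[1:]]
        except ValueError:
            continue  # not a numeric resource row
        rows.append({"resource": tokens[0], **dict(zip(fields, values))})
    return rows


def failed_resources(text):
    """Return only the rows whose failcnt is greater than zero."""
    return [row for row in parse_beancounters(text) if row["failcnt"] > 0]
```

A possible use, run as root inside the container, is to read the file with open("/proc/user_beancounters").read(), pass the text to failed_resources, and print each row's resource, barrier, limit, and failcnt. Any resource that shows up there is a candidate for raising in the container config.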