您好,社区
我们在 Kubernetes 中设置了 Discourse。我们正在运行一个包含 Discourse 容器镜像的部署,使用外部 Redis 和外部 PostgreSQL。
为了管理 Redis 高可用性,我们在由 Sentinel 管理的 StatefulSet 中部署了 3 个节点。此外,我们在 Discourse 和 Redis 之间添加了一个 HAProxy 部署,配置如下:
global
log stdout format raw local0
maxconn 102400
maxsessrate 500
maxconnrate 2000
maxsslrate 2000
resolvers mydns
parse-resolv-conf
hold valid 5s
accepted_payload_size 8192
defaults
log global
mode tcp
timeout client 2h
timeout connect 360s
timeout server 2h
default-server init-addr none
retries 3
frontend stats
bind *:1024
mode http
http-request use-service prometheus-exporter if { path /metrics }
stats enable
stats uri /stats
stats refresh 10s
listen redis_master
bind *:6379
mode tcp
option tcp-check
tcp-check connect
tcp-check send AUTH\ XXXXXXXXXXX\r\n
tcp-check expect string +OK
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send info\ replication\r\n
tcp-check expect string role:master
tcp-check send QUIT\r\n
tcp-check expect string +OK
server redis_node_0 discoursev2-redis-node-0.discoursev2-redis-headless.discoursev2.svc.cluster.local:6379 resolvers mydns check inter 1s fall 1 rise 1
server redis_node_1 discoursev2-redis-node-1.discoursev2-redis-headless.discoursev2.svc.cluster.local:6379 resolvers mydns check inter 1s fall 1 rise 1
server redis_node_2 discoursev2-redis-node-2.discoursev2-redis-headless.discoursev2.svc.cluster.local:6379 resolvers mydns check inter 1s fall 1 rise 1
listen redis_slave
bind *:6380
mode tcp
option tcp-check
tcp-check connect
tcp-check send AUTH\ cUvNwwd4ujTbqshTbYjeTTrX8tDlKAcYvvAUOGuk\r\n
tcp-check expect string +OK
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send QUIT\r\n
tcp-check expect string +OK
server redis_node_0 discoursev2-redis-node-0.discoursev2-redis-headless.discoursev2.svc.cluster.local:6379 resolvers mydns check inter 1s fall 1 rise 1
server redis_node_1 discoursev2-redis-node-1.discoursev2-redis-headless.discoursev2.svc.cluster.local:6379 resolvers mydns check inter 1s fall 1 rise 1
server redis_node_2 discoursev2-redis-node-2.discoursev2-redis-headless.discoursev2.svc.cluster.local:6379 resolvers mydns check inter 1s fall 1 rise 1
我们遇到的问题是,每次我们对 Discourse 部署进行滚动更新时,Unicorn 线程都无法连接到 Redis。它们会重试连接,过一段时间后,所有线程最终都能连接上。以下是我们日志中看到的一些错误:
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:398:in `rescue in establish_connection'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:684:in `init_worker_process'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:547:in `spawn_missing_workers'
config/unicorn.conf.rb:299:in `block in reload'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:721:in `worker_loop'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:379:in `establish_connection'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:115:in `block in connect'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:295:in `reconnect'
/var/www/discourse/lib/discourse.rb:906:in `after_fork'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/bin/unicorn:128:in `<top (required)>'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/bin/unicorn:25:in `<main>'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:344:in `with_reconnect'
E, [2024-07-24T15:42:22.524194 #531] ERROR -- : Error connecting to Redis on discoursev2-haproxy.discoursev2.svc.cluster.local:6379 (Socket::ResolutionError) (Redis::CannotConnectError)
/var/www/discourse/lib/discourse_redis.rb:217:in `reconnect'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/bin/unicorn:25:in `load'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:114:in `connect'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:143:in `start'
I, [2024-07-24T15:42:23.499254 #422] INFO -- : Loading Sidekiq in process id 422
2024-07-24 15:42:23 +0000: Starting Prometheus global reporter pid: 439
Booting Sidekiq 6.5.12 with Sidekiq::RedisConnection::RedisAdapter options {:host=>"discoursev2-haproxy.discoursev2.svc.cluster.local", :port=>6379, :password=>"REDACTED", :namespace=>"sidekiq"}
E, [2024-07-24T15:42:23.530765 #810] ERROR -- : Error connecting to Redis on discoursev2-haproxy.discoursev2.svc.cluster.local:6379 (Socket::ResolutionError) (Redis::CannotConnectError)
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:398:in `rescue in establish_connection'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:115:in `block in connect'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:684:in `init_worker_process'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:547:in `spawn_missing_workers'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/bin/unicorn:25:in `<main>'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:344:in `with_reconnect'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:295:in `reconnect'
config/unicorn.conf.rb:299:in `block in reload'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:379:in `establish_connection'
/var/www/discourse/lib/discourse.rb:906:in `after_fork'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/bin/unicorn:25:in `load'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/bin/unicorn:128:in `<top (required)>'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:114:in `connect'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:721:in `worker_loop'
/var/www/discourse/lib/discourse_redis.rb:217:in `reconnect'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:143:in `start'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:143:in `start'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:344:in `with_reconnect'
config/unicorn.conf.rb:299:in `block in reload'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:398:in `rescue in establish_connection'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:379:in `establish_connection'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:295:in `reconnect'
/var/www/discourse/lib/discourse_redis.rb:217:in `reconnect'
E, [2024-07-24T15:42:24.540440 #887] ERROR -- : Error connecting to Redis on discoursev2-haproxy.discoursev2.svc.cluster.local:6379 (Socket::ResolutionError) (Redis::CannotConnectError)
我们还尝试移除 HAProxy,直接指向 Redis 主节点,也观察到了同样的问题。此外,我们还禁用了 Discourse 和 Redis 之间的 Istio 网关。在这两种情况下,过一段时间后,所有 Unicorn 线程都能成功连接到 Redis。
有人能就这个问题提供一些指导吗?我们需要对 Redis 进行任何特殊的设置吗?这是我们当前的设置:
appendonly yes
appendfsync no
aof-rewrite-incremental-fsync yes
save ""
最后,我们想知道:
- Redis 高可用模式的推荐方法是什么?据我们所知,Redis 集群不受支持。
- 如果使用单节点 Redis 设置,在 Redis 丢失几秒钟的情况下会有什么后果?如果我们使用无状态 Redis,每次重启都会丢失所有信息,Discourse 会重新填充内容吗?
提前致谢。