Redis Problems? (Forum broken after upgrade)

I was prompted to upgrade today, and then required to do the build through launcher. Since then Discourse has been basically broken. Already logged in users can see the home page and select messages from there to display, but virtually all functions user and admin just hang with the throbber spinning. The production log is empty. I’ve tried stopping and restarting of course. Not even sure where to start looking. Ideas? Thanks!

UPDATE: As indicated in my various replies below, it appears that the root of the problem is that rdis is not accepting connections, and the rdis log extract below shows rdis apparently looping, each time starting with:

100 changes in 300 seconds. Saving…

and never reaching a “waiting for connection” state. Any suggestions on how to fix this issue would be much appreciated. Thank you.

1 Like

A previous log shows a variety of blocks, e.g.:

/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus/backends/redis.rb:274:in `block in global_subscribe'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus.rb:734:in `block in new_subscriber_thread'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:117:in `block in connect'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:269:in `block in process'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-2.3.3/lib/mini_profiler/profiling_methods.rb:85:in `block in profile_method'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis.rb:967:in `block in get'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis.rb:72:in `block in synchronize'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus/backends/redis.rb:274:in `block in global_subscribe'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus.rb:734:in `block in new_subscriber_thread'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:117:in `block in connect'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:269:in `block in process'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-2.3.3/lib/mini_profiler/profiling_methods.rb:85:in `block in profile_method'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis.rb:967:in `block in get'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis.rb:72:in `block in synchronize'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus/backends/redis.rb:274:in `block in global_subscribe'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus.rb:734:in `block in new_subscriber_thread'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:117:in `block in connect'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:269:in `block in process'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-2.3.3/lib/mini_profiler/profiling_methods.rb:85:in `block in profile_method'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis.rb:967:in `block in get'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis.rb:72:in `block in synchronize'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus/backends/redis.rb:274:in `block in global_subscribe'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus.rb:734:in `block in new_subscriber_thread'
1 Like

Full system reboot did not solve the problem. Situation exactly the same after full reboot.

1 Like

unicorn.stderr.log shows many errors relating to connecting to Redis, e.g.:

Failed to report error: Error connecting to Redis on localhost:6379 (Errno::EADDRNOTAVAIL)

1 Like

Here is the current redis log:

3112:C 23 Dec 2021 01:56:54.912 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
3112:C 23 Dec 2021 01:56:54.912 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=3112, just started
3112:C 23 Dec 2021 01:56:54.912 # Configuration loaded
3112:M 23 Dec 2021 01:56:54.914 * monotonic clock: POSIX clock_gettime
3112:M 23 Dec 2021 01:56:55.105 * Running mode=standalone, port=6379.
3112:M 23 Dec 2021 01:56:55.105 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
3112:M 23 Dec 2021 01:56:55.105 # Server initialized
3112:M 23 Dec 2021 01:56:55.105 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
3112:M 23 Dec 2021 01:56:55.401 * Loading RDB produced by version 6.2.6
3112:M 23 Dec 2021 01:56:55.401 * RDB age 296 seconds
3112:M 23 Dec 2021 01:56:55.401 * RDB memory usage when created 4.86 Mb
3112:M 23 Dec 2021 01:56:55.634 # Done loading RDB, keys loaded: 2751, keys expired: 2.
3112:M 23 Dec 2021 01:56:55.634 * DB loaded from disk: 0.528 seconds
3112:M 23 Dec 2021 01:56:55.634 * Ready to accept connections
3112:M 23 Dec 2021 02:01:55.037 * 100 changes in 300 seconds. Saving...
3112:M 23 Dec 2021 02:01:55.038 * Background saving started by pid 3523
3523:C 23 Dec 2021 02:01:55.140 * DB saved on disk
3523:C 23 Dec 2021 02:01:55.142 * RDB: 0 MB of memory used by copy-on-write
3112:M 23 Dec 2021 02:01:55.239 * Background saving terminated with success
3112:M 23 Dec 2021 02:06:56.065 * 100 changes in 300 seconds. Saving...
3112:M 23 Dec 2021 02:06:56.065 * Background saving started by pid 3831
3831:C 23 Dec 2021 02:06:56.134 * DB saved on disk
3831:C 23 Dec 2021 02:06:56.136 * RDB: 0 MB of memory used by copy-on-write
3112:M 23 Dec 2021 02:06:56.166 * Background saving terminated with success
3112:M 23 Dec 2021 02:11:57.071 * 100 changes in 300 seconds. Saving...
3112:M 23 Dec 2021 02:11:57.071 * Background saving started by pid 4138
4138:C 23 Dec 2021 02:11:57.135 * DB saved on disk
4138:C 23 Dec 2021 02:11:57.137 * RDB: 0 MB of memory used by copy-on-write
3112:M 23 Dec 2021 02:11:57.172 * Background saving terminated with success
3112:M 23 Dec 2021 02:16:58.000 * 100 changes in 300 seconds. Saving...
3112:M 23 Dec 2021 02:16:58.001 * Background saving started by pid 4448
4448:C 23 Dec 2021 02:16:58.064 * DB saved on disk
4448:C 23 Dec 2021 02:16:58.066 * RDB: 0 MB of memory used by copy-on-write
3112:M 23 Dec 2021 02:16:58.101 * Background saving terminated with success
1 Like

Rdis appears to be looping repeatedly with this and never reaching a state where it’s waiting for connections:

3112:M 23 Dec 2021 03:02:07.032 * 100 changes in 300 seconds. Saving...
3112:M 23 Dec 2021 03:02:07.033 * Background saving started by pid 7243
7243:C 23 Dec 2021 03:02:07.104 * DB saved on disk
7243:C 23 Dec 2021 03:02:07.105 * RDB: 0 MB of memory used by copy-on-write
3112:M 23 Dec 2021 03:02:07.134 * Background saving terminated with success
2 Likes

Might be worth checking memory:

free
vmstat 5 5 
uptime

While we’re at it, perhaps also check disk space
df -h /

And, if you don’t get any specific advice, another rebuild might be worth a try. Be sure to have a backup, and to have it downloaded too, just in case.

But really it feels like you need more specific advice.

2 Likes

Hi. I have a week’s worth or so of Discourse backups locally, though at this point I don’t know how to use them. Plenty of memory free (at least 1.5G) and free disk. I’ll try another rebuild now. Thanks.

1 Like

Rebuilt. Bottom line is that it’s still broken the same way, though there were some changes I saw along the way, During the build I noticed:

139:C 23 Dec 2021 16:40:32.625 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
139:C 23 Dec 2021 16:40:32.625 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=139, just started
139:C 23 Dec 2021 16:40:32.626 # Configuration loaded
139:M 23 Dec 2021 16:40:32.627 * monotonic clock: POSIX clock_gettime
139:M 23 Dec 2021 16:40:32.627 # Warning: Could not create server TCP listening socket *:6379: bind: Address already in use
139:M 23 Dec 2021 16:40:32.628 # Failed listening on port 6379 (TCP), aborting.

Later I saw this, which was similar to the looping I saw earlier. However, it says used 1MB of memory for copy on write, and the redis log looping had said 0MB.

112:M 23 Dec 2021 16:46:08.703 * Background saving started by pid 715
715:C 23 Dec 2021 16:46:08.822 * DB saved on disk
715:C 23 Dec 2021 16:46:08.824 * RDB: 1 MB of memory used by copy-on-write
112:M 23 Dec 2021 16:46:08.905 * Background saving terminated with success

The current redis log:

4161:M 23 Dec 2021 16:55:11.867 * Loading RDB produced by version 6.2.6
4161:M 23 Dec 2021 16:55:11.867 * RDB age 281 seconds
4161:M 23 Dec 2021 16:55:11.867 * RDB memory usage when created 4.61 Mb
4161:M 23 Dec 2021 16:55:12.086 # Done loading RDB, keys loaded: 2743, keys expired: 0.
4161:M 23 Dec 2021 16:55:12.086 * DB loaded from disk: 0.265 seconds
4161:M 23 Dec 2021 16:55:12.086 * Ready to accept connections
4161:M 23 Dec 2021 17:00:12.088 * 100 changes in 300 seconds. Saving...
4161:M 23 Dec 2021 17:00:12.098 * Background saving started by pid 4612
4612:C 23 Dec 2021 17:00:12.189 * DB saved on disk
4612:C 23 Dec 2021 17:00:12.191 * RDB: 0 MB of memory used by copy-on-write
4161:M 23 Dec 2021 17:00:12.199 * Background saving terminated with success

I am seeing that 0MB copy-on-write notice again but I don’t know if that’s normal or not. I cannot open a telnet connection on that machine to localhost 6379, I get the error:
Unable to connect to remote host: Cannot assign requested address

The production log shows lots of blocks as before:

/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:344:in `with_reconnect'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:116:in `connect'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:417:in `ensure_connected'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:269:in `block in process'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:356:in `logging'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:268:in `process'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis/client.rb:162:in `call'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-2.3.3/lib/mini_profiler/profiling_methods.rb:85:in `block in profile_method'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis.rb:967:in `block in get'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis.rb:72:in `block in synchronize'
/usr/local/lib/ruby/2.7.0/monitor.rb:202:in `synchronize'
/usr/local/lib/ruby/2.7.0/monitor.rb:202:in `mon_synchronize'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis.rb:72:in `synchronize'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/redis-4.5.1/lib/redis.rb:966:in `get'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus/backends/redis.rb:366:in `process_global_backlog'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus/backends/redis.rb:274:in `block in global_subscribe'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus/backends/redis.rb:287:in `global_subscribe'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus.rb:786:in `global_subscribe_thread'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/message_bus-3.3.8/lib/message_bus.rb:734:in `block in new_subscriber_thread'
Job exception: no address for meta.discourse.org (Resolv::ResolvError)
1 Like

That redis output is normal and it does not necessarily indicate that Redis is not listening. It’s a background save that always occurs every so-many minutes. It does not mean it’s “looping” instead of functioning. I think it’s actually a good sign that it is showing this output. So something else must be wrong. Maybe it’s on a different port, maybe it can’t be reached for some other reason.

2 Likes

Thanks very much for that. What would be other possibilities for all of these blocks being logged and commands just hanging?

1 Like

Looks like something else is already listening on that port. Consider a reboot.

1 Like

To see what process is listening on a port, try
sudo lsof -n -i -P | egrep LISTEN

1 Like

I can see from ps that redis is running and wants to listen to 6379

/usr/bin/redis-server *:6379

but lsof doesn’t show anything actually listening on that port.

1 Like

For reference, this is redis starting in the middle of my most recent rebuild:

111:C 22 Dec 2021 20:36:16.773 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
111:C 22 Dec 2021 20:36:16.773 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=111, just started
111:C 22 Dec 2021 20:36:16.773 # Configuration loaded
111:M 22 Dec 2021 20:36:16.774 * monotonic clock: POSIX clock_gettime
111:M 22 Dec 2021 20:36:16.775 * Running mode=standalone, port=6379.
111:M 22 Dec 2021 20:36:16.776 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
111:M 22 Dec 2021 20:36:16.776 # Server initialized
111:M 22 Dec 2021 20:36:16.777 * Loading RDB produced by version 6.2.1
111:M 22 Dec 2021 20:36:16.777 * RDB age 22 seconds
111:M 22 Dec 2021 20:36:16.777 * RDB memory usage when created 19.89 Mb
111:M 22 Dec 2021 20:36:16.823 # Done loading RDB, keys loaded: 4451, keys expired: 9.
111:M 22 Dec 2021 20:36:16.823 * DB loaded from disk: 0.047 seconds
111:M 22 Dec 2021 20:36:16.823 * Ready to accept connections

slightly later I see

I, [2021-12-22T20:36:27.079511 #1]  INFO -- : > cd /var/www/discourse && git reset --hard
139:C 22 Dec 2021 20:36:27.084 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
139:C 22 Dec 2021 20:36:27.086 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=139, just started
139:C 22 Dec 2021 20:36:27.086 # Configuration loaded
139:M 22 Dec 2021 20:36:27.087 * monotonic clock: POSIX clock_gettime
139:M 22 Dec 2021 20:36:27.087 # Warning: Could not create server TCP listening socket *:6379: bind: Address already in use
139:M 22 Dec 2021 20:36:27.087 # Failed listening on port 6379 (TCP), aborting.
Checking out files: 100% (26188/26188), done.
I, [2021-12-22T20:36:30.153161 #1]  INFO -- : HEAD is now at eb82849c FIX: none and all categories filter (#14999)
1 Like

Yeah, that looks about like what I see from this morning’s build. But nothing is actually listening on that port. It’s bizarre.

1 Like

Oddly, my setup is running fine, I have

root     26269  0.0  0.0   2160     0 ?        Ss   Dec22   0:00 runsv redis
root     26277  0.0  0.0   2304    24 ?        S    Dec22   0:00 svlogd /var/log/redis
uuidd    26278  0.2  0.4  83816  4088 ?        Sl   Dec22   3:31 /usr/bin/redis-server *:6379

and yet I too don’t see (from lsof, inside or outside the container) anything listening on that port.

1 Like

That’s sort of encouraging. And sort of not …

1 Like

My log shows a clutch of entries every 5 mins. Perhaps this is normal, and your forum troubles are not related to redis??

$ less /var/discourse/shared/standalone/log/var-log/redis/current
23478:C 23 Dec 2021 18:47:35.233 * DB saved on disk
23478:C 23 Dec 2021 18:47:35.234 * RDB: 1 MB of memory used by copy-on-write
52:M 23 Dec 2021 18:47:35.290 * Background saving terminated with success
52:M 23 Dec 2021 18:52:36.033 * 100 changes in 300 seconds. Saving...
52:M 23 Dec 2021 18:52:36.034 * Background saving started by pid 23804
23804:C 23 Dec 2021 18:52:36.177 * DB saved on disk
23804:C 23 Dec 2021 18:52:36.178 * RDB: 1 MB of memory used by copy-on-write
52:M 23 Dec 2021 18:52:36.235 * Background saving terminated with success
52:M 23 Dec 2021 18:57:37.085 * 100 changes in 300 seconds. Saving...
52:M 23 Dec 2021 18:57:37.086 * Background saving started by pid 24137
24137:C 23 Dec 2021 18:57:37.221 * DB saved on disk
24137:C 23 Dec 2021 18:57:37.222 * RDB: 1 MB of memory used by copy-on-write
52:M 23 Dec 2021 18:57:37.287 * Background saving terminated with success
52:M 23 Dec 2021 19:02:38.033 * 100 changes in 300 seconds. Saving...
52:M 23 Dec 2021 19:02:38.034 * Background saving started by pid 24466
1 Like

Thanks. Though this suggests that I’m back to ground zero with a broken forum … Ouch.

1 Like