Forum unavailable with Redis error in unicorn logs

Hello.

I’m hosting two forums on my machine; both are up to date (3.4.0.beta3-dev for one, and I can’t check the version of the one that is unavailable).

One of them was updated earlier this week and suddenly stopped working about two days ago.

Once logged in, an “Oops” error message appears on every page.

I went into the container and looked at the unicorn logs; there seems to be a problem connecting to Redis:

Failed to report error: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) 3 Error fetching job: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED)
Failed to report error: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) 3 Error fetching job: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED)
Failed to report error: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) 3 Error fetching job: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED)
Failed to report error: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) 3 Error fetching job: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED)
Failed to report error: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) 3 Job exception: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) sidekiq-exception
Failed to report error: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) 3 Job exception: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) sidekiq-exception
Failed to report error: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) 3 Job exception: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) sidekiq-exception
Failed to report error: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) 3 Job exception: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) sidekiq-exception
Failed to report error: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) 3 Job exception: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) sidekiq-exception
Failed to report error: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) 3 Job exception: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) sidekiq-exception
Failed to report error: Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) 2 Error connecting to Redis on localhost:6379 (Errno::ECONNREFUSED) subscribe failed, reconnecting in 1 second. Call stack /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:398:in `rescue in establish_connection'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:379:in `establish_connection'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:115:in `block in connect'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:344:in `with_reconnect'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:114:in `connect'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:409:in `ensure_connected'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:269:in `block in process'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:356:in `logging'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:268:in `process'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/client.rb:161:in `call'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/rack-mini-profiler-3.3.1/lib/mini_profiler/profiling_methods.rb:89:in `block in profile_method'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis.rb:270:in `block in send_command'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis.rb:269:in `synchronize'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis.rb:269:in `send_command'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/redis-4.8.1/lib/redis/commands/strings.rb:191:in `get'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/message_bus-4.3.8/lib/message_bus/backends/redis.rb:388:in `process_global_backlog'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/message_bus-4.3.8/lib/message_bus/backends/redis.rb:277:in `block in global_subscribe'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/message_bus-4.3.8/lib/message_bus/backends/redis.rb:289:in `global_subscribe'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/message_bus-4.3.8/lib/message_bus.rb:768:in `global_subscribe_thread'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/message_bus-4.3.8/lib/message_bus.rb:739:in `block in new_subscriber_thread'

The problem is I don’t see what’s wrong: I’m able to connect to the Redis server in the container via redis-cli and set and get keys just fine.
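
Roughly what I did to check, from inside the container (a sketch from memory; “app” is the default container name, mine are named differently):

./launcher enter app            # or: docker exec -it app bash
sv status redis                 # the runit service should report "run"
redis-cli -p 6379 ping          # answers PONG
redis-cli set diag:test 1       # set and read back a test key
redis-cli get diag:test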

I see many similar issues on the forum, but they’re either old or never got a resolution. Can anyone help? I can provide more info if needed.


This used to work?

How long ago did you start?

You’ve got two copies of a standard app.yml file, each with its own Redis and Postgres template and different paths to the data?
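
i.e. a layout along these lines (a sketch; the file and directory names here are just placeholders):

ls /var/discourse/containers/
#   forum_one.yml   forum_two.yml
grep "host:" /var/discourse/containers/forum_one.yml
#   host: /var/discourse/shared/forum_one
#   host: /var/discourse/shared/forum_one/log/var-log
grep "template" /var/discourse/containers/forum_one.yml
#   - "templates/postgres.template.yml"
#   - "templates/redis.template.yml"
#   - "templates/web.template.yml"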

One possibility is permissions. At some point the ID of the user and group used was changed, but I’ve not seen that be a problem on single-container setups.

Thank you for taking a look at my issue.

Yep, this used to work until about 3 days ago. However, this morning I tried to run a backup from the CLI inside the container and it failed with a weird Postgres error, so I suspect the database is corrupted somehow. That isn’t obviously related to the error message I mentioned above, but I’m inclined to try restoring a working backup from 8 days ago (the forum owner is okay with that) to see if it solves everything.

I suppose I can restore a backup made on an older version of Discourse to a newer one? (There has been an update in between the backup and now.)

EDIT: to clarify, the two forums use different YAML files, so each has its own container and, obviously, its own data directory.

Did it fail on the backup or the restore?

It would be helpful if you actually included the weird error, but if it was on the restore I suspect it’s the one described here: Restore failing with missing chat_mention function

If that’s correct, my advice is to wait. If waiting is not an option, you could try making sure that the site you restore to is on the same commit as the site that created the backup.

It’s on the backup.

pg_dump: error: Dumping the contents of table "posts" failed: PQgetResult() failed.
pg_dump: error: Error message from server: ERROR:  could not open file "base/16384/17044": No such file or directory

That’s why I said I’ll try to restore a backup first to see if it solves the issue. :slight_smile:

That does seem like a database problem.

Have you restored a filesystem backup before? That sounds like the kind of thing that would happen if you made a filesystem backup while the database was running. It’s one of the reasons I don’t recommend filesystem backups.

If you want to restore a Discourse backup, which is what I’d usually recommend, you’d need to drop the database, then create and migrate an empty database before doing the restore.
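
The rough shape of that, inside the container, is something like this (a sketch only; double-check against the official restore guide, and note that the database name, service names, and paths assume a standard install):

./launcher enter app                      # your container name will differ
sv stop unicorn                           # stop the web workers
sudo -u postgres dropdb discourse
sudo -u postgres createdb -O discourse discourse
sudo -u postgres psql discourse -c "CREATE EXTENSION IF NOT EXISTS hstore; CREATE EXTENSION IF NOT EXISTS pg_trgm;"
cd /var/www/discourse
sudo -E -u discourse bundle exec rake db:migrate   # builds an empty schema
# then restore the backup from the admin UI or the CLI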

I do filesystem backups but exclude Postgres databases, as I dump them instead. However, I might have forgotten to exclude Discourse’s folder; I’ll have to check that later on.

Is the information in this thread still valid for my use case?

The stuff about how to get the backup where you need it seems unnecessarily complicated.

Okay, I found the culprit, thanks for pointing me in the right direction about the filesystem. We run a virus scan, and someone uploaded something on the forum that matched a virus signature. As a result, ClamAV removed the file. I’ll tune it better so it doesn’t quarantine Postgres files anymore.
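
For the record, the kind of exclusion I have in mind (a sketch; the postgres_data path depends on your container’s yml and isn’t the standard standalone one in my case):

# cron-driven scan: skip the Postgres data directory
clamscan -r -i --exclude-dir='^/var/discourse/shared/standalone/postgres_data' /var/discourse
# or, for the clamd daemon, the equivalent clamd.conf directive:
#   ExcludePath ^/var/discourse/shared/standalone/postgres_data/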

Sorry for wasting your time :slight_smile:


Great! Glad my hint helped! I’d never have thought of that, but I haven’t run antivirus software for a very long time.