Sidekiq jobs failing -- uninitialized class variable @@db_spec_cache

(Joseph Method) #1

I'm seeing this error in Sidekiq jobs that end up sitting in the Retries group. The problem is intermittent: if I retry the jobs, they all process after a few attempts. It can hold up signup emails, though. This is a multisite installation using the Docker install and the most recent Discourse.

NameError: uninitialized class variable @@db_spec_cache in RailsMultisite::ConnectionManagement

I don’t see a way to get a more detailed backtrace for this. Looking at the code, this is the only place where @@db_spec_cache is referenced without first checking whether it’s defined, but I couldn’t see how it would be called before @@db_spec_cache was set.
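The thread doesn't include the relevant rails_multisite source, so the following is only a minimal Ruby sketch of the general language behavior behind this error. It uses a hypothetical `SpecCache` class (not the real `RailsMultisite::ConnectionManagement`) to show that reading a class variable that was never assigned raises `NameError`, and that `Module#class_variable_defined?` is one way to guard the read. It is not the fix that was actually applied.

```ruby
# Hypothetical stand-in class for illustration only; this is NOT the real
# RailsMultisite::ConnectionManagement implementation.
class SpecCache
  # Unguarded read: raises NameError ("uninitialized class variable")
  # if @@db_spec_cache has never been assigned.
  def self.unguarded_lookup(key)
    @@db_spec_cache[key]
  end

  # Guarded read: returns nil instead of raising when the class
  # variable has not been set yet.
  def self.guarded_lookup(key)
    return nil unless class_variable_defined?(:@@db_spec_cache)

    @@db_spec_cache[key]
  end

  # Populates the cache; until this runs, only the guarded read is safe.
  def self.load!(specs)
    @@db_spec_cache = specs
  end
end

begin
  SpecCache.unguarded_lookup("default")
rescue NameError => e
  puts "unguarded: #{e.message}"
end

puts "guarded: #{SpecCache.guarded_lookup('default').inspect}"
```

Returning `nil` from the guarded read is just one choice; a real guard might instead load the cache lazily before reading it.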

(Jeff Atwood) #2

Any ideas on this @sam?

(Sam Saffron) #3

Recategorized as a bug; will add some protection.

(Sam Saffron) #4

Do you have a full backtrace? It should be in /logs.

(Joseph Method) #5

I don’t see any of these errors in /logs or in the logs on the file system. Is there supposed to be a sidekiq.log file?

I doubt this is significant, but I had to connect to the database by setting the PGHOST and PGPASSWORD environment variables, as described here:

(Joseph Method) #6

I’m seeing the following error on a digest email job for the main site; it may be related. This one is persistent: after I click Retry, it comes back within 5 seconds. The other email jobs fail only intermittently.


{"type"=>"digest", "user_id"=>11, "current_site_id"=>"default"}

ActiveRecord::ConnectionTimeoutError: could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)

I’m going to see if I can get Discourse to run with normal credentials to try to rule that out.
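ActiveRecord raises `ConnectionTimeoutError` when a thread cannot check a connection out of the pool within the checkout timeout (5 seconds by default, which matches the message in the job above). The toy pool below is a minimal sketch of that general mechanism, assuming a fixed-size pool guarded by a mutex and condition variable. `TinyPool` and `ConnectionTimeout` are hypothetical names; this is not ActiveRecord's `ConnectionPool`, and it does not establish what exhausted the pool in this case.

```ruby
# Hypothetical error class standing in for ActiveRecord::ConnectionTimeoutError.
class ConnectionTimeout < StandardError; end

# Toy fixed-size pool for illustration only; NOT ActiveRecord's ConnectionPool.
class TinyPool
  def initialize(size, checkout_timeout: 5.0)
    @available = Array.new(size) { |i| "conn-#{i}" }
    @checkout_timeout = checkout_timeout
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  # Waits up to checkout_timeout seconds for a free connection, then raises.
  def checkout
    @mutex.synchronize do
      deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @checkout_timeout
      while @available.empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        if remaining <= 0
          raise ConnectionTimeout,
                format("could not obtain a connection within %.3f seconds", @checkout_timeout)
        end
        # Releases the mutex while waiting; wakes on checkin or after `remaining` seconds.
        @cond.wait(@mutex, remaining)
      end
      @available.pop
    end
  end

  # Returns a connection to the pool and wakes one waiting thread.
  def checkin(conn)
    @mutex.synchronize do
      @available.push(conn)
      @cond.signal
    end
  end
end
```

With a pool of one connection that is already checked out, a second `checkout` blocks for the timeout and then raises, which is the same shape as the error the digest job reported.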

(Joseph Method) #7

I was seeing a lot of failed Sidekiq jobs in general (with no explanation in the logs), so I decided to rebuild the data container. I also restarted the server because some updates required a reboot. After I rebuilt the data container, the failing Sidekiq jobs started working again. Here is the Sidekiq graph since the server was first started a couple of days ago. The change after rebuilding the data container is dramatic.

Anyway, the problem doesn’t seem to be happening anymore, so I think it’s safe to ignore this bug.

(Jeff Atwood) #8