Failed VersionCheck Jobs behavior

Bruno_Salazar · December 4, 2021, 12:03am

Sorry if this is not the correct category for this.

I’m evaluating Discourse and the VersionCheck job is failing in my environment.

I’ve noticed that failed jobs are piling up inside sidekiq and probably will get moved to the “dead” section after the default 25 retries (as per Class: Sidekiq::JobRetry — Documentation for sidekiq (6.3.1)).
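
To put a number on how long a failing job lingers, here is a rough sketch of the retry schedule. It assumes Sidekiq's documented default backoff of roughly `(retry_count ** 4) + 15` seconds plus a random jitter term; the jitter is left out here, and the helper names are made up for illustration.

```ruby
# Deterministic part of the assumed default Sidekiq backoff, in seconds.
# The real delay also adds random jitter, which this sketch ignores.
def base_delay(retry_count)
  (retry_count**4) + 15
end

# Total wait across all retries before a job would be moved to "dead".
def total_backoff_seconds(retries = 25)
  (0...retries).sum { |n| base_delay(n) }
end

puts format("%.1f days", total_backoff_seconds / 86_400.0)
```

Under that assumption, a single failed job stays in the retry set for a bit over 20 days before jitter, which is why they accumulate.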

I know that I need to investigate what is causing it to fail, but the point here is: does it make sense to keep these jobs there? Wouldn’t it be better to simply discard failed version checks and wait for the next job execution?

At the moment I have more than 80 VersionCheck jobs waiting for retry, and to me it looks like a waste of resources (probably a small one, but still a waste)…

From what I’ve checked, adding sidekiq_options retry: false to app/jobs/scheduled/version_check.rb would solve this.
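
Roughly what I have in mind, as a standalone sketch. `FakeSidekiqJob` and `VersionCheckSketch` are hypothetical stand-ins so this runs without the sidekiq gem; the real class in app/jobs/scheduled/version_check.rb has a different base class and body, and only the `sidekiq_options retry: false` line is the actual proposal.

```ruby
# Stand-in for the one Sidekiq class method used here, so the sketch runs
# without the sidekiq gem. In a real app this comes from Sidekiq itself.
module FakeSidekiqJob
  def sidekiq_options(opts = {})
    @sidekiq_options = (@sidekiq_options || {}).merge(opts.transform_keys(&:to_s))
  end

  def get_sidekiq_options
    @sidekiq_options || {}
  end
end

# Hypothetical job class, not the real Discourse one.
class VersionCheckSketch
  extend FakeSidekiqJob

  # Proposed change: do not retry a failed run; the next scheduled run
  # simply tries again.
  sidekiq_options retry: false

  def execute(_args)
    # The version check request would go here.
  end
end

p VersionCheckSketch.get_sidekiq_options
```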

Am I missing something?

pfaffman · December 4, 2021, 11:28am

How did you install? Is there any reason to believe you have network issues? RAM?

You may be right, but since you’re the only person to report this (at least so far), it hasn’t made it onto the list of optimizations. It does make sense to just let it fail after one try, I’d think.

Steven · December 5, 2021, 1:14pm

When was the last time you upgraded?

There was an issue with the version check job a few weeks ago (around the end of October); it is fixed now. If you upgrade from the terminal (./launcher rebuild app), it should be OK.

Bruno_Salazar · December 6, 2021, 1:12pm

I’m using the standard Docker install inside an EC2 instance.

I’m in a corporate environment, so there are lots of firewalls, proxies and security scanners between the instance and the internet. In the logs I see a “Job exception: Connection reset by peer - SSL_connect (Errno::ECONNRESET)” error, so some firewall is probably denying the request at some point… I’m still working out how Discourse does these version checks so I can reproduce them by hand and get more details.

Totally understand this. In the past I’ve worked with GitLab and seen lots of issues where full Sidekiq queues caused performance degradation and other weird behaviours, so every time I see something like this my alarms ring.

Bruno_Salazar · December 6, 2021, 1:24pm

I’m on 2.8.0.beta9 (959923d3cf)

Yeah… The upgrade from the terminal or via the GUI is working OK (I’m running it on a weekly basis). The only issue in this case is that the main administrator screen doesn’t show the latest version and always says that I’m running an outdated version.

pfaffman · December 6, 2021, 6:14pm

Then you should definitely run a command line upgrade.

Falco · December 6, 2021, 8:09pm

Discourse will reach out to the internet for tasks like checking version upgrades, fetching user avatars, downloading remote images to local storage, and general oneboxing. If the instance is severed from the internet, there will be some breakage indeed.

Bruno_Salazar · December 6, 2021, 8:21pm

Yes! I’m doing it every week until I find a solution.

I had to give up on oneboxing for exactly this reason. For now I can’t allow full internet access for this server.
github.com/* is already allowed, but the VersionCheck job probably uses another URL.

pfaffman · December 6, 2021, 8:40pm

What I’d do is just turn off SiteSetting.version_checks, remove the discourse_docker plugin and do command line upgrades.

But, here, if you can open up https://api.discourse.org/api, then you’re probably good.

github.com/discourse/discourse/blob/main/lib/discourse_hub.rb#L90

          response =
            Excon.public_send(
              action,
              "#{hub_base_url}#{rel_url}",
              {
                body: JSON[params],
                headers: {
                  "Referer" => referer,
                  "Accept" => accepts.join(", "),
                  "Content-Type" => "application/json",
                },
                omit_default_port: true,
              }.merge(connect_opts),
            )
          
          if (status = response.status) != 200
            Rails.logger.warn(response_status_log_message(rel_url, status))
          end
          
          begin
            JSON.parse(response.body)

Bruno_Salazar · December 8, 2021, 3:10pm

Thank you for the info! It worked when I allowed access to https://api.discourse.org/api/version_check
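
For anyone else behind a restrictive firewall, here is a small sketch for checking reachability from the server before touching any settings. `probe` is a helper I made up (it is not part of Discourse), and it only sends a plain GET, so it tests whether the connection gets through, not whether the hub accepts the request the way the real job sends it.

```ruby
require "net/http"
require "uri"
require "openssl"

# Hypothetical helper: attempt a request to the given URL and report either
# the HTTP status or the class of the exception that stopped it.
def probe(url, timeout: 5)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = timeout
  http.read_timeout = timeout
  response = http.request(Net::HTTP::Get.new(uri.request_uri))
  { ok: true, status: response.code.to_i }
rescue URI::InvalidURIError, SystemCallError, OpenSSL::SSL::SSLError,
       SocketError, Timeout::Error => e
  # Errno::ECONNRESET (the error from my logs) is a SystemCallError.
  { ok: false, error: e.class.name, message: e.message }
end
```

Calling `probe("https://api.discourse.org/api/version_check")` on the instance should return a hash with `ok: true` once the firewall rule is in place; a blocked connection shows up as `ok: false` with the exception class in `error`.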
