Occasional OAuth errors

PRR · November 8, 2023, 12:06pm

We use an external OAuth provider for user authentication. Occasionally, users receive an HTTP 500 error when they come to the platform. Snippet from the error log:

Started GET "/auth/oauth2_basic/callback?code=[coderemoved]&state=[stateremoved]" for [IP] at [timestamp]
(oauth2_basic) Setup endpoint detected, running now.
(oauth2_basic) Callback phase initiated.
Faraday::TimeoutError (Timeout::Error)
lib/final_destination/resolver.rb:31:in `block in lookup'
lib/final_destination/resolver.rb:8:in `synchronize'
lib/final_destination/resolver.rb:8:in `lookup'
lib/final_destination/ssrf_detector.rb:127:in `lookup_ips'
lib/final_destination/ssrf_detector.rb:95:in `lookup_and_filter_ips'
lib/final_destination/http.rb:13:in `connect'
lib/middleware/omniauth_bypass_middleware.rb:43:in `call'
lib/content_security_policy/middleware.rb:12:in `call'
lib/middleware/anonymous_cache.rb:387:in `call'
lib/middleware/gtm_script_nonce_injector.rb:10:in `call'
config/initializers/100-quiet_logger.rb:20:in `call'
config/initializers/100-silence_logger.rb:29:in `call'
lib/middleware/enforce_hostname.rb:24:in `call'
lib/middleware/request_tracker.rb:233:in `call'

If the user simply reloads the page, everything works, and the log shows:

Started GET "/auth/oauth2_basic/callback?code=[coderemoved]&state=[stateremoved]" for [IP] at [timestamp]
(oauth2_basic) Setup endpoint detected, running now.
(oauth2_basic) Callback phase initiated.
Processing by Users::OmniauthCallbacksController#complete as HTML
  Parameters: {"code"=>"[coderemoved]", "state"=>"[stateremoved]", "provider"=>"oauth2_basic"}
Deprecation notice: `SiteSetting.anonymous_posting_min_trust_level` has been deprecated. Please use `SiteSetting.anonymous_posting_allowed_groups` instead. (removal in Discourse 3.3) 
At /var/www/discourse/lib/site_setting_extension.rb:160:in `public_send`
start
Redirected to https://[pageremoved]
Completed 302 Found in 83ms (ActiveRecord: 0.0ms | Allocations: 11138)

Unfortunately, I have no steps to reproduce it. It tends to happen to users who have been away for a longer period of time, but I cannot confirm this with confidence. It is also possible that the platform has been upgraded since their last visit.
Any suggestions or additional information I could provide?

kayleeeeeeeee · January 21, 2024, 6:33pm

Bumping this. I’m experiencing exactly the same issues.

pfaffman · January 21, 2024, 6:55pm

A timeout error suggests a networking issue. It could just be a network glitch.

kayleeeeeeeee · January 21, 2024, 11:24pm

I was thinking that, but the error arises way too quickly for it to be normal behaviour. I’m wondering if there might be an overzealous DNS lookup timeout somewhere:

- The error is raised in “resolver.rb”.
- Refreshing the page temporarily fixes it, presumably because the DNS lookup has been cached by then.
- For some completely inexplicable reason, I can’t get it to read the OIDC discovery document from any URL that involves our self-hosted DNS, even though I can curl the file manually from within the Docker container. I’ve eliminated many different variables, and DNS seems to be the only common factor.
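To illustrate the hypothesis above: if a lookup runs on a helper thread and is abandoned after a fixed budget, a slow, uncached answer surfaces as `Timeout::Error` (the exception in the first post's log), while the same lookup succeeds on reload once some cache can answer quickly. The sketch below is a hypothetical illustration of that pattern, not Discourse's actual `FinalDestination::Resolver`; the helper name `lookup_with_timeout`, the injectable `resolver:` parameter, and the budgets used are assumptions.

```ruby
require "socket"
require "timeout"

# Hypothetical default resolver: plain getaddrinfo, returning IP strings.
DEFAULT_RESOLVER = ->(host) { Addrinfo.getaddrinfo(host, 80, nil, :STREAM).map(&:ip_address) }

# Hypothetical helper for illustration only; NOT Discourse's FinalDestination::Resolver.
# Resolves +host+ on a worker thread and gives up after +timeout+ seconds,
# raising Timeout::Error (the same exception class seen in the error log).
def lookup_with_timeout(host, timeout, resolver: DEFAULT_RESOLVER)
  worker = Thread.new do
    Thread.current.report_on_exception = false # errors are re-raised by #join below
    resolver.call(host)
  end

  # Thread#join(limit) returns nil if the worker is still running after +limit+
  # seconds, and re-raises any exception the worker terminated with.
  if worker.join(timeout).nil?
    worker.kill # abandon the slow lookup
    raise Timeout::Error, "DNS lookup for #{host} took longer than #{timeout}s"
  end

  worker.value.uniq
end
```

For example, with a hypothetical 2-second budget, `lookup_with_timeout("idp.example.internal", 2)` (placeholder hostname) would raise if the resolver needed 3 seconds for a cold answer, and would return the addresses once a cached answer comes back in milliseconds.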

Importantly, the Discourse server is able to talk to the OIDC server, even when it fails like this. From the access logs, there is one request:

[21/Jan/2024:23:10:21 +0000] "POST /application/o/token/ HTTP/1.1" 200 7998 "-" "Faraday v2.9.0"

when it fails, and two requests:

[21/Jan/2024:23:21:03 +0000] "POST /application/o/token/ HTTP/1.1" 200 7998 "-" "Faraday v2.9.0"
[21/Jan/2024:23:21:05 +0000] "GET /application/o/userinfo/ HTTP/1.1" 200 5254 "-" "Faraday v2.9.0"

when it succeeds. Regardless, it never takes longer than 5 seconds. I’m yet to try setting up a proxy for the OIDC server that uses Cloudflare DNS, but that’ll be my next step.
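One way to test the “too quick to be a normal timeout” observation is to time a few consecutive lookups of the IdP hostname from inside the container. If the first lookup is slow and later ones are fast, that would be consistent with an upstream cache (for example on the self-hosted DNS server) answering the retries. This is a rough sketch: `time_lookups` is a hypothetical helper, the hostname is a placeholder, and it times plain `getaddrinfo`, which is not necessarily identical to the code path Discourse uses.

```ruby
require "socket"

# Hypothetical diagnostic helper: returns the wall-clock duration (in seconds)
# of +attempts+ consecutive lookups of +host+.
def time_lookups(host, attempts: 3, resolver: ->(h) { Addrinfo.getaddrinfo(h, 443, nil, :STREAM) })
  Array.new(attempts) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    begin
      resolver.call(host)
    rescue SocketError
      # A failed lookup is still timed: how long it took to fail is the interesting part.
    end
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end
```

For example: `time_lookups("idp.example.internal").each_with_index { |secs, i| puts format("lookup %d: %.3fs", i + 1, secs) }`, substituting the real IdP hostname.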

pfaffman · January 21, 2024, 11:37pm

Common wisdom is that it’s always DNS.

kayleeeeeeeee · January 22, 2024, 2:50am

Welp, it’s definitely DNS. Rather than setting up a proxy, I just added my OIDC server to the hosts file in the Docker container, and it seems to work now. This is a fragile and suboptimal workaround, though; I think the developers need to change the timeout to something sane. This case reminds me of the “500-mile email” story.
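After pinning the IdP in the container's hosts file, a quick sanity check (for instance after a rebuild) is to confirm the entry is actually there. This sketch uses Ruby's standard `Resolv::Hosts`, which parses the hosts file directly; `pinned_addresses` and the hostname are hypothetical placeholders, and whether libc's `getaddrinfo` consults the file before DNS still depends on the container's `/etc/nsswitch.conf`.

```ruby
require "resolv"

# Hypothetical helper: the addresses that the hosts file maps +host+ to.
# Returns an empty array when the file has no entry for +host+.
def pinned_addresses(host, hosts_file: Resolv::Hosts::DefaultFileName)
  Resolv::Hosts.new(hosts_file).getaddresses(host)
end
```

For example, `pinned_addresses("idp.example.internal")` returning an empty array inside the container would mean the pin did not survive.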

pfaffman · January 22, 2024, 6:58pm

You can add commands to your app.yml that update /etc/hosts on a rebuild. Look at some of the other templates for examples.

Could be, but not many folks are having trouble. Could your self-hosted DNS server be overloaded sometimes?

I don’t know how to go about changing the timeout. I don’t remember ever doing it.

PRR · January 22, 2024, 7:19pm

In my case, the IdP and Discourse VMs are sitting next to each other, and while no one can completely rule out network issues, no other service experiences this.

Related topics
- OAuth Error - Unable to know exact cause for this error (Support, oauth2): 6 replies, 53 views, last activity December 17, 2024
- OAUTH2 basic - a nightmare :-( (SSO, oauth2): 21 replies, 365 views, last activity January 2, 2025
- Can't delete invitations sent or abandon post draft (Bug): 3 replies, 711 views, last activity March 18, 2021
- Discovery document is missing (SSO): 3 replies, 1001 views, last activity September 20, 2023
- OIDC: Authorization timed out (Support, openid-connect): 2 replies, 1118 views, last activity January 3, 2025
