Sporadic DiscourseSSO login issues with expired nonce

We use DiscourseSSO, and users sporadically run into login issues (similar to the topic "Sporadic issue wp-discourse/SSO: Nonce has already expired"). I tried to debug this by adding some extra logging and, luckily, hit the issue after a couple of days. To be clear, login works most of the time; only sporadically (maybe for 5 minutes a day) do users run into login issues.

We use a subfolder setup on a multi-node cluster, with an external shared DB and Redis, in case that makes any difference. There are two failing scenarios:

  1. Nonce expired
    When the user is redirected to /session/sso_login, SessionController does not find session_id in the session and thus cannot look up the nonce. I tried logging the session (Rails.logger.warn("Verbose SSO log: Session #{session.keys.map {|key| [key, session[key]].join('=')}.join(',')}")) and it printed an empty session. I verified that the browser sends the “_forum_session” cookie it received in the previous request, and that the cookie reaches the server, by logging cookies in SessionController (Rails.logger.warn("Verbose SSO log: Cookies #{cookies.map {|cookie| cookie.join('=')}.join(',')}")).

  2. Login completes but the user gets a login error on the screen
    When the user is redirected to /session/sso_login, SessionController is able to verify the SSO data and log the user in (I see "Verbose SSO log: User was logged on user5" in the logs). But when it redirects the user to /forums/latest, the user sees an error on the screen. I noticed that in the working flow this action clears the “cn” cookie (returns it empty), whereas in the failing scenario it only updates and returns the “_t” cookie. My guess is that this scenario is also related to missing session data.

If we wait for 5 minutes or so and try again, everything works fine again.

I have not tested if all users hitting the site at that time run into the issue or not, but I have been told anecdotally that multiple users ran into it once on our instance.

Hello!

What do you use for the subfolder setup? If you have some kind of web application firewall in front of Discourse, it might be worth it to check for caching issues. In our experience, that is the first thing to rule out.

Thanks for your response, Leonardo. We use nginx as our main gateway, which forwards path-based URLs to the nginx inside the Discourse container.

I added these two log lines at the start of the sso and sso_login actions in session_controller.rb:

    if SiteSetting.verbose_discourse_connect_logging
      Rails.logger.warn("Verbose SSO log: Cookies #{cookies.map {|cookie| cookie.join('=')}.join(',')}")
      Rails.logger.warn("Verbose SSO log: Session #{session.keys.map {|key| [key, session[key]].join('=')}.join(',')}")
    end

For the first failure scenario mentioned above, I got the following for /sso (on node1 of the multi-node cluster):

    Verbose SSO log: Cookies cn=12,_forum_session=ZjBveGorRVN1bU0zeGRKVHZtWUZDamUxTUJSUkJHUDZDaHhLMkh3U0lXMlpCYS9PTnpJWEovcTlZVDFTSTJuNkVNUE9NdlNvVWlidStIdk9SeTlRYzZ5YVp0N0pXdmhnTldlaSt4d1o3TC9mUm1nSUhsOUtiWFRyVGZBYkJLRHRRR0lFZmM0RkVxLzl0V2JEODR4NGMxQUJvOGhpdVc0c2JsdDFESHo2TWxJPS0tRXZTL0FHZlM1Yy9QVWJkc2xaaTYvUT09--36fa626c698a401db1e7f13276ee6bfde16dea77,sessid=6b4afa7755dc9aa54e3fb16453a28324,<ADDITIONAL_COOKIES_REDACTED>
    Verbose SSO log: Session
    Verbose SSO log: Setting nonce 8199453c67e347124ecb2e57e5738336 with key SSO_NONCE_8199453c67e347124ecb2e57e5738336

and the following for /sso_login (on node2 of the multi-node cluster):

    Verbose SSO log: Cookies cn=12,_forum_session=WFRkNThYYUZwUnlOQjF5VHdUZGRUWE1UNUx2a3Z5ZlJCOGl0VFRRUlF2bm5vQUQzMWdaUVZVUnJkNmdIUjlRTE52d1B5MXJnV0svWkJMRWZrOU5XellvV0IzMTBScERwM0lzT3VIUWc2SEppb2xpTlkxaFpuc1dvU2d4SkdZRXFYYjJzakRQTXFmS2lYTlhxVEd5Zi9nQ3dZQnVUR1pDSndScGZhcVNJOW1ZPS0tNFduSE1YRDk5cWdMRXNsWnBzbDVhZz09--00ab1b89ff4cf05c9f3f3ed71eec9c0c4557f032,sessid=6b4afa7755dc9aa54e3fb16453a28324,<ADDITIONAL_COOKIES_REDACTED>
    Verbose SSO log: Session
    Verbose SSO log: Checking nonce 8199453c67e347124ecb2e57e5738336 with key SSO_NONCE_8199453c67e347124ecb2e57e5738336
    Verbose SSO log: Nonce is incorrect, was generated in a different browser session, or has expired

On the Redis server, I do see the nonce key:

    redis:6379[3]> KEYS "*NONCE*"
    1) "default:3aa05452fdd8fd4a93481eb8afa90f3aSSO_NONCE_8199453c67e347124ecb2e57e5738336"
    2) "default:21639ca4bef85f68c1d72824e3a49bd6SSO_NONCE_7d54c965762e6861799f62ef7c5cfa60"
    3) "default:_CACHE:USED_SSO_NONCE_86886a948684ff110d4830919d4e6de5"
    4) "default:_CACHE:USED_SSO_NONCE_d04fdbf483fe61129a6fcc54087cb4e4"
    5) "default:f7c87c11539908b30f9e307ef05d3f18SSO_NONCE_90a6a6997b7bd5d75eac1ac0cfc6dee2"
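To make the key layout easier to reason about, here is a minimal Ruby sketch that splits keys like the ones listed above into their parts. It assumes that the 32 hex characters in front of SSO_NONCE_ are a per-session identifier, which is my reading of the output and not something confirmed in this topic. The names NONCE_KEY_PATTERN, parse_nonce_key and scopes_for_nonce are made up for illustration and are not Discourse APIs.

```ruby
# Hypothetical helpers for inspecting the KEYS output; not part of Discourse.
# Assumption: the 32 hex characters before "SSO_NONCE_" identify the session
# that the nonce was stored under.
NONCE_KEY_PATTERN =
  /\A(?<namespace>[^:]+):(?<scope>\h{32})SSO_NONCE_(?<nonce>\h{32})\z/

# Splits one Redis key into namespace, scope and nonce; nil if it does not match.
def parse_nonce_key(key)
  match = NONCE_KEY_PATTERN.match(key)
  return nil unless match

  { namespace: match[:namespace], scope: match[:scope], nonce: match[:nonce] }
end

# Returns every scope under which the given nonce is currently stored.
def scopes_for_nonce(nonce, keys)
  keys.map { |key| parse_nonce_key(key) }
      .compact
      .select { |parts| parts[:nonce] == nonce }
      .map { |parts| parts[:scope] }
end

# Keys copied from the KEYS "*NONCE*" output in this post.
SAMPLE_KEYS = [
  "default:3aa05452fdd8fd4a93481eb8afa90f3aSSO_NONCE_8199453c67e347124ecb2e57e5738336",
  "default:21639ca4bef85f68c1d72824e3a49bd6SSO_NONCE_7d54c965762e6861799f62ef7c5cfa60",
  "default:_CACHE:USED_SSO_NONCE_86886a948684ff110d4830919d4e6de5"
].freeze

p scopes_for_nonce("8199453c67e347124ecb2e57e5738336", SAMPLE_KEYS)
```

If that assumption holds, a request that arrives with an empty session would have no way to reconstruct the prefix, so a lookup for the same nonce would miss even though the key still exists in Redis.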

My concern is that the session is blank for /sso_login.
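One check that might help narrow this down is whether the _forum_session value reaches the controller structurally intact. The logged values look like a Base64 payload, then "--", then a 40-character hex signature, where the payload itself decodes to two Base64 parts joined by "--" (what appears to be ciphertext and IV). That layout is an assumption drawn from the shape of the values above, and describe_session_cookie is a hypothetical helper, not a Rails or Discourse API. The sketch only inspects structure; it does not decrypt or verify anything.

```ruby
require "base64"

# Hypothetical helper: reports the shape of a session cookie value.
# Assumed layout: Base64("<Base64 ciphertext>--<Base64 IV>") + "--" + hex signature.
# Returns nil when the value does not follow that layout.
def describe_session_cookie(value)
  payload_b64, signature = value.split("--", 2)
  return nil if signature.nil?

  data_b64, iv_b64 = Base64.decode64(payload_b64).split("--", 2)
  return nil if iv_b64.nil?

  {
    signature_hex_chars: signature.length,
    ciphertext_bytes: Base64.decode64(data_b64).bytesize,
    iv_bytes: Base64.decode64(iv_b64).bytesize
  }
end

# Synthetic value with the assumed layout (not a real cookie).
inner = [Base64.strict_encode64("c" * 32), Base64.strict_encode64("i" * 16)].join("--")
SAMPLE_COOKIE = [Base64.strict_encode64(inner), "a" * 40].join("--")

p describe_session_cookie(SAMPLE_COOKIE)
```

If the cookies logged on both nodes come back with the same shape, truncation or mangling on the way in seems less likely, and the question shifts to why the receiving node ends up with an empty session after reading the cookie.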

Pinging the topic in case anyone has any suggestions.

Is the site public, by any chance? It would help if we could debug it online.

Yes it is, I will send the address in a private message.

Update: sent via private message

Does login break for all users at the same time? Or does it happen at a different time for each user?

The fact that it starts working again after some time makes me wonder whether a cache is involved. Does your NGINX config, or any other intermediate proxy (e.g. Cloudflare), perform any caching?
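As a rough way to look for caching, the response headers of the login endpoints can be checked for signs that a shared cache might store or replay a response that sets cookies. The Ruby sketch below is only a heuristic over standard HTTP header names (Set-Cookie, Cache-Control, Age); cache_warnings is a hypothetical helper, and the header values in the usage line are invented for illustration.

```ruby
# Hypothetical heuristic: flags response headers that hint at shared caching.
# Takes a Hash of header name => value and returns a list of warning strings.
def cache_warnings(headers)
  normalized = headers.transform_keys { |name| name.to_s.downcase }
  cache_control = normalized.fetch("cache-control", "").downcase
  warnings = []

  # Directives that stop a shared cache from storing or blindly reusing a response.
  protected_response = %w[no-store private no-cache].any? do |directive|
    cache_control.include?(directive)
  end

  if normalized.key?("set-cookie") && !protected_response
    warnings << "Set-Cookie on a response that a shared cache may store"
  end

  # An Age header is normally added by a cache when it serves a stored response.
  if normalized.key?("age")
    warnings << "Age header present: response was likely served from a cache"
  end

  warnings
end

puts cache_warnings({ "Set-Cookie" => "_forum_session=abc", "Cache-Control" => "public, max-age=60" })
```

The headers could be collected with curl -I or the browser's network tab during a failing window and compared with a working one.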

It breaks for all users for that short duration. My first guess was an intermediate node tampering with the data, but when I logged cookies from the controller (as explained above), I could see the cookies. Is there anything else that I should check?

Pinging the topic again.