Sporadic issue wp-discourse/SSO: Nonce has already expired

Hi. We have this sporadic issue a couple of times per month where:

  1. a user signs up in WordPress
  2. they go to our Discourse forum and press the blue Login button
  3. the page hangs for several seconds (about 30) and then we get a 502 bad gateway error and the person is not logged in to Discourse nor is an account created

The URL that’s hanging on is:
https://OUR-FORUM-URL/session/sso_login?sso=LOTS-OF-BIG-LONG-ARGUMENTS

Over in the Discourse logs I see this (I’m not including real user info):

Verbose SSO log: Nonce has already expired nonce: bff3e6d112bb037832af3d0753cc1b3a name: FIRST LAST username: SOME-USERNAME email: SOME-EMAIL-ADDRESS avatar_url: https://secure.gra.....

I’ve searched here for “Nonce has already expired” but couldn’t find a solution that seemed applicable here. Especially since it’s very sporadically occurring with no clear pattern to it.

Any ideas?

Thanks!

That indicates that the nonce that is passed with the SSO payload has already been used. I’m wondering if this is related to customizations that have been made for Membermouse. If you could enable the verbose SSO logging setting on Discourse, it might be possible to get some information from the logs about what is happening.

2 Likes

I used a similar method in several projects with lastest version of Wordpress and Discourse. I used Wordpress as SSO provider. It works correctly.

Are you using the standard Wordpress registration system? Custom functions on function.php can cause conflicts but it seems that your problem is not related to it. Also, you can check API and Webhooks.

After solving this problem you will probably need to be that:

Thanks @simon and @bekircem for the replies. So I came back 3 hours later and tried to reproduce the problem again and lo and behold it worked fine this time – i.e. Discourse created this user’s account with no issues.

It’s strange, between the two WordPress+Discourse sites I manage we have a dozen signups per day and this issue only pops up a couple of times per month. Whenever it happens, I’m able to reproduce it easily.

@simon, it looks like I already had “verbose sso logging” checked off. Where would I find the more verbose log information for future reference?

Go to Admin/Logs/Error Logs and then look for a date and time that’s close to when the error occurred. The SSO logs all start with the phrase Verbose SSO log.

2 Likes

Ah, I see, at the bottom of the screen. For the timestamp when this issue happened a few hours ago, the verbose SSO log contents are pasted below. I replaced anything sensitive with dashes but all those values looked normal.

I just now noticed that when I search my those Discourse logs for “Nonce has already expired”, I see that this issue actually happens once or twice per day.

Verbose SSO log: Nonce has already expired

nonce: bff3e6d112bb037832af3d0753cc1b3a
name: ----- -----
username: -----
email: -----@-----.-----
avatar_url: https://secure.gravatar.com/---------
avatar_force_update: false
require_activation: false
bio: 
external_id: ----
return_sso_url: 
admin: 
moderator: 
suppress_welcome_message: 
title: 
add_groups: 
remove_groups: 
groups: 
profile_background_url: 
card_background_url:

If you look back in the logs from there, do you see the same nonce being used for another SSO login attempt? You can use your browser’s search to look for it. A nonce shouldn’t appear more than twice in the logs - once for Started SSO process nonce and once if the nonce was used to login a user.

The other thing to look for is the time difference between when the nonce was created and when it was returned to Discourse.

1 Like

OK I checked and the nonces are not being duplicated for any other login attempts. Whenever this issue is logged I see a pair of events like I pasted below.

They are often timestamped a minute apart (the seconds aren’t logged) which makes me wonder if Discourse is for some reason taking a really long time to do whatever it’s doing and by the time it’s done, it checks if the nonce is still valid but since X seconds have passed, it’s considered no longer valid/expired? That may be nonsense? :crazy_face: Just attempting to come up with an explanation. Does the code have something in it that says “this nonce must be cross-checked within X seconds?”

2 Aug 12:14 am
Verbose SSO log: Started SSO process nonce: 744992aea44953875d78b9c08781ae0e name: username: email: avatar_url: avatar_force_update: require_activation: bio: external_id: return_sso_url: http...

2 Aug 12:15 am	
Verbose SSO log: Nonce has already expired nonce: 744992aea44953875d78b9c08781ae0e name: ---- ---- username: ---- email: ----@----.--- avatar_url: https://secure.gravatar.com/avatar/---...

Yes, the nonce is valid for 10 minutes. From the log entries that you posted, the nonce expiry time isn’t the issue. I’ll look into this some more and see if I can find anything.

2 Likes

Another total long shot idea here – Is there a limit to the number of API interactions per hour? Just asking because this issue seems to resolve if I just try again several minutes or hours later.

Yes, there is a limit of 60 API requests per minute. This limit shouldn’t be affecting regular SSO login though. It will affect requests made to the sync_sso route - those requests include the API key.

2 Likes

This is an old-ish topic, but just wanted to share that I was having this same error message, but it was connected to the root issue that’s solved here: Imported Discourse -> Wordpress Users Get Error

3 Likes

Hi @simon. I’ve been digging into this issue recently because it’s happening a lot on one of our sites. (It’s not related to what @jesselperry mentioned but thank you, @jesselperry, for the advice!)

I put a bunch of debugging statements into /wp-discourse/lib/sso-provider/discourse.php and it looks like at the point where it does $response = wp_remote_post(...... an error comes back that says:
cURL error 28: Operation timed out after 5001 milliseconds with 0 bytes received

So for kicks, I bumped up the time that WordPress will wait for a cURL response all the way up to 25 seconds, and the same thing happened except it said:
cURL error 28: Operation timed out after 25001 milliseconds with 0 bytes received

Does it seem like Discourse is legitimately taking a really long time and WordPress is giving up waiting or could it something else?

(I also checked all the arguments to wp_remote_post and they look normal to me.)

Thanks if you have any ideas! :slight_smile:

1 Like

If you’re getting 0 bytes back that suggests some kind of underlying network issue, doesn’t it?

My interpretation was that it issues the cURL request, waits X seconds and gets no response. The timestamps on the debugging statements match up with X seconds passing. But I could be interpreting this wrong.

1 Like

No - you are correct.

1 Like

I’ll also add this in case it’s helpful:

If a user is logging into WordPress and that user already has a Discourse account, the whole wp_remote_post interaction between WordPress and Discourse works instantaneously and works great.

However when there is a new user in WordPress who does not yet have an account in Discourse, that’s when this wp_remote_post interaction problem happens. And even in this situation, it seems like this problem happens 85% of the time. Not all the time.

We have 17,040 users (our forum is based on a 20-year-old community so that’s why there are so many users). One theory I have is that there’s a long lookup time happening (like to see if this user already exists) since there are so many users?

Hmm, there is the external_id and also a username encoded within the payload, it should be fast to figure that out. The amount of users isn’t THAT large either.

The fact that this only happens on new user creation is an interesting one though.

I would still expect to see more than 0 bytes, timing out with no data returned (even a 200 response) suggests the request hasn’t even landed to be processed. Most of the errors with curl which relate to error 28 are network communication issues, relating to resolution, redirect and firewalls/network policy.

No, it is not uncommon at all that all data for a request is returned at once, especially with API requests.

What you describe only happens when there is a large response which is not transferred in time, but that is not the case here.

The request is received by the nginx service and passed upstream to Discourse, where it times out. So the issue must be somewhere within Discourse.

The requests that do not fail, typically return within 100 ms.