Help me troubleshoot my Discourse SSO

Greetings, I am hoping to get some guidance. My SSO stopped working this week, and I thought I had fixed everything yesterday (it was working, I swear :slight_smile:). Looking at “New Users” for yesterday and today, I had new sign-ups on both days after the fix, but now it is broken again. Unfortunately, the updates I made are not working today.

Problem: Users can’t create new accounts, and users who log out can’t log back in.

I noticed that my Discourse server returns 4xx errors on the following routes:

403: GET : discourse-url/users/by-external/USER-ID.json?
Note: I recently found in the API docs that this route doesn’t seem to exist (even though it has worked). It looks like the documented route is: https://discourse.example.com/u/by-external/{external_id}.json

404: POST: discourse-url/admin/users/sync_sso?

The reason for the trailing ? is that the function I use to generate URLs has an optional parameters field. For these two routes, all the data is sent in the form body or headers.
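
For anyone reading along, here is a minimal sketch of a URL helper that only appends a query string when there are parameters to send, so the trailing ? never appears. The `build_url` name and its arguments are hypothetical; the original helper is not shown in this post.

```python
from urllib.parse import urlencode


def build_url(base_url, path, params=None):
    """Join base_url and path; append a query string only if params is non-empty."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    if params:  # None or an empty dict adds nothing, not even a bare "?"
        url += "?" + urlencode(params)
    return url


# Placeholder host and ids, for illustration only.
print(build_url("https://discourse.example.com", "/admin/users/sync_sso"))
print(build_url("https://discourse.example.com", "/u/by-external/42.json", {}))
print(build_url("https://discourse.example.com", "/search.json", {"q": "sso"}))
```

A trailing ? is usually harmless, but dropping it keeps the logged URLs identical to the documented routes, which makes the logs easier to compare.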

I am using the following library.

What I updated (and what I thought fixed the problem):

In all of my requests, I was sending the Api-Key and Api-Username via a query parameter. For the past few months, I noticed in my admin panel that I had a warning saying I was using dated headers in my request. It linked me to this post and the key details are here:

:warning: Deprecation Warning!
On April 6th, 2020 we dropped support for all non-HTTP header based authentication (excluding some rss, mail-receiver, and ics routes). This means that API requests that have an api_key and api_username in the query params or in the HTTP body of the request will soon stop working. Please see the example cURL request below for how to update your API requests to use the HTTP headers for authentication.

I updated all of my requests. They now send the Api-Key and Api-Username in the headers, and the content type is set to multipart/form-data.
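
To make the change concrete, here is a minimal sketch of header-based authentication using only the Python standard library. It builds the request without sending it. The `build_api_request` helper, the host, the key, and the external id are all placeholders, not values from my setup.

```python
import urllib.request


def build_api_request(base_url, path, api_key, api_username, method="GET", data=None):
    """Build (but do not send) a request that authenticates via HTTP headers only."""
    url = base_url.rstrip("/") + path
    req = urllib.request.Request(url, data=data, method=method)
    # Credentials go in headers, never in the query string or the request body.
    # Note the dashes: Api-Key and Api-Username, not api_key and api_username.
    req.add_header("Api-Key", api_key)
    req.add_header("Api-Username", api_username)
    return req


req = build_api_request(
    "https://discourse.example.com",  # placeholder host
    "/u/by-external/42.json",         # placeholder external id
    api_key="PLACEHOLDER_KEY",
    api_username="system",            # placeholder username
)
print(req.get_method(), req.full_url)
# urllib stores header names as "Api-key" / "Api-username";
# HTTP header names are case-insensitive, so this is equivalent.
print(sorted(req.header_items()))
```

Passing the result to `urllib.request.urlopen(req)` would actually send it. Printing the request first is a cheap way to confirm that no credentials leak into the URL.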

If anyone can offer some guidance on what to look into to debug this issue, I’d greatly appreciate it. I am almost 100% confident this was working at the end of my workday yesterday: I was able to log in and out of my account, and I was able to create new accounts.

Please let me know if you need more information. Thanks!

The header fields need to use dashes (-), not underscores (_). Try changing the field names to Api-Key and Api-Username.

I’m not sure if this will fix the issue with users not being able to log in to your site, but it will fix the issue with the 400 errors that you are seeing.

@simon, thanks for the response! Unfortunately, I didn’t explain this well in my post: I am already using - and not _ in my requests.

To start debugging this, go to your Discourse site settings page and search for ‘sso’ to get all your SSO settings. Make sure that the enable sso, sso url, and sso secret settings are correct. Then enable the verbose sso logging site setting. With that setting enabled, some additional log entries will be added to your site’s error logs (found at Admin / Logs / Error Logs.)

Try logging in via SSO. Then have a look at your error logs to see if they give you any details about the problem. If you’re not seeing anything useful, open your browser’s web inspector to its Network tab with the “Preserve log” checkbox checked. Have a look at the requests that are being made.

If you lock yourself out of your site while trying to fix the issue, as an admin user you can bypass SSO by going to /u/admin-login and entering your email address into the form. An email will be sent to you with a login link.

@simon, thanks for the tip! I’ve been looking at the logs but I am not that experienced in reading them. I get two different types of warnings and one error:

Here is the warning I get frequently:

Verbose SSO log: Started SSO process add_groups: admin: moderator: avatar_force_update: avatar_url: bio: card_background_url: email: external_id: groups: locale: locale_force_update: logo

Here is the error:

Job exception: The difference between the request time and the current time is too large.

When I try to log in as a test user on my site, after logging that user out of Discourse, I get the following in my network panel:

503 Service Unavailable: GET- https://my-site/auth/discourse_sso?sso=XXXX&sig=xxxx

Unfortunately, I’m hitting a roadblock on where to go from here.

I think that error message is coming from Amazon S3. There might be some useful details about how to fix the issue in this topic: Backups have started failing due to server time being wrong. There is some more information here: amazon s3 - S3 Error: The difference between the request time and the current time is too large - Stack Overflow.
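
As a quick way to check for this kind of drift, here is a minimal sketch that compares the local clock with the timestamp in an HTTP `Date` response header. The function names are hypothetical, and the 15-minute limit is an assumption based on how S3's tolerance is commonly documented, so treat it as a rough threshold rather than an exact rule.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

MAX_SKEW_SECONDS = 15 * 60  # assumed tolerance, not taken from this thread


def clock_skew_seconds(date_header, local_now=None):
    """Return local time minus the time in an HTTP Date header, in seconds."""
    remote = parsedate_to_datetime(date_header)  # timezone-aware for "... GMT" values
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - remote).total_seconds()


def skew_is_acceptable(date_header, local_now=None, limit=MAX_SKEW_SECONDS):
    """True when the absolute skew is within the limit."""
    return abs(clock_skew_seconds(date_header, local_now)) <= limit


# Example with a fixed "local" time that runs 20 minutes ahead of the remote server.
remote_date = "Sat, 18 Jul 2020 12:00:00 GMT"
local = datetime(2020, 7, 18, 12, 20, 0, tzinfo=timezone.utc)
print(clock_skew_seconds(remote_date, local))
print(skew_is_acceptable(remote_date, local))
```

In practice the `Date` header could come from any response of a trusted server. If the skew is large, syncing the server clock (for example with NTP) is the actual fix.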

@simon, thanks for the help! My server’s time was out of sync. I corrected it, and now my backups work again!

Now I am sporadically getting a new error:

In the logs section, I will randomly get the following warnings (I only got them 2 times):

MaxMindDB (/var/www/discourse/vendor/data/GeoLite2-City.mmdb) could not be found: No such file or directory @ rb_sysopen - /var/www/discourse/vendor/data/GeoLite2-City.mmdb

and

MaxMindDB (/var/www/discourse/vendor/data/GeoLite2-ASN.mmdb) could not be found: No such file or directory @ rb_sysopen - /var/www/discourse/vendor/data/GeoLite2-ASN.mmdb

I am in the process of looking up how to fix this issue. I tried rebuilding my app, but I’m not 100% sure the rebuild was successful. I am still randomly getting the “MaxMindDB could not be found” errors, in addition to the 4xx errors and the 503 error I was getting earlier.

I’ve been chipping away at this for most of the early morning and haven’t made much progress. I think I eliminated the MaxMindDB errors (they were sporadic and inconsistent earlier, and I haven’t been able to replicate them for the past 3 hours), and I rebuilt my app several times successfully.

Here is where the SSO Pipeline breaks:

  • The user visits Discourse
  • Since there isn’t an active session, the user is redirected to discourse/session/sso_login
  • The user is redirected to my-site/discourse_sso?sso=XXXX&sig=XXXX
  • When that route on my site is hit, I make a GET request to /users/by-external/userId.json
    • this returns a 403 Forbidden
  • Immediately afterwards, a POST request is sent to /admin/users/sync_sso
    • this results in a 404 "No route matches [POST] /admin/users/sync_sso"
  • Eventually, my site returns a 503 Forbidden message (I need to clean up some of the error messages on my site’s end)

I feel like the error is on the Rails app side of things (please correct me if I am wrong). One reason I feel this way is that everything worked at the end of the day on Friday. The proof is that a few new users signed up between Friday evening and Saturday (and logging in or creating a new user is what was broken). As I mentioned in previous posts, I thought I had fixed everything then. However, when I started work on Saturday, I noticed it was broken again.

I’m not sure why you are making the requests to /users/by-external/<external_id>.json and /admin/users/sync_sso. The normal flow would be to just redirect the user to /session/sso_login with the SSO payload set as query parameters on the URL. There are details about what the sync_sso route is used for here: Sync SSO user data with the sync_sso route.
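
For reference, a minimal sketch of that normal flow on the SSO provider side: verify the incoming `sso`/`sig` pair, pull out the nonce, and build the signed redirect back to /session/sso_login. It assumes the usual scheme in which `sig` is the hex HMAC-SHA256 of the Base64 `sso` string, keyed with the sso secret. The function names, host, secret, and user values are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode


def sign(payload_b64, secret):
    """Hex HMAC-SHA256 of the Base64 payload, keyed with the shared sso secret."""
    return hmac.new(secret.encode(), payload_b64.encode(), hashlib.sha256).hexdigest()


def extract_nonce(sso, sig, secret):
    """Verify the incoming signature, then return the nonce from the payload."""
    if not hmac.compare_digest(sign(sso, secret), sig):
        raise ValueError("SSO signature mismatch")
    decoded = base64.b64decode(sso).decode()
    return parse_qs(decoded)["nonce"][0]


def build_login_redirect(discourse_url, nonce, external_id, email, secret):
    """Build the URL that sends the user back to Discourse with a signed payload."""
    payload = urlencode({"nonce": nonce, "external_id": external_id, "email": email})
    payload_b64 = base64.b64encode(payload.encode()).decode()
    query = urlencode({"sso": payload_b64, "sig": sign(payload_b64, secret)})
    return f"{discourse_url}/session/sso_login?{query}"


# Simulated round trip with placeholder values.
secret = "placeholder-sso-secret"
incoming = base64.b64encode(b"nonce=abc123").decode()
nonce = extract_nonce(incoming, sign(incoming, secret), secret)
print(build_login_redirect("https://discourse.example.com", nonce, "42",
                           "user@example.com", secret))
```

Extra fields such as groups can be added to the same payload, which is one reason the separate API calls are usually not needed just to log a user in.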

Making a request to /users/by-external/<external_id> with an external_id that is not yet associated with a Discourse user should return a 404 (not found) error. If the external_id is associated with a Discourse user, the user should be returned.

@simon, the request to /users/by-external/USER-ID.json is to check whether the user already has an account on my Discourse. If a user is found with that id, they are added to or removed from the Discourse groups associated with my site with a PUT request to /admin/groups/groupId/members.json and then redirected to my-discourse/session/sso_login.

If the user doesn’t have an account, the account is added via a POST to /admin/users/sync_sso and after the user is created (and added to their proper discourse groups), they are redirected to my-discourse/session/sso_login.

I’ll follow up and re-read the docs you listed (thank you!). We’ve had this flow running without any hiccups since early 2015 (and discourse and the SSO option has been such a valuable tool for us!), it’s weird that it suddenly stopped working this past week.

@simon, I really appreciate all of your help! I fixed the problem. The Api-Username we were using was “deactivated” sometime last week (due to inactivity). I originally speculated that could have been the problem. I re-activated the user on Friday, and more than likely that is what fixed everything on Friday (I originally thought it was moving the Api-Username and Api-Key into the headers).

Discourse deactivated the same user sometime Saturday morning again, which explains why everything was working and then it suddenly stopped. I didn’t think the user would have been deactivated again so soon due to inactivity.

I have now changed the Api-Username to “system” to prevent this from breaking in the future. Thanks again for your help! In the process of debugging this, my backups started working again and I certainly learned a lot!
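
In hindsight, my site treated every failed lookup the same way, which hid the real cause. Here is a minimal sketch of how the by-external lookup result could be interpreted so that a rejected API credential is reported separately from a missing user. The function name and return labels are hypothetical. The 200/404 meaning follows the explanation earlier in this topic, and treating 401 like 403 is my own assumption.

```python
def interpret_lookup_status(status_code):
    """Map the HTTP status of the by-external lookup to a next step."""
    if status_code == 200:
        return "user_exists"           # safe to update groups, then redirect
    if status_code == 404:
        return "user_missing"          # no Discourse user for this external_id yet
    if status_code in (401, 403):
        # Not a statement about the end user: the API credentials were refused.
        # Check that the Api-Username account is still active and the key is valid.
        return "credentials_rejected"
    return "unexpected"                # log it and fail loudly instead of guessing


for code in (200, 404, 403, 503):
    print(code, interpret_lookup_status(code))
```

With a split like this, the 403 would have pointed at the API user right away instead of looking like a broken route.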
