Managing group membership via authentication

Sounds great - happy to help with any issues / questions along the way :slight_smile:

1 Like

Ok, I finally have a work in progress to share here

https://github.com/discourse/discourse/pull/12446

A few notes.

I’ve gone for the slightly tricker case of google apps hd groups for the initial implementation as I think it helps to think through the possible permutations of this, e.g. the need to account for domain-specific groups from a provider.

To implement that use case, I’ve also had to introduce a new concept of “secondary authorization” at the point of authentication, to allow for incremental authorization. I considered a few different ways to implement asking for specific user’s group permissions (i.e. if they were authenticating with a hd), and this seemed to be the most feasible. I appreciate this is perhaps a bigger change on that front than anticipated, but it’s perhaps worth discussing.

Note that to implement the google hd groups case you need to give the non-admin members of your google apps hd groups delegated admin authority in order to list their groups (via the admin directory API). There’s actually a “beta” pre-built administrator role called “Groups Reader” that works well for this. See Pre-built administrator roles - Google Workspace Admin Help

The Google implementation works. If you set it up and then authenticate with a hd your google hd groups will be available in the automatic group membership setting, you’ll be added to that discourse group if that hd group is selected, removed if it’s removed (with both actions logged with some specificity), and subsequent users in that google hd group who authenticate will be added immediately.

The details should be evident from the code and the specs. You’ll also notice that I’ve ended up adding three new tables. I attempted a few more "lightweight’ solutions, but they each ended up being more convoluted and inefficient when it came to handling updates to a user’s associated groups and a group’s associated groups. It’s hard to avoid just creating new tables for each. Open to ideas on the data modelling front though, and more generally.


Some technical todos left (aside from the conceptual/product questions raised above). Suggestions also welcome on this front:

  • Perhaps serialize the associated_groups label (instead of modelling on the client).
  • Add missing specs and qunits
  • Perhaps move user_associated_group / group_associated_group creation/destruction to a background job, as with large numbers this could be slow.
3 Likes

This is looking very cool!

I am a little hesitant about the ‘secondary authorization’ thing, and also the provider_domain column. Can you elaborate some more on why they’re needed? Seems like they’re quite google-specific… is there a reason we can’t request the admin.directory.group.readonly scope during the first auth request? And maybe just prefix the group name with the domain? (or exclude the domain entirely, since I assume people will only be using this with a single google ‘hosted domain’?)

Yeah totally happy with 3 tables here - it keeps things cleaner

Agreed :+1:

We need to be careful here. Any group memberships need to be assigned before the user loads the site for the first time. Otherwise, they won’t see group-specific things the first time they log in (e.g. secure categories). So I think changes to user_associated_group records should be processed synchronously.

But for group_associated_group record changes, I think you’re right. Changes there could affect thousands of users, so they will need to be processed in_batches. I think I’d start by doing it synchronously, with a loading spinner in the UI. That way, admins will clearly be able to see when it’s running/done.

If we see it getting close to 30 seconds (the unicorn request timeout), then we might need to think about a background job.

We might also need to think about adding some DistributedMutex locking here. For example, if a user logs in while a group_associated_group change is being processed, what happens? Happy to discuss this kind of stuff on GitHub once we get the overall architecture finalised.

Following this with interest — and throwing this in as related Does `sso overrides groups` work with Oauth2?

For my use case, I’d be perfectly happy with the Discourse Connect behavior to be implemented for other auth providers.

2 Likes

Great :slight_smile: we’re getting there.

I’ll deal with each in turn.

Incremental authorization

The reason you can’t request group permissions on the first request is that you don’t know who’s signing in, or what they’re comfortable with sharing. You could restrict google associated group mapping to people using the google_oauth2_hd setting, however this would limit the scope of the feature a fair bit. Having a team that uses google apps in addition to “public” users who also want to use google auth is relatively common.

*edit I should clarify, that if you ask for a groups scope and the user cannot give it (e.g. their HD has not delegated authority to non admin users as described above) then the auth will fail. You can’t request optional scopes alongside required scopes.

Moreover, that approach, i.e. asking for a groups scope up front as a standard practice in implementing this in the various authentication methods, would be arguably more google-specific than the alternative, as you don’t always have the equivalent of the hosted domain system to restrict login to. For example, I could be wrong, but I don’t think there’s a way to restrict Github OAuth2 login to a specific github organisation.

In other words in a number of cases you’d be left with the choice of asking everyone who uses that auth method to grant the relevant groups scope, or not using the feature. That approach may work in some contexts, but not in many. This incremental authorization approach gives different auth methods more flexibility in implementing the feature.

It’s true Google has been championing incremental authorization in the OAuth2 space, for example the working papers on the subject have all been written by a googler

https://tools.ietf.org/html/draft-ietf-oauth-incremental-authz-04

However the concept is not Google-specific (other people don’t necessarily call it “incremental authorization”). It’s relatively common in different forms in mobile apps, and is being adopted in OAuth2 by other providers. For example, here’s Facebook’s docs on the same subject.

You’re probably familiar with the field already, but it is considered “good practice” to

  • tell people you’re about to ask for more permissions than the standard basic info;
  • and maybe why.

If the user clicked “Sign up with Facebook” in a Discourse login form, and then, in addition to their email, they were also asked for access to their facebook groups, they might drop off. Facebook puts it like this

As a general rule, the more permissions an app requests, the less likely it is that people will use Facebook to log into your app. In fact, our research shows that apps that ask for more than four permissions experience a significant drop off in the number of completed logins.

This raises the question of whether it’s a good idea to be asking for additional scopes at the point of authentication in the first place, and basically I concluded that we don’t really have any other good options. There’s the need to have immediate access to groups in some scenarios (as you alluded to), and there’s also just the reality that without it being asked up front many users would not take an additional step to authorize their groups on a service, in say their profile, or the groups page.

It has to be done at the point of authentication, which brings us back to the issue referenced above, and why I implemeted a “secondary authorizatiton” system. It is indeed intended to be a lightweight “system” insofar as it is relatively easy for another service, say Facebook or Github, to implement a secondary authorization request to obtain access to the user’s groups after they have authenticated and optionally passing a certain test relating to their basic info.

Each provider just needs to

  • Return a result with secondary_authorization_url

  • Use the state parameter to detect which authorization request the user is up to

  • Provide a omniauth_secondary_authorization_description for users/omniauth_callbacks/secondary_authorization.html.erb. e.g. this is the one for google, which the user sees before they confirm the secondary authorization redirect

    As you've signed in with a %{domain} email, we need to ask for permission to view your %{domain} groups.

None of these parts are google-specific.

What I would like to do here is allow the user to say “no” to the secondary request, and still authenticate. In the Google Apps HD scenario this isn’t really an issue as if their account is part of a hosted domain, then they’re unlikely to want to, or be in a position to, say no. However it should be accomodated to allow for the full range of authentication secnarios here I think.

Finally, it should also be noted that secondary authorization is not required for associated_groups to work. An auth provider can just ask for the groups scope up front, and then add the groups to the auth result after receiving the first response. Indeed, we should probably build that in as an option to the basic oauth2 and openid connect plugins.

Provider Domain

I do think there needs to be some form of secondary identifier in the associated_groups table that’s readable by the site admin. There are a number of scenarios in which a group name by itself may not be enough. There’s the possibiity of name conflicts across each services’ equivalent concept, e.g.

  1. multi-domain google group management (you can have multiple domains in one workspace too)
  2. mutli-org github group management
  3. multi-server discord role management
    etc

We could change domain to namespace perhaps. We could include it in the group’s name, but would that give as any advantages? It may be useful to query by the “domain” or “namespace” at some point. Yes, perhaps namespace would be better than domain.

The reason it needs to be “admin readable” is that it’s used in the label seen by the admin in the groups UI, partly for disambiguation purposes.

I’m mulling whether it makes sense to attempt to also store a provider_id here (if one exists). Could be useful down the track perhaps.

Yup, agree with all this, and thanks for the tips. I’ll give this part a go and we can discuss it further on the PR.

@david I’ve just pushed a few updates on this one including

  • DistributedMutxes and in_batches in group_associated_group
  • Acceptance tests (already had rspec)

Will need some further work no doubt, but it’s currently working according to spec and all tests are passing. Take it for a test drive, let me know what you think and what changes you’d like.

5 Likes

Perhaps worth marking it as a non draft for now?

1 Like

Hi @angus! I’m curious if you’ve made any further progress on this? I’m very interested in the simple “strict” behavior, as I understand it, and since we control our Oauth2/OpenID connect provider, I’m not worried about the “secondary authorization” case. Any chance that something like that could land sooner?

If it’s any help, our environment is documented here: Infrastructure/Authentication - Fedora Project Wiki, and I’ve configured DIscourse to request oauth2 scope openid profile email https://id.fedoraproject.org/scope/groups

Basically, all I want is:

  • Leave the trust level and staff groups alone
  • Add the user to any existing Discourse groups in the list from SSO that are in the list if they’re not there already
  • Remove the user from any groups they are in that aren’t in the list

I freely admit I don’t understand all the intricacies… is there a complication I don’t understand?

2 Likes

I’ve put aside time this weekend to work on this Matt. I’ll have an update next week, probably on the PR on github.

2 Likes

Awesome — thank you so much. I don’t mean to nag, but Discourse Support suggested that posting in this topic would be the best way to see the current state of things. :slight_smile:

I’m excited for your work on this, because there is so much we can do with the Fedora discourse sites once we have this that we just can’t right now!

1 Like

Note that I’ve made an updated draft PR with a new approach (incorporating changes requested by @david on my last attempt). As mentioned in the comments to the PR, I’m looking to finish it off sometime this week.

4 Likes

Just so I’m not getting myself excited for no reason — this says “google”… will it also work with non-google oauth2 sso?

2 Likes

It’s a generic system, but the first supported use case will be groups in a google workspace. Once the system is in place adding support for additional providers will not be too difficult.

5 Likes

Note that this PR was moved from draft to published over the weeknd (i.e. it’s ready for review again).

4 Likes

I just merged this PR - huge thanks for all your work here @angus! Excited to see how this gets used and extended going forward! :confetti_ball:

I’ve labelled the site setting “Experimental” for now, to give us some time to test it out and make sure everything is working smoothly. Once we’re confident, and we’ve added support in a few more authentication providers, I’ll be sure to make a #feature:announcements topic for the feature.

7 Likes

Awesome! :slight_smile:

Thank you David. It wouldn’t have happened without your support. Happy to help with adding additional providers.

5 Likes

YES! Thank you everyone. We’re planning to use this extensively in Fedora once it works with oauth2.

4 Likes

I’m also excited for this to be available for non-Google oauth2/openID logins…any update on if/when that option might be available?

3 Likes

We don’t have a specific timeline, but it’s certainly #pr-welcome if anyone would like to submit a patch

3 Likes

Looking forward to this as well! The use case for us is to pull group membership from Keycloak on authentication.

2 Likes