Adding SSO after many users already signed up -- how to migrate them?

Hi all.

What is the proper way to import all existing Discourse users into intercoin.app? Is there some sort of REST endpoint that returns all users with their hashed passwords and salts, among other things? And where is the hash algorithm on GitHub? If our own hashing doesn’t match, I will have to write code on our side that applies the same algorithm to the entered password and salt, so that those users can log in. I think #2 is relevant whenever a Discourse forum turns on SSO later (like we are), so solving it would help other Discourse users too.

Interesting approach.

In user.rb

  def confirm_password?(password)
    return false unless password_hash && salt
    self.password_hash == hash_password(password, salt)
  end

  def hash_password(password, salt)
    raise StandardError.new("password is too long") if password.size > User.max_password_length
    Pbkdf2.hash_password(password, salt, Rails.configuration.pbkdf2_iterations, Rails.configuration.pbkdf2_algorithm)
  end

and the Pbkdf2 code is here: discourse/pbkdf2.rb at 201228162c277b9833bb2988388553fdbfb39521 · discourse/discourse · GitHub
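For anyone who wants to reproduce this check outside of Discourse, here is a minimal sketch in plain Ruby. It assumes that the linked pbkdf2.rb computes a single 32-byte PBKDF2-HMAC block and returns it as lowercase hex. `discourse_style_hash` and `password_matches?` are hypothetical names, and the default iteration count and algorithm are placeholders for the two `Rails.configuration` values, so check them on your own install.

```ruby
require "openssl"

# Hypothetical helper: PBKDF2-HMAC, 32 bytes of output, hex-encoded.
# The defaults are assumptions; read the real values from Rails.configuration.
def discourse_style_hash(password, salt, iterations = 64_000, algorithm = "sha256")
  raw = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, 32, OpenSSL::Digest.new(algorithm))
  raw.unpack1("H*")
end

# Mirrors confirm_password?: a missing hash or salt never matches.
def password_matches?(password, stored_hash, salt, iterations = 64_000)
  return false unless stored_hash && salt
  stored_hash == discourse_style_hash(password, salt, iterations)
end
```

You would call `password_matches?(entered_password, row_password_hash, row_salt)` with the values exported for that user.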

Excellent! Now what is the HTTP endpoint that I call in order to get all the user info, including password hash and salt?

I imagine that there isn’t one served up to the public at large (why make it easier to hack people?). So what can I do? Connect to the PostgreSQL database? Write a Discourse plugin?

Those are basically your options, yes.

Is the database schema documented anywhere?

How do I connect to the Postgres database inside the Docker container? Sorry if it’s a silly question.

I just spoke to our team, and they agree: I’m prepared to pay someone to build a small Discourse plugin that exposes, via an endpoint, some JSON info about a user, given their email address.

I found that discourse/user.rb at main · discourse/discourse · GitHub has password_hash, salt, name, and username, but it doesn’t have email. For that, I see user_email: discourse/user_email.rb at main · discourse/discourse · GitHub

So given an email, the plugin would just search the user_emails table by email, find the user_id, grab the user row, and send all the “safe” fields, including the salt.
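To make the lookup concrete, here is a rough sketch in plain Ruby that uses in-memory stand-ins for the two tables. The column names come from the model files mentioned above, but the sample rows, the `SAFE_FIELDS` list, and the `user_json_for` helper are all made up for illustration. A real plugin would query the database instead.

```ruby
require "json"

# In-memory stand-ins for the users and user_emails tables (sample data only).
USERS = {
  7 => { id: 7, username: "alice", name: "Alice", password_hash: "ab12", salt: "cd34" }
}.freeze

USER_EMAILS = [
  { user_id: 7, email: "alice@example.com" }
].freeze

# Hypothetical whitelist of fields that are sent back.
SAFE_FIELDS = %i[id username name password_hash salt].freeze

# Email -> user_id -> user row -> JSON, or nil when nothing matches.
def user_json_for(email)
  email_row = USER_EMAILS.find { |row| row[:email].casecmp?(email) }
  return nil unless email_row

  user = USERS[email_row[:user_id]]
  return nil unless user

  JSON.generate(user.slice(*SAFE_FIELDS).merge(email: email_row[:email]))
end
```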

For extra security, the requests can be signed via HMAC using a shared secret that can be provided to the plugin.

Does anyone want to make this one? Message me or reply here and let me know how to contact you. Hopefully it’s straightforward (a couple SELECTs and a check for HMAC if the secret was set). We’ll read the JSON.
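The HMAC check I have in mind could look roughly like the sketch below. It assumes the caller sends a hex HMAC-SHA256 of the request body alongside the request. The helper names and the idea of signing the raw body are my own assumptions, not anything Discourse ships.

```ruby
require "openssl"

# Hex HMAC-SHA256 of the body, keyed with the shared secret.
def sign(body, secret)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), secret, body)
end

# Recomputes the signature and compares it without returning early,
# so the comparison time does not depend on where the strings differ.
def valid_signature?(body, signature, secret)
  expected = sign(body, secret)
  return false unless signature.is_a?(String) && signature.bytesize == expected.bytesize

  diff = 0
  expected.bytes.zip(signature.bytes) { |a, b| diff |= a ^ b }
  diff.zero?
end
```

The plugin would reject any request where `valid_signature?` returns false before running the SELECTs.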

I would just use the existing admin/users/list/active.json and extend the response with the hashed passwords.

Also, stick to the existing API authentication mechanism, don’t reinvent another wheel.
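As a rough illustration of calling that endpoint with the existing API authentication, here is a sketch that builds the request without sending it. The host, key, and username are placeholders. The `Api-Key` and `Api-Username` header names and the `page` parameter are what I believe the Discourse API uses, so double-check them against the API docs.

```ruby
require "net/http"
require "uri"

# Builds (but does not send) an authenticated GET for the admin user list.
# The header names and the page parameter are assumptions to verify.
def build_user_list_request(base_url, api_key, api_username, page = 1)
  uri = URI.join(base_url, "/admin/users/list/active.json")
  uri.query = URI.encode_www_form(page: page)

  request = Net::HTTP::Get.new(uri)
  request["Api-Key"] = api_key
  request["Api-Username"] = api_username
  request
end
```

To send it you would pass the request to `Net::HTTP.start(host, 443, use_ssl: true) { |http| http.request(request) }` and loop over pages until the list comes back empty.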

So you’re saying have a one-time thing that imports all users joined with all their salts and passwords?

Fine, but that still needs to be a plugin, does it not? So it would be great if someone with Discourse experience could create that.

I would probably use the data explorer plugin to export the info you want. It’ll be much easier than writing a new plugin.

How do I find out the pbkdf2_iterations config value? What is the default, and is it in the code somewhere? @RGJ

root@server:~# cd /var/discourse/
root@server:/var/discourse# ./launcher enter app
x86_64 arch detected.
root@server:/var/www/discourse# rails c
[1] pry(main)> Rails.configuration.pbkdf2_iterations
=> 64000
[2] pry(main)>

Thanks! OK @RGJ a couple quick questions:

The xorcist library is just a faster string xor, right? What if one of the characters ends up being 0 because ‘a’ was xored with ‘a’ – what happens to that string? Aren’t strings null-terminated?

My goal is to port this to PHP, so anything you can do to help (such as giving me info on how to replicate it in PHP) will be very helpful.

Also what is this line doing? ret.bytes.map { |b| ("0" + b.to_s(16))[-2..-1] }.join("")

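// hash_hmac takes (algo, data, key): the password is the key, and raw output is needed for the XOR
$u = hash_hmac('sha256', $salt . pack('N', 1), $password, true); // U1
$ret = $u;
for ($i = 2; $i <= $iterations; ++$i) {
  $u = hash_hmac('sha256', $u, $password, true); // U_i = HMAC(password, U_{i-1})
  $ret = ($ret ^ $u); // byte-wise XOR of two raw 32-byte strings
}
// todo: figure out what RUBY is doing on this last line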

Is this close? Can you please fix up this PHP code?

PBKDF2 is a built-in function in PHP.

$hash = hash_pbkdf2('sha256', 'YourPassword', 'YourSalt', 64000, 64, false);

Typically hash algorithms operate on binary data and the result is hex- or Base64-encoded upon output. So that’s not an issue.
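To illustrate both points on the Ruby side, here is a small sketch. It shows that the last line of pbkdf2.rb is just a hand-written hex encoder, and that a zero byte does not truncate a Ruby string. The helper names are made up for this example, and the password and salt are the same placeholders as in the PHP call.

```ruby
require "openssl"

# The line from pbkdf2.rb: each byte becomes two lowercase hex digits,
# left-padded with "0" when the byte is below 16.
def hex_by_hand(raw)
  raw.bytes.map { |b| ("0" + b.to_s(16))[-2..-1] }.join("")
end

# The same encoding using Ruby's built-in unpack directive.
def hex_builtin(raw)
  raw.unpack1("H*")
end

raw = OpenSSL::PKCS5.pbkdf2_hmac("YourPassword", "YourSalt", 64_000, 32, OpenSSL::Digest.new("SHA256"))
puts hex_by_hand(raw) == hex_builtin(raw) # true

# Ruby strings carry an explicit length, so "a" XOR "a" is an ordinary zero byte.
nul = ("a".ord ^ "a".ord).chr
puts(("x" + nul + "y").bytesize) # 3
```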

Thank you very much Richard! You saved me SO much time having to implement it in userland PHP!

Yes! I was able to create a script that goes through all the Discourse users, and imports them with their passphrase hash into our platform.

Soon, we’ll be able to let anyone who has a Discourse forum add also Events, Videoconferencing, Media, and more, with Discourse living in the “Discuss” tab. You can see the result on https://intercoin.app

Basically turning any Discourse installation into a modern social network a la Facebook. We worked for years on those features and now we want to integrate them tightly with Discourse and Wordpress too. So people can combine Wordpress, Discourse and Qbix and self-host their entire community.

But I have two remaining issues.

  1. In Qbix, we hash the password on the client, at least with sha1(password + userId), before sending it to the server, even over HTTPS. We do this so that neither the server nor any MITM ever has the plaintext password and can re-use it across sites. But Discourse simply sends the password to the server, so we had to turn off this client-side hashing. Is it possible to do some iterations of hash_pbkdf2 on the client side and the rest on the server side? I tried it and it doesn’t seem to line up:
php > $password = 'abc';
php > $salt = 'def';
php > $a = hash_pbkdf2('sha256', $password, $salt, 64000, 64, false);
php > $b = hash_pbkdf2('sha256', $password, $salt, 1, 64, false);
php > $c = hash_pbkdf2('sha256', $password, $b, 63999, 64, false);
php > echo $a;
9d7a21ae4113bea06d81e0c486f45ab778bb739f19f7a6a305d8401918a9d8a1
php > echo $c;
f42af6861ebcf8560b027276e0d02ad46502636045486057d81be7c4c4aa630e
  2. Would it be possible to just use Discourse as an SSO Provider, instead of using our site as the SSO provider? Then hosts of Discourse forums would be even more likely to expand it with Qbix features, since the login would remain exactly the same, and on Discourse’s side. Facebook, Google, and whatever else. Is there any documentation on what kind of information Discourse Connect as an SSO Provider returns to our consumer site? Does it include things like the photo we can download, firstname, lastname, and username at least?

TBH I don’t think Discourse submitting a password over HTTPS is your largest security challenge at this moment.

Sure. I think you get most stuff in the standard user serializer.
But if that’s not enough you can always use the API to grab more information from Discourse.
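For the consumer side, here is a rough sketch of checking and decoding what comes back. It assumes the response carries a base64 `sso` payload plus a hex HMAC-SHA256 `sig` computed with the shared secret, which is how I understand DiscourseConnect to work. The field names in the usage note (`nonce`, `username`, `email`) and both helper names are assumptions to verify against the docs.

```ruby
require "openssl"
require "uri"

# Hex HMAC-SHA256 of the still-encoded payload, keyed with the shared secret.
def sign_payload(encoded_payload, secret)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), secret, encoded_payload)
end

# Returns the decoded fields as a Hash, or nil when the signature does not match.
def decode_sso_response(sso, sig, secret)
  return nil unless sign_payload(sso, secret) == sig

  query = sso.unpack1("m") # base64-decode without needing the base64 gem
  URI.decode_www_form(query).to_h
end
```

After decoding, the consumer should also compare the returned `nonce` with the one it sent before trusting fields such as `username` or `email`.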

TBH I don’t think Discourse submitting a password over HTTPS is your largest security challenge at this moment.

Cute. I see your sha1 and I lower you md5 :)

I see why that pbkdf2 doesn’t really work to split it up… the problem is the first line:

U_1 = PRF(Password, Salt + INT_32_BE(i))
U_2 = PRF(Password, U_1)
⋮
U_c = PRF(Password, U_{c−1})

Any ideas for how to split it up, though? I guess I can just use the pure userland php library: pbkdf2/PBKDF2.php at master · Spomky-Labs/pbkdf2 · GitHub
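Here is a small Ruby sketch of what I mean, under the assumption that only the first 32-byte block is needed. The helper names are hypothetical and the iteration count is kept low so the pure-Ruby loop runs quickly. It resumes the loop from a saved state and compares that with simply chaining two library calls.

```ruby
require "openssl"

def prf(password, data)
  OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), password, data)
end

def xor_bytes(a, b)
  a.bytes.zip(b.bytes).map { |x, y| x ^ y }.pack("C*")
end

# Round 1: returns [u, acc], where u is U_1 and acc is the running XOR.
def pbkdf2_first_round(password, salt)
  u = prf(password, salt.b + [1].pack("N"))
  [u, u]
end

# Runs `rounds` further rounds starting from a saved [u, acc] state.
def pbkdf2_more_rounds(password, u, acc, rounds)
  rounds.times do
    u = prf(password, u)
    acc = xor_bytes(acc, u)
  end
  [u, acc]
end

def library_pbkdf2(password, salt, iterations)
  OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, 32, OpenSSL::Digest.new("SHA256"))
end

iterations = 1_000
reference = library_pbkdf2("abc", "def", iterations)

# Split run: rounds 1..10 first, then rounds 11..1000 from the saved state.
u, acc = pbkdf2_first_round("abc", "def")
u, acc = pbkdf2_more_rounds("abc", u, acc, 9)
_, resumed = pbkdf2_more_rounds("abc", u, acc, iterations - 10)
puts resumed == reference # true

# Naive chaining: feed a 10-round result in as the salt of a second call.
chained = library_pbkdf2("abc", library_pbkdf2("abc", "def", 10), iterations - 10)
puts chained == reference # false
```

So a split only works with a custom loop that hands over both the last U and the running XOR. Every round is still keyed with the password, so whoever runs the remaining rounds needs the password as well.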

I would recommend that Discourse hash the password with a salt (userId works) before sending it over the wire. Why not? It doesn’t have to become incompatible with what you are storing in the database now. Simply do the first 10 iterations in JavaScript, then subtract 10 from 64000. You have a custom implementation of it anyway (copied from Rails), so you’d just send along an isHashed variable, and if it’s true, do the “last” 64K-10 steps only.

The user ID is not known before login so that won’t work…

10 iterations are not secure, and 63990 iterations are less secure than 64000 iterations. So although it’s marginal it seems like you are replacing one secure method with two less secure methods and a lot of extra complexity.

And what’s the actual gain?
