Therefore, ‘commonpassword’ has appeared 91 times in processed password dumps.
This makes it practical to query a very large dataset without shipping a copy to every single Discourse site. As prior art, WordFence (a WordPress firewall plugin) has integrated it to block admin logins with weak passwords starting today (a password reset is enforced on login):
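For reference, here's a minimal sketch of how the range query works: only the first 5 hex characters of the password's SHA-1 are sent to the service, and the returned bucket of suffixes is matched locally, so the full hash never leaves the machine. The helper names here are mine, not from any existing plugin.

```ruby
require 'digest'
require 'net/http'

# Split a password's SHA-1 into the 5-char prefix sent to the API and
# the 35-char suffix we match locally (the k-anonymity range model).
def hash_parts(password)
  full = Digest::SHA1.hexdigest(password).upcase
  [full[0, 5], full[5..]]
end

# Given the API's response body ("SUFFIX:COUNT" lines), return how many
# times the password appeared in the processed dumps (0 if absent).
def count_in_response(body, suffix)
  body.each_line do |line|
    s, count = line.strip.split(':')
    return count.to_i if s == suffix
  end
  0
end

# The actual network call: fetch the whole bucket for our prefix.
def pwned_count(password)
  prefix, suffix = hash_parts(password)
  body = Net::HTTP.get(URI("https://api.pwnedpasswords.com/range/#{prefix}"))
  count_in_response(body, suffix)
end
```

A bucket is a few hundred suffixes, so the service learns only that the password is one of several hundred candidates.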
Integrating this as an alternative to the 10k common-password list (many entries of which are moot due to length limits) seems like a good idea.
Discourse-hosted sites could use a local copy of the hash lists to avoid excess network requests, while self-installs would need to use the web service with custom caching.
Yeah, I would rather have someone extract from that list the top 10k most common passwords that are 10 characters or more (the minimum allowed Discourse password length). If you would like to submit that as a PR, Kane, go for it; it would be happily accepted.
That requires re-finding the original data breaches and processing them. The database is distributed exclusively as SHA-1 hashes, to make it harder to use as a password-spraying list.
We could probably ask Troy to produce a filtered list? Would be work for him though.
Hmm… a bloom filter bitfield on 500M elements with a false-positive probability of one-in-a-thousand (0.001) is, if I’ve done my arithmetic correctly, about 900MB. Certainly too big to ship in core, but might be suitable for those sites which don’t want to take the API call hit (for performance, stability, or privacy)? I’ve contacted Troy to see if he’d be able to provide a 10+ length list.
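To double-check that arithmetic: the standard optimal-size formula for a Bloom filter is m = -n·ln(p)/(ln 2)², with the optimal number of hash functions k = (m/n)·ln 2. Plugging in the numbers from above:

```ruby
n = 500_000_000   # elements (hashes in the dataset)
p = 0.001         # target false-positive probability

bits  = -n * Math.log(p) / Math.log(2)**2  # optimal filter size in bits
mb    = bits / 8 / 1_000_000.0             # ...in megabytes
k     = (bits / n) * Math.log(2)           # optimal number of hash functions

# mb comes out at roughly 899 MB, with ~10 hash probes per lookup,
# which matches the ~900MB back-of-envelope figure.
```

So the ~900MB estimate holds up, and lookups would need about 10 probes each.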
So, I had a bit of a brainwave: if ol’ Troy Boy doesn’t want to share a filtered list that meets our minimum length requirements, what if we instead cracked all the short hashes ourselves, leaving us with just the hashes of the long passwords to reject? We can filter in bulk: any hash that cracks to a sub-10-character string gets tossed from the list, and we can do the comparison in bulk as we go.
Unfortunately, while a basic AWS g2 instance has a fairly beefy GPU available, it’s still going to take a long time (50 days or so) to enumerate all the 8-character combinations. Worse, because the list of target hashes is so big, you can’t load them all in one go (you run out of GPU memory); instead you have to split them into chunks (I used the first character of the hash to get 16 buckets), so you’re either going to have to run 16 instances in parallel or wait 16× as long (which costs the same amount of money, if you stick to using AWS).
That’s just the 8-character passwords, too; 9 characters is going to take significantly longer (fewer characters take significantly less time, to the point where everything at 6 characters or less is noise).
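To put numbers on why each extra character hurts so much: with the full 95-character printable-ASCII charset (an assumption; the real cracking run may have used a smaller mask), the keyspace grows by a factor of 95 per character, so 9 characters is ~95× the 50-day estimate:

```ruby
charset = 95            # printable ASCII, assumed charset
eight   = charset**8    # ~6.6 quadrillion candidates
nine    = charset**9
growth  = nine / eight  # each extra character multiplies the work by 95

# Implied hash rate for the 50-day figure, just as a sanity check:
implied_rate = eight / (50 * 86_400.0)  # hashes/second, ~1.5 GH/s
```

~1.5 GH/s of SHA-1 is plausible for a single mid-range GPU, so the 50-day figure passes the smell test, and 9 characters lands in the ~13-year range on the same hardware.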
So, unless someone is feeling particularly overloaded with money, or we just want to filter out the really low-hanging fruit, I guess this little experiment was a bust. Pity, would have been a nice way around the problem.
You’d also need to make sure that the k-anonymity property still holds after all the removals: that every 5-char hex prefix still yields a minimum of 1 matched hash per bucket.
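That check is cheap to script: group the surviving hashes by their 5-hex-char prefix and report any of the 16⁵ possible buckets that came up empty. A sketch with made-up hashes (the function name is mine):

```ruby
# Return every 5-hex-char prefix bucket that no surviving hash falls
# into. A non-empty result means some range queries would return
# nothing, which leaks information about what was removed.
def empty_buckets(hashes)
  buckets = hashes.group_by { |h| h[0, 5] }
  (0...16**5)
    .map { |i| i.to_s(16).upcase.rjust(5, '0') }
    .reject { |prefix| buckets.key?(prefix) }
end
```

In practice you’d stream the hash file rather than hold it in memory, but the bucketing logic is the same.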
I was thinking of using the data in a bloom filter, to ship with Discourse (probably as an aftermarket “enhanced password security” setting), rather than as an online service. I don’t hold with the idea of putting a third-party service in the middle of local logins.
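For the record, the filter itself is not much code; here's a minimal sketch (not Discourse code, and a real deployment would use a proper bit-packed file format and tuned parameters from the sizing math above):

```ruby
require 'digest'

# Minimal Bloom filter: m bits, k probe positions derived from the
# SHA-1 of the item with an index salt.
class BloomFilter
  def initialize(bits:, hashes:)
    @m = bits
    @k = hashes
    @field = Array.new((bits + 7) / 8, 0)  # bitfield as a byte array
  end

  def add(item)
    probes(item).each { |i| @field[i / 8] |= (1 << (i % 8)) }
  end

  # May return a false positive, never a false negative -- fine here,
  # since the worst case is rejecting a password that wasn't breached.
  def include?(item)
    probes(item).all? { |i| @field[i / 8] & (1 << (i % 8)) != 0 }
  end

  private

  def probes(item)
    (0...@k).map do |salt|
      Digest::SHA1.hexdigest("#{salt}:#{item}").to_i(16) % @m
    end
  end
end
```

With the ~900MB bitfield shipped as an optional download, the membership test is entirely local: no third party ever sees anything derived from the password.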