This shows you the differences between two versions of the page.
people:solar:unique-password-count [2021/03/04 13:23] solar [Pwned Passwords (HIBP)] dropped the hypothesis as it's now known to be wrong per Troy Hunt's clarification |
people:solar:unique-password-count [2021/03/04 14:05] (current) solar [Pwned Passwords (HIBP)] added total vs. unique for RockYou-sized subsets of Pwned Passwords |
||
---|---|---|---|
Line 96: | Line 96: | ||
===== Pwned Passwords (HIBP) ===== | ===== Pwned Passwords (HIBP) ===== | ||
- | [[https://haveibeenpwned.com/Passwords|Pwned Passwords]] is a curated and regularly updated collection of leaked/breached plaintext passwords redistributed in form of SHA-1 and NTLM hashes for the purpose of detecting and preventing password reuse. It was introduced in August 2017 (many years later than the above analysis of RockYou). Version 7 released in November 2020 contains over 613 million (613584246) hashes of unique passwords. Helpfully, included with each hash is "a count of how many times that password had been seen in the source data breaches." Adding those up yields over 3.65 billion (3650716681). | + | [[https://haveibeenpwned.com/Passwords|Pwned Passwords]] is a curated and regularly updated collection of leaked/breached plain text passwords redistributed in the form of SHA-1 and NTLM hashes for the purpose of detecting and preventing password reuse. It was introduced in August 2017 (many years later than the above analysis of RockYou). Version 7, released in November 2020, contains over 613 million (613584246) hashes of unique passwords. Helpfully, included with each hash is "a count of how many times that password had been seen in the source data breaches." Adding those counts up yields over 3.65 billion (3650716681). |
- | Extrapolation from RockYou using the formulas above gives 795 to 1225 million unique, with mean for the four formulas at 935 million. This is the opposite from what we saw in the Adobe leak - Pwned Passwords appear to be significantly worse than RockYou's (fewer unique). | + | Extrapolation from RockYou using the formulas above gives 795 to 1225 million unique, with the mean for the four formulas at 935 million. This is the opposite of what we saw in the Adobe leak - Pwned Passwords appear to be significantly worse than RockYou (fewer unique). |
+ | |||
+ | To confirm that it is indeed Pwned Passwords being more repetitive than RockYou, rather than the extrapolation failing at these numbers of passwords, let's take the first 14344391 lines (same as RockYou's unique password count) from pwned-passwords-ntlm-ordered-by-hash-v7.txt and add up the counts on those. It turns out they correspond to 81548722 original passwords (including duplicates), which is about 2.5x RockYou's total of 32.6M. (Going with the last 14344391 lines instead gives 84209109, which is similar enough. Ideally, we'd shuffle the file first, but since there's no reason to expect password complexity or number of occurrences in a plain text leak to correlate with NTLM hash value, these shortcut approaches work just as well. Sorting by hash value effectively //is// random shuffling of the password counts.) | ||
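The check above can be sketched in a few lines of Python. This is only an illustration, not the script used for the numbers on this page. It assumes each line of the file has the form ''HASH:COUNT'' (hash, colon, decimal count), and the function names are made up for the example.

```python
from itertools import islice


def total_in_first_lines(lines, n):
    """Sum the per-hash counts over the first n 'HASH:COUNT' lines.

    Assumption: the count is the field after the last colon on each line.
    """
    total = 0
    for line in islice(lines, n):
        # int() tolerates the trailing newline left by file iteration.
        total += int(line.rsplit(":", 1)[1])
    return total


def total_in_file(path, n):
    """Same as above, reading the lines lazily from a file on disk."""
    with open(path, "r", encoding="ascii") as f:
        return total_in_first_lines(f, n)
```

Calling ''total_in_file("pwned-passwords-ntlm-ordered-by-hash-v7.txt", 14344391)'' would correspond to the first figure quoted above, provided the file format assumption holds. The file is read as a stream, so memory use stays small even for the full download.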
+ | |||
+ | Going the other way, it takes only about 5.6M lines from pwned-passwords-ntlm-ordered-by-hash-v7.txt, about 2.5x fewer than RockYou's unique password count of 14.3M, to reach RockYou's original password count of 32.6M (including duplicates). | ||
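The reverse lookup can be sketched similarly: walk the file from the top and stop once the running total of counts reaches a target. Again this is a hedged illustration that assumes ''HASH:COUNT'' lines. The function name and the rounded target in the usage line are placeholders, not values taken from the original computation.

```python
def lines_to_reach(lines, target):
    """Return how many leading 'HASH:COUNT' lines are needed for the
    counts to add up to at least target, or None if they never do.

    Assumption: the count is the field after the last colon on each line.
    """
    total = 0
    for used, line in enumerate(lines, start=1):
        total += int(line.rsplit(":", 1)[1])
        if total >= target:
            return used
    return None


if __name__ == "__main__":
    # 32_600_000 is the rounded RockYou total mentioned in the text.
    with open("pwned-passwords-ntlm-ordered-by-hash-v7.txt", encoding="ascii") as f:
        print(lines_to_reach(f, 32_600_000))
```

Because the file is sorted by hash rather than by count, the first lines act as a roughly random sample, which is what makes this prefix-based shortcut reasonable.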
===== Perl script ===== | ===== Perl script ===== | ||