News of the recent password leak at Yahoo! comes hot on the heels of similar breaches at LinkedIn, eHarmony and last.fm. These leaks are infuriating, not just because these companies got hacked in the first place, but because they failed to adhere to basic password security practices, such as hashing and salting. Although those sound like something delicious you do with potatoes, they are also basic ways of protecting your users’ passwords.
If I store your password as clear text, and someone gets into my systems, they can read your password and go ahead and impersonate you online. Not a good idea. A much better thing for me to do is to store a hash of the password. A hash function jumbles up text in a way that can’t be unjumbled: it is easy to compute the hash from the password, but not the password from the hash. Now, if someone gets into my systems, they can read the hashed password, but not the clear text password.
You may be wondering how the service you’re logging in to can check your password if all it has is the hash. Easy: when you log in, the service takes the password you type and hashes it again, in exactly the same way. If the two hashes match, the service assumes that your password is correct and logs you in.
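The register-then-compare flow can be sketched in a few lines of Python. This is a minimal illustration, not any particular site’s code: it assumes an in-memory dict stands in for the user database, the names `register` and `login` are hypothetical, and it uses unsalted SHA1 only because that is the scheme being described here (the rest of this post explains why that is not enough).

```python
import hashlib
import hmac

# Hypothetical in-memory "database" mapping usernames to hex SHA1 digests.
# Only the hash is ever stored, never the password itself.
stored_hashes: dict[str, str] = {}


def sha1_hex(text: str) -> str:
    """Return the SHA1 digest of text (UTF-8 encoded) as 40 hex characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    """Store the hash of the password and discard the clear text."""
    stored_hashes[username] = sha1_hex(password)


def login(username: str, attempt: str) -> bool:
    """Hash the attempt the same way and compare it with the stored hash."""
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking timing information during the comparison.
    return hmac.compare_digest(stored, sha1_hex(attempt))


register("alice", "justinbieber")
print(login("alice", "justinbieber"))   # True
print(login("alice", "justinbiebert"))  # False
```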
For example, say your password is justinbieber (you know who you are…). Using SHA1 (a well-known hashing algorithm), its hash is 1cbd541e850adb792173ebbf6695a562fbfc1a7e, and that’s what we store in our system when you register that password. Then, when you log in, if you type in justinbieber correctly, we’ll compute 1cbd541e850adb792173ebbf6695a562fbfc1a7e again and let you log in. But if you accidentally type justinbiebert, we’ll hash that to d5ed786f8b10bacdc41deb808f4ce5f2d7e881b6, which is not equal to the hash we stored, so we won’t let you log in.
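To compute these digests yourself, Python’s standard `hashlib` module is enough. This sketch assumes the password is encoded as UTF-8 before hashing (for plain ASCII text like this, the choice of encoding makes no difference). Notice that a single extra character produces a completely different digest.

```python
import hashlib

# Hash the correct password and a one-letter typo of it.
digests = {
    candidate: hashlib.sha1(candidate.encode("utf-8")).hexdigest()
    for candidate in ("justinbieber", "justinbiebert")
}

# Print each candidate next to its 40-character hex digest.
for candidate, digest in digests.items():
    print(f"{candidate:<14} {digest}")
```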
Under this simple scheme, a service never needs to store your password. So the fact that Yahoo! did, at least in this instance, store clear text passwords is inexcusable.
Note that it’s possible for some other text to also hash to 1cbd541e850adb792173ebbf6695a562fbfc1a7e. Hashes are not unique: there are infinitely many possible inputs, but SHA1 always produces a 160-bit hash, so many inputs must share each hash. But the likelihood of someone typing in some other text that happens to have the same hash as your password is infinitesimally small, roughly one in 2^160 per attempt, and there is no known practical way to construct such a text on purpose. So this isn’t a security concern.
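The counting argument can be made visible with a toy experiment. The sketch below deliberately keeps only the first 16 bits of each SHA1 digest (a truncation chosen purely for illustration, and `truncated_sha1` and `find_collision` are hypothetical helper names). With so few bits, two different strings that share a "hash" turn up after a few hundred tries at most; with the full 160 bits, a single guess matches a given hash with probability of only about 1 in 2^160.

```python
import hashlib


def truncated_sha1(text: str, bits: int = 16) -> int:
    """Return only the top `bits` bits of the SHA1 digest of text."""
    full = int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")
    return full >> (160 - bits)


def find_collision(bits: int = 16) -> tuple[str, str]:
    """Try candidate strings until two different ones share a truncated hash.

    With only 2**bits possible values, the pigeonhole principle guarantees
    a collision within 2**bits + 1 candidates.
    """
    seen: dict[int, str] = {}
    n = 0
    while True:
        candidate = f"password{n}"
        h = truncated_sha1(candidate, bits)
        if h in seen:
            return seen[h], candidate
        seen[h] = candidate
        n += 1


a, b = find_collision()
print(a, b, truncated_sha1(a) == truncated_sha1(b))

# With the full 160-bit digest, one random guess matches a given hash
# with probability about 1 in 2**160.
print(f"{1 / 2**160:.1e}")  # prints 6.8e-49
```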
I mentioned above that hashes can’t be unjumbled. That’s not quite true. Some hash functions have mathematical weaknesses that let attackers do better than brute force. But there is an even simpler attack: if I know which hash function a site uses for its passwords (and it’s usually SHA1), I can do this:
- Create a large set of possible passwords (say, all common first names, all dictionary words, all dates, all of the above with numbers tacked on to the end, and so on). Let’s say I generate 100 million possible passwords this way.
- Hash all of them, keeping a table that maps each hash back to the password it came from.
Now, if I later get my hands on some site’s list of hashed passwords, I can search that list for any of my 100 million hashes, and if one matches, I know what the original password was. Computers can look things up in large collections very quickly, even across many millions of hashes, so this cracking method is quite practical. It is one of the reasons why using predictable passwords is a bad idea.
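Here is a toy version of this precomputed-lookup attack, assuming unsalted SHA1. The word list and numeric suffixes are tiny, hypothetical stand-ins for the 100 million candidates described above, and the "leaked" hashes are generated inside the example. Building a dict from hash to password is what makes each lookup essentially instant.

```python
import hashlib
from itertools import product


def sha1_hex(text: str) -> str:
    """Unsalted SHA1 of text, as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical, tiny candidate generator: a few words, each on its own and
# with the numbers 0-99 tacked on. A real attacker would use millions.
words = ["password", "letmein", "justinbieber", "monkey", "dragon"]
suffixes = [""] + [str(n) for n in range(100)]
candidates = [w + s for w, s in product(words, suffixes)]

# Precompute once: map each hash back to the password that produced it.
lookup = {sha1_hex(c): c for c in candidates}

# A "leaked" list of unsalted SHA1 hashes, generated here for the demo.
leaked = [
    sha1_hex("monkey42"),
    sha1_hex("correct horse battery staple"),
    sha1_hex("dragon"),
]

# Dict membership tests take constant time on average, so checking
# millions of leaked hashes against the table is cheap.
cracked = {h: lookup[h] for h in leaked if h in lookup}
for h, pw in cracked.items():
    print(h, "->", pw)
```

Only the predictable passwords are recovered; the long, unusual passphrase is not in the candidate set, so its hash finds no match.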
This vulnerability of hashes shows that storing plain hashes of passwords, as LinkedIn were doing, is not enough. To increase security further, we can salt the hash. Salting means adding some long random string, known as ‘salt’, to the password before hashing it. As long as we add the same salt when you log in as we did when you registered the password, everything still works.
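A minimal sketch of salting, again using SHA1 to stay close to the text. It assumes the salt is 16 random bytes from Python’s `secrets` module, simply prepended to the password before hashing. The function names are hypothetical, and where the salt is kept is left to the caller. (Many real systems generate a different salt for each user.)

```python
import hashlib
import hmac
import secrets


def new_salt() -> bytes:
    """Generate a long random salt (16 bytes here, an illustrative choice)."""
    return secrets.token_bytes(16)


def salted_hash(password: str, salt: bytes) -> str:
    """Hash salt + password; the same salt must be reused at login."""
    return hashlib.sha1(salt + password.encode("utf-8")).hexdigest()


def verify(attempt: str, salt: bytes, stored: str) -> bool:
    """Recompute the salted hash of the attempt and compare in constant time."""
    return hmac.compare_digest(salted_hash(attempt, salt), stored)


salt = new_salt()
stored = salted_hash("justinbieber", salt)    # what we keep at registration
print(verify("justinbieber", salt, stored))   # True
print(verify("justinbiebert", salt, stored))  # False
```

A single SHA1 pass is used only for continuity with the examples above. In practice, a deliberately slow key-derivation function such as PBKDF2 (available as `hashlib.pbkdf2_hmac` in the same module) is generally preferred, because it makes each of the attacker’s recomputations more expensive.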
However, an attacker cannot precompute the salted hashes of those 100 million passwords, because they don’t know what the salt is: ideally, the salt is stored separately from the salted, hashed passwords. And even if the salt is discovered, the intruder then has to recompute the salted hashes of all 100 million candidate passwords before searching for them in the list of hashes. This takes extra time, which can hopefully be used to change all the compromised passwords.
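The sketch below illustrates both halves of that argument under toy assumptions: a small, hypothetical candidate list, an attacker’s table of unsalted SHA1 hashes built before the breach, and a stored hash made by prepending a random salt to the password. The precomputed table finds nothing. Only after learning the salt can the attacker recover the password, and only by hashing every candidate again.

```python
import hashlib
import secrets


def sha1_hex(data: bytes) -> str:
    """SHA1 of raw bytes, as a hex string."""
    return hashlib.sha1(data).hexdigest()


candidates = ["password", "letmein", "monkey42", "dragon", "justinbieber"]

# Attacker's table, precomputed before any breach, without knowing the salt.
precomputed = {sha1_hex(c.encode("utf-8")): c for c in candidates}

# The site stores salted hashes; the salt itself is random.
salt = secrets.token_bytes(16)
leaked_hash = sha1_hex(salt + b"monkey42")

# 1. The precomputed (unsalted) table is useless against the salted hash.
print(precomputed.get(leaked_hash))  # None

# 2. Once the salt is discovered, every candidate must be hashed again.
recomputed = {sha1_hex(salt + c.encode("utf-8")): c for c in candidates}
print(recomputed.get(leaked_hash))  # monkey42
```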
Salting and hashing are not difficult, and there is no excuse not to use them. If you’re storing unsalted hashes, not to mention plain-text passwords, you are compromising the security of your users’ accounts, and you need to take a long, hard look at yourself.