Sleeping Cyborg

Jonathan David Page talks about whatever he happens to be thinking about. Sometimes other people join in.

Email · @parathetic (Twitter) · @jdpage (Github)

Password Storage

Posted on 8 May 2011

EDIT 2012-01-29: As it happens, the information in this article relating to the use of SHA2 as an appropriate password hashing algorithm was incorrect. I've replaced it with accurate information. The original text can be found in the footnotes. See Password Storage 2: Electric Boogaloo for more information.

So now that school is mostly over, I'm going to use this blog for what I originally intended--namely, talking about programming. This was originally going to be a devlog entry, but it mysteriously turned into an explanation of password storage instead. It also gives me the opportunity to make rude comments about Sony like all the cool bloggers, because they were very bad at password storage.

I was working on authentication for our project with Prof. O today, and while I was waiting for the development environment to load, I typed this out. Last night I got the passhash and salt fields set up, and did some general research. The fruit of my research is a hopefully relatively simple explanation of how password storage is done.

The obvious approach is to just store the username and password as they are. You're done! Of course, if an attacker gets your database (cough Sony cough) and your users reused the same password for their email account (whose address is doubtless stored nearby), a dismally common practice, then those users are in a bit of trouble, aren't they?

The accepted solution to this is hashing the password before storing it. Hashing is done by a hash function, which takes an input of any length and produces a corresponding fixed-size output, called a hash. Crucially, it's one-way; ideally, you shouldn't be able to get the input back given the output. MD5 and SHA1 are common hash functions used for integrity checking[1]; for password storage, bcrypt is an appropriate hash function.[2]
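To make the idea concrete, here is a minimal sketch using Python's standard `hashlib` module (the language is my choice for illustration, and the helper name `sha256_hex` is made up). It uses SHA-256 purely to show what a hash function does, not as a recommendation for password storage.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# The same input always produces the same hash...
first = sha256_hex(b"hello world")
second = sha256_hex(b"hello world")

# ...while a one-character change produces a different one.
changed = sha256_hex(b"hello world!")

print(first == second)   # True
print(first == changed)  # False
print(len(first))        # 64 hex characters, i.e. 256 bits
```

Nothing in the module offers a way to turn `first` back into `b"hello world"`; the only general approach is to guess inputs and hash them.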

The other thing about a hash is that many different pieces of data can have the same hash, because the output has a limited size (SHA-512, one of the largest hashes, only has 512 bits; that's one sixteenth of a kilobyte). However, the probability of two sensible values having the same hash is vanishingly small.[3]
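The fixed output size is easy to see for yourself. This short sketch (again Python's standard `hashlib`, my choice for illustration) hashes a one-byte input and a roughly one-megabyte input with SHA-512; both come out at 512 bits, which is why collisions must exist in principle.

```python
import hashlib

short_input = b"a"
long_input = b"a" * 1_000_000  # roughly a megabyte of data

short_digest = hashlib.sha512(short_input).digest()
long_digest = hashlib.sha512(long_input).digest()

# digest() returns raw bytes, so multiply by 8 to get the size in bits.
print(len(short_digest) * 8)        # 512
print(len(long_digest) * 8)         # 512
print(short_digest == long_digest)  # False
```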

Of course, attackers have got a way to combat this: rainbow tables, which are essentially massive precomputed tables matching likely passwords to their hashes. Do a lookup of the hash on the table, and bang, you have the password (after a few minutes; these tables are absolutely massive, and take a while to search through). This is quite clearly not a good thing at all, so we do one more thing to protect the passwords--salting them. Basically, this means attaching a piece of random or pseudorandom data called a salt to the password before hashing it, and storing the salt along with the password hash. This does two things. Firstly, it means that even if two users have the same password, their hashes will be different as long as their salts are different (meaning that if an attacker breaks one, the other isn't automatically exposed). Secondly, it makes precomputed rainbow tables all but useless, because the password+salt combination is very unlikely to be on the table.
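Here is a toy illustration of both effects in Python. The "table" is a plain dictionary of four made-up common passwords, which is a big simplification of a real rainbow table, and `fast_hash` is a hypothetical helper built on SHA-256 only because it keeps the demo short.

```python
import hashlib
import os

def fast_hash(password: str, salt: bytes = b"") -> str:
    """SHA-256 of salt + password; a fast hash, used here only for the demo."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

# Toy stand-in for a precomputed table: hash -> password.
common_passwords = ["123456", "password", "letmein", "qwerty"]
lookup_table = {fast_hash(p): p for p in common_passwords}

# Unsalted: a stolen hash is reversed by a simple lookup.
stolen_unsalted = fast_hash("letmein")
print(lookup_table.get(stolen_unsalted))  # letmein

# Salted: each user gets their own random salt, stored next to the hash.
salt_alice = os.urandom(16)
salt_bob = os.urandom(16)
stolen_alice = fast_hash("letmein", salt_alice)
stolen_bob = fast_hash("letmein", salt_bob)

print(lookup_table.get(stolen_alice))  # None: the table no longer helps
print(stolen_alice == stolen_bob)      # False: same password, different hashes
```

The salt is not a secret. An attacker who has the database can still guess passwords one user at a time, which is why the speed of the hash function matters as well.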

To authenticate, simply hash the submitted password together with the stored salt, exactly as you did when storing it, and compare the result against the hash in the database. If they're the same, the password was correct. If not, it was wrong.
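The store-then-verify flow can be sketched as follows. This is an illustration under assumptions, not the project's actual code: it uses PBKDF2 from Python's standard library as a stand-in for a deliberately slow password hash (bcrypt itself is not in the standard library), and the function names and iteration count are made up for the example.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # assumed work factor; tune it for your own hardware

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a slow, salted hash using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def create_record(password: str) -> Tuple[bytes, bytes]:
    """Return the (salt, passhash) pair to store for a new user."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)

def verify(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-hash the submitted password with the stored salt and compare."""
    candidate = hash_password(password, salt)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, stored_hash)

salt, passhash = create_record("correct horse battery staple")
print(verify("correct horse battery staple", salt, passhash))  # True
print(verify("wrong guess", salt, passhash))                   # False
```

Only `salt` and `passhash` need to go in the database; the plaintext password is never stored.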

The bcrypt hashing algorithm actually has built-in support for salts -- you have to pass in both a plaintext and a salt.[4]

Finally, it might simply be best to avoid attackers getting hold of your database in the first place. However, in the interest of mitigating damage and multiple layers of protection, good password storage is a must.[5]

  1. "used for integrity checking" added on 2012-01-29 to clarify the appropriate purpose of the algorithms. 

  2. Edited on 2012-01-29. Originally read "for cryptography purposes, I prefer SHA-512 (a form of overkill SHA2)." MD5, SHA1, SHA2, etc. are not appropriate for password storage because they are designed to be computed quickly, which facilitates brute-force attacks. Their speed makes them suitable for certain tasks, but password storage is not one of them.

  3. The following text was deleted on 2012-01-29:

    Say your password is "@b3L1nc0lnR0%". The SHA-512 hash of this is:

    2f66 9619 ffc8 49a3 5049 a0f4 b050 1fa0 880f 05b4 13cf e494 c2e1 c941 3c0f 5e47 0fb8 81be 9d51 6571 5e27 c525 1076 e906 72e2 dd59 d615 c0c5 d9fc 6d6c d098 8feb

    Now, the other pieces of data that match that will probably not (and by "probably not" I mean "practically never") be appropriate passwords. They will probably be 200 pages of garbled bytes.

    It was deleted due to redundancy and the fact that use of SHA512 was misleading. 

  4. Paragraph added on 2012-01-29. 

  5. Deleted following paragraph on 2012-01-29 due to excessive smugness. It originally read: "And really, it's not that hard to implement. Most standard libraries make this dead easy."