A Primer on Cryptographic Hashing

Cryptographic Hashing

This article is a brief explaination of cryptographic hashing algorithms. If you’re looking for a hash generator, you can find that in the tools section.

One of the more common aspects of cryptography is that of cryptographic hashing. A hash is simply a one-way encryption of data, whether it be a string (“Hello World”) or a file. One-way encryption means that the data (plaintext) theoretically cannot be determined from the hash (ciphertext). In practice, however, this isn’t necessarily true – but there is no mathematical formula or operation that can reveal a password from a given hash. (Seecracking hashes, below)

I’m not going to get into the technical details of the various hash algorithms here, since it’s really not that important and the average reader probably doesn’t care anyway. What are important are the various differences between the myriad algorithms in use.

MD-5 (Message Digest 5)

MD-5 is one of the early hash algorithms, and is still in common use today. MD-5 produces a 128-bit (16 bytes or 32 hex digits), well, message digest from a given “message” (data). No matter how long or how short the message is, it will ALWAYS result in a 32-digit hash. As an example, a few strings and their respective hashes are given below:

null d41d8cd98f00b204e9800998ecf8427e
Hello World b10a8db164e0754105b7a99be72e3fe5
HelloWorld 68e109f0f40ca72a15e05cc22786f8e6

As you can see, even a small change in input (removing the space between “Hello” and “World”) results in a large change in the output. (Mathematically, this is known as the Butterfly Effect, although the original analogy referred to a Seagull.) There are an extraordinary large number of possible MD-5 hashes, but there are an infinate number of inputs. This leads us to our next bird-themed mathematical principle: the pigeonhole principle. Essentially, if you have x number of inputs and y number of possible outputs, and x > y, eventually two or more items from set x are going to fit into the same y value. In other words, if I give you five apples and three baskets, and tell you to put all the apples in the baskets, at least one basket will have more than one apple in it.

So what does this mean for us?

If two inputs can have the same MD-5 hash, then it’s theoretically possible to trick any system that relies on verifying hash values. For example, most password scripts don’t actually compare your password. Instead, they compare them indirectly by means of comparing the hash stored in the database with the hash of the typed password.

Password Hashing

MD-5 hashes are also commonly provided to check the integrity of downloaded files. Suppose you download a file with a given hash, then verify it. The verification comes back positive – that is, the hash of the file is exactly what it’s supposed to be. Nobody tampered with the file, right?

Not exactly.

It’s been proven (add citation later) that a file can be modified to include malicious code, yet still have the exact same MD5 hash. This is just a mathematical possibility, of course, and probably does not hold true for every program out there (or even most programs, for that matter). The bigger security risk (in my opinion, anyway) are the vast majority of users who don’t check the MD5 checksum to begin with.

Cracking Hashes

Back to passwords – the question always comes up from some script kiddie or security-conscious consumer: “Can a hashed password be reversed?”

Technically, the answer is no. It is 100% impossible to reverse-engineer a hash, due to the way they’re calculated. “Given an MD-5 hash of a password, can I find the plaintext password?” is a more appropriate question, and the answer to that is “Of course you can!”

I’m not going to get too involved in cryptanalysis here, but there are three basic ways to crack hashes of all shapes and sizes.

Brute Force

This method is simply exhausting every possible combination until you find the correct one. You’ve probably done this at some point in your life when you’ve forgotten the code to your combination lock, and sat there trying 001,002,003,004… If you didn’t just give up and cut the lock off, it probably took you a good portion of the afternoon.

Fortunately, computers (even basic desktop machines like you probably have at home) are able to process a couple million combinations each second. Unfortunately, the huge number of possible combinations used in modern cryptographic algorithms would require, on average, something like 100 times the life of the universe to exhaust every single possible combination. In reality, however, the correct arrangement lies somewhere in between. You may get lucky and crack it in a few minutes, or it could take you several weeks. (Also consider that, statistically speaking, “fast” could also mean 400 years).

In short, brute force methods aren’t the most effective way to go about cracking a hash. In select circumstances, and with the right software, brute force can be a great supplement to other attacks. Cracking NTLM passwords, for instance.

Dictionary Attack

In it’s most basic form, a dictionary attack simply consists of gathering the hashes for every word in a “dictionary”, or a list of pre-defined words/phrases. Numerous sites exist that have fairly large and exhaustive dictionaries, a Google search will turn up some examples.

If you know your target, or are trying to remember a lost password, you have an advantage: you may be able to figure out the hash even quicker by using a custom dictionary.

Recently, I was able to acquire the hashed passwords for all the users of our network at work. Within 15 minutes, I had close to 80% of the hashes cracked using a custom dictionary attack, supplemented by brute force in a few instances (including all but one of our administrators). The other IT folks were notified, and that gaping security hole is now fixed when the server was upgraded. :)

This is why it’s important to use complex passwords – upper/lowercase letters, numbers and symbols. Longer passwords are obviously preferable over short ones, because of the exponential growth of the keyspace (possible number of passwords).

Rainbow Tables

The final method I’ll discuss here are rainbow tables, which are basically pre-computed hash tables. The key difference between a dictionary attack and a rainbow table is that during a dictionary attack, the computer has to:

Read Word/Phrase >> Convert to MD5 Hash >> Compare Hashes

In a rainbow table attack, we eliminate the middle step (which takes a fair amount of computing power in comparison to the read or compare steps), thus speeding up the process. It may not seem like much, but shaving even 1/10 of a millisecond off of each comparison adds up, espicially at speeds of 2-3 million comparisons per second.

Salt

Cryptographic salt is just one of those terms, and I’m not sure where it came from. Cryptographic salt is used to change the input in a standard, replicable way, so that the output is different for each computation.

Let’s go back to the list of password hashes I acquired. I immediately noticed that a handful of the hashes appeared to be the same. Then, I looked at the users associated with the hashes, and noticed that they were the part-time/casual folks who hadn’t been around in a while. A little bell went off in my head: “I doubt all of these people chose the same exact password… I’ll bet they just haven’t changed it from the default password that we give to new accounts!” So I typed our default password into the hash calculator, and bingo! Several passwords cracked at once!

Let’s take this a step further: Suppose someone acquires a list of password hashes from a given site. After cracking one password, he notices that several other accounts use this same password too. He now has access to not only several people’s accounts on this site, and it’s also likely these people use the same password for another site.

Same Password - Same hash

But let’s add some cryptographic salt into the process – instead of comparing md5(password), let’s add another known constant (known to us, anyway). Date of birth, e-mail address/username, phone number, last login time (password hash needs updated each time with this, or it won’t work). For this example, I’ll chose date of birth.

md5(secretpassword) = 2034f6e32958647fdff75d265b455ebf md5(secretpassword12071941) = 6db39407ab20f21c4d3e12e8611fa42amd5(secretpassword04171969) = e24f110ad60e6bf2b80f8e3ca80e0441

As you can see, even though both users have the same password (“secretpassword”), the hash for each person is different, because it’s salted with the date of birth.

A dictionary attack probably wouldn’t find “secretpassword12071941” in it’s list of common words and phrases. Some dictionary attack software will automatically append numbers, letters, words, etc to each dictionary phrase (password,password1,password2,etc), so simple salts (single-digit numbers, common usernames, etc) should be avoided.

Assuming the hacker manages to crack one of your salted passwords, he would be given the string of the password+salt. In the instance above, the hacker would get the string “secretpassword04171969“. If he tries to use this as a password, several things will happen:

  1. The computer will add the salt string (04171969), and hash the final string of “secretpassword0417196904171969“, and then compare the hashe24f110ad60e6bf2b80f8e3ca80e0441 with ab432c3885f9d05089fd7f10e4878f31, and the login attempt will obviously fail.
  2. After the failed login, he may either assume the user has changed their password and give up, or realize it’s salted and start trying:
    • secretpassword0417196
    • secretpassword041719
    • secretpassword04171
    • secretpassword0417
    • secretpassword041

(or, maybe logically, he’d start off with “secretpassword”). Either way, this is one of the reasons it’s important to implement a “max tries” feature in your login script. He’s already tried one, and now he’d only have two more shots to correctly guess the password.

Other Hashing Algorithms

The MD-5 algorithm is considered to be “broken”, meaning that some really smart folks out there have found a way to trick it. So it’s not very secure, and shouldn’t be used for high-security tasks.

There are many different algorithms out there, and they all perform the same function: one-way encryption of data. In PHP, the SHA-1 (Secure Hash Algorithm 1) is also commonly used.

Real-Life

PLEASE, PLEASE, PLEASE do not use MD5, SHA1 or any of the other algorithms if you’re implementing a web application. Assume they will be stolen. While you probably could write a hashing function (not a new algorithm, but a way of salting/storing/retrieving passwords), there are already excellent libraries out there for you to use. Trust the pros, don’t go this on your own. If you’re using PHP, Bcrypt is the way to go.

A Primer on Cryptographic Hashing

Leave a Reply