76

I've heard that MD5 is "broken" (in the context of password encryption). But I don't understand why! I've read the theory, but can't see it happening in practice...

I have an MD5 hash 99e9446e78aac2056d3903e1adb8fbcd
And a simple bit of code to produce it

$salt="#bh35^&Res%";
$pass="***"; //number of characters is not equal to number of *
echo $hash=md5($salt.$pass);

So, my question is:

Is MD5 really that bad? And if so, what's the password behind the asterisks? (or another word which will produce the same hash)

I'm looking to get past theoretical reasoning, and come up with a simple example that we can use to demonstrate the problem(s) with MD5.

EDIT: I have changed the pass to make it invulnerable to dictionary search.
So, the hash got changed as well.

FINAL: Well, because I was forced to accept something, I accepted the most funny answer.
The pass was s4mep8ss

153

Will that be proof enough (code included):

Peter Selinger: MD5 Collision Demo

Update: Comments claim that this answer is not addressing preimage attacks (which the question didn't ask about at the time of writing this answer). I would like to add that there also have been preimage attacks against MD5 (e.g. confer Finding Preimages in Full MD5 Faster Than Exhaustive Search). They are not feasible yet, but "Attacks never get worse, they only get better".

A nice summary on the current status of MD5 can be found at http://www.jcsecurity.co.uk/?p=215. The article concludes:

In conclusion, MD5 has ?decent-enough? preimage resistance but has some serious collision vulnerabilities which may or may not effect the application it?s deployed to. Even though MD5 might be safe under certain scenarios, unless you absolutely have to use it for reasons such as backwards compatibility or interoperability don?t. There is no point and as Schneier always says ?Attacks never get worse, they only get better?. There are numerous alternatives available today, SHA256, SHA512 and Whirlpool for example are all sensible choices.

Update 2: There is another problem with MD5 in the context of being used as a password hash function. MD5 is fast. Although this article is already more than 2 years old, it still holds today:

Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes

As mentioned by ShuggyCoUk, this question is in essence a restatement of Does any published research indicate that preimage attacks on MD5 are imminent? I recommend to read on there.

111 accepted

Your password is hunter2

36

To address your points directly:

1) Yes, many collision vulnerabilities have been documented. The bottom line is that MD5 is fundamentally broken. According to Wikipedia:

US-CERT of the U. S. Department of Homeland Security said MD5 "should be considered cryptographically broken and unsuitable for further use," and most U.S. government applications will be required to move to the SHA-2 family of hash functions after 2010.

Bruce Schneier has also weighed in with regard to one such attack on MD5 to forge SSL certificates:

I'm not losing a whole lot of sleep because of these attacks. But -- come on, people -- no one should be using MD5 anymore.

2) Its impossible to determine what is behind the asterisks. The point of "breaking" MD5 is not to determine the original password value - it is to find a collision. If I can find another value that will hash to the same MD5 string, then that is as good as finding your original password. Of course, salting makes MD5 stronger, but it is still "broken".

22

There seems to be two ways in which MD5 can be "broken," not at all equivalent:

  1. It is easy to find two inputs X and Y such that md5(X) == md5(Y).

  2. Given md5(X), it is easy to find a Y such that md5(X) == md5(Y).

It's my understanding that md5 has vulnerability 1, but not 2. Although I see why vulnerability 1 might be undesirable for some applications (e.g. digital signatures), the original poster is using it for password authentication. Why is md5 "broken" for this application?

12

md5() is broken due to possibility of a malicious person to intentionally generate a collision. This is more of a problem for Certificate Authorities and the Public Key Infrastructure (PKI) than passwords. By generating a hash collision its possible to forge a SSL certificate.

Further more a collision against md5() is generated by using a specially crafted prefix, if you append a salt to the beginning of the message then an attacker cannot control the prefix and a collision cannot be generated.

However my biggest argument is that it is trivial to use a secure message digest function. You can install mhash or use one of the many sha2 php implementations.

9

What you have done here is epitomized a train of broken thinking into a question that seemingly challenges a logical fallacy.

In fact, your question is as much a statement as it is a fallacy.

If a one way hash collides, even once, don't use it, period, for the purposes that you are describing.

I really, really hope that you don't work for my bank.

7

Finding a collision means I find two different inputs that result in the same hash.

Finding a preimage means finding an input that results in a single, specified hash. I.e., you hash something, and I know only the hash, and my challenge is to find a second input that results in the same hash as output.

The currently known attacks on MD5 allow generation of collisions, but do not allow the generation of a second preimage.

On the other hand, the fact that MD5 has been broken to the extent necessary to find collisions easily1 gives a fairly strong indication that finding a second preimage is much more likely than with another hash that hasn't been broken to the same degree. It's possible that nobody will ever develop a preimage attack against MD5 -- but then again, it's also possible that somebody already has, and is putting it to use for their own nefarious ends rather than publishing it.

  1. There's a huge difference between the breaks of MD5 and SHA-1: the break against MD5 makes it trivial for almost anybody to create colliding inputs. If you have an old (e.g. Pentium IV) computer lying around mostly unused, it's sufficient to find colliding inputs at a rate of something like one pair per hour or so (and the "or so" mostly translates to "or very likely faster"). The break against SHA-1 is right at the border between theoretical and practical. As close as I can figure it, you'd need to spend something like a million (US) dollars to build a machine that could find one collision per week.
5

Assuming a password of a maximum of 20 characters and an alphabet of Latin characters and numbers, there are as many as 5e+35 possible strings.

I wrote a very simple program that attempts to generate every possible string and then compares the salted hash against the one you presented.

It will take roughly 16,1e+22 (16+22 zeros) years to calculate all hashes with this program. It's a lot of time, of course, so yeah, it seems to you that your hash is unbreakable. However, now, remember that there are collisions in MD5 hashes, right? So for every two strings that generate the same hashs, you can reduce the total number of calculations in one. If it collides a lot -- then maybe we can find your pass in a feasible time... one week, maybe?

I will try running the application in my home desktop, which has more processing power and maybe multi-thread it, even though I surely won't left it running for a week.

UPDATE: Recalculated the total of years after improving the algorithm: 15e13 years, way below the last estimative. That's with a single thread (actually, two threads, one to do the calculations and another to give status reports every 30 sec...). Tomorrow I might try creating a thread model for this app.

4

Using a single iteration of a simple hash, even with salt, is entirely inadequate for password hashing, as they're entirely too easy to brute-force even with a hash that isn't broken. Use a proper hash-strengthening scheme or key derivation function such as bcrypt, scrypt, PBKDF2, or Glibc's iterated SHA-2 family hashes.

3

You are using salted hashes, which is good, because it makes it nearly impossible to do a simple reverse-lookup on a hash, even for common words. MD5 has a higher chance of collisions versus SHA1 though. This means that it's more likely that you'll get the same hash for 2 different strings with MD5, although it's still extremely unlikely. However, sha1() is a drop-in replacement for md5(), so why wouldn't you use it?

3

The weakness isn't that you can discover the value being hashed. The weakness is that if you know X and MD5(X), it's possible to construct a Y such that MD5(Y) == MD5(X). This means that it's possible to forge a message to match a signature. On top of that, it has a higher rate of collisions than other, stronger algorithms.

In general, you should use SHA-1 or better.

Read this post for more information.

3

You probably don't need to worry about it. The problem with MD5 is that somebody may be able to find another message that hashes to the same value, which only matters in cases where the hash is visible. This is a big problem with message authentication where you publish a message and say that it is valid if it has a specific MD5 hash. If I can find another message that has the same hash, I can pass it off as valid.

In your case, however, it looks like you're hashing passwords. If you keep the hash value private, there's no way to create a collision, so you'll be safe.

0

I'll come at this a different way. A lot of companies use MD5 hashes to "sign" their files, or assert that a file with a given hash is unique from another file. They base their entire systems on this, especially with respect to file deduplication, or single instance storage.

Now given the fact that you can have different files with the same hash (see these examples), what possible faith can you put into a system that asserts that no two files exist with the same MD5 hash?

Edit to answer comments:

Let's assume a few things, to take it out of the context of the mathmatical realm, and place it into the context of the original question "Why is the use of MD5 bad in practice?"

Say your company is involved in litigation, and the opposing party demands any and all documents relating to "X". You go and buy some software that will crawl all your storage locations and caltalogs the billions of files and emails and attachments, generating an MD5 hash for each. You then exclude all "duplicate" files based on the MD5 hash, and produce the rest of the relevant documents to opposing consul.

Now say the opposing counsel is a bit of an "enthusiastic" litigator, and wants to cast doubt that your company actually met its obligations, specifically in the trustworthiness of using MD5 as a deduplication mechinsim. The opposing pary is going for your company's throat, wanting the judge to impose some hefty sanctions, or even a summary judgement.

So if you were to go in front of a court in a litigation setting, where your company was under penalty of such sanctions, your defense woud be, yes, using MD5 is fine, because:

You need to distinguish among the cases that

  • (a) hash collisions can happen (albeit with extremely small probability),
  • (b) two files can be intentionally constructed to cause a collision (this is a "collision attack", it's possible with MD5),
  • (c) an arbitrary file can be intentionally constructed to cause a collision with another file (preimage attack, not known for MD5)...
  • (d) that a hash collision w/o intentional construction of files is likely to happen (which is not true... you'd need approx 2^64 different files to have a likely collision in a 128-bit hash.)

To which the litigator would likely respond:

  • (a1) Is it possible that two different files can have the same MD5 hash?

    (your answer would have to be yes)

  • (b1) Do you know if there are any examples of two different files that have the same MD5 hash?

    (again, your answer would have to be yes)

At this point, you have lost support in the eyes of most judges. It is now up to your legal team to steer the course back onto the "MD5 is fine" track. I'd rather not be in that position in the first place. At least with SHA-256 or other longer hashes, you can answer "No" to (b1). And thus, the whole point to the question: "Why is the use of MD5 bad in practice?"

0

The first step is admitting the problem which you are doing by writing this question. That's good.

Now step slowly away from the broken algorithm and use one of the many fine alternatives. If you really want to be forward looking, use one of the new candidate hashes submitted to NIST such as Skein.

OTOH your simplest and most widely implemented bet is probably just SHA-2 at 256 bits. That's a wonderful middle ground of size and strength. I'd advise against using SHA-1 at this point as it will be the next to fall. New code should anticipate the next move.

WRT MD5, that has pretty much been answered by others. Safe to say major X.509 certificate implementations still depend on that old beast, but anyone writing new code should use something that isn't known to be broken. The harder question is how to get rid of MD5 from old code, it's sort of like the year 2000 problem except it has no end date so it ends up just being a multi-decade security flaw happening over and over again.

0

Collision space aside, MD5 is a fast hashing algorithm - if I want to brute force it (even with salting) then it takes less effort to do than an equivalent in SHA1, SHA-256 or the like.

Your objective is to make it computationally expensive and slow to generate collisions; using a unique salt per value, and using a slower hashing algorithm, both work towards that goal.

0

Three words: Fixed prefix attack.