How was SHA-0 broken? - What is the significance of a mere handful of hash collisions?

Question

I wanted to understand how the SHA0 hash function was broken. I understand that utilising the birthday problem/pigeon-hold principle, hash collision(s) were found. http://www.mail-archive.com/cryptography%40metzdowd.com/msg02554.html contains an example message.

What I’m having trouble finding/understanding: Does this mean there is a timely, mathematical way to ALWAYS produce a hash collision?

Can I eventually find a m2 for a given m1 such that m1 != m2, sha(m1) == sha(m2) or is it only possible on a subset of possible messages? Rephrased: Are the chances of my password having another message for a collision guaranteed?

What is the significance of finding 2 random long messages such as in the link above that have the same hash value? Why did they have to sift through long random messages for a collision instead of figuring a collision for a practical message like “The brown dog jumped over the fox” ?

A couple examples of hash collisions don’t seem as important as a timely method to generate a collision for any message, but all the posts talk about the former.

Thanks for any help/your time! I've read alot of posts/articles, but can't work my brain around my confusion. I suspect I have the same questions for other broken hash functions like MD5.

EDIT:

The paper (explaining improved method for finding collisions) referenced in the answer

That mail is not more than a proof-of-concept of an attack on an old version of SHA that has been deprecated due to several known weaknesses which allow for an attack that is far easier than exhaustive search. Said attack took 80,000 CPU hours. You should note that the same attack won't work for the common SHA-1 implementation. And, to answer your question: Yes, this means that one can in principle find a collision for every input, given a cluster of powerful workstations, but _only_ with SHA0. — Damon, Jul 07 '11 at 15:48
(generally you can _always_ find a collision for _every_ input on _every_ hash, given infinite time/resources, the only notable difference here is that this particular attack on this particular hash works with comparatively moderate resources) — Damon, Jul 07 '11 at 15:49

score 8 · Accepted Answer · answered Jul 07 '11 at 15:49

8

From Wikipedia:

In February 2005, an attack by Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu was announced which could find collisions in SHA-0 in 2^39 operations.

That kind of complexity is completely insufficient for cryptographic purposes with the computing power currently available. It guarantees the discovery of a collision for any message in a very reasonable amount of time.

answered Jul 07 '11 at 15:49

thkala

84,049
23
157
201

Thank you for referencing this, I'll take a look at their paper. – dsf Jul 07 '11 at 16:11
4

"It guarantees the discovery of a collision for any message..." I think this is wrong. The paper describes a "collision attack". This means they can find some m1, m2 such that h(m1)=h(m2) in 2^39 operations. It does not mean that for ANY m1 they can find m2 such that h(m1)=h(m2). That would be a special type of "chosen-prefix collision attack". – tba Nov 24 '15 at 02:01

How was SHA-0 broken? - What is the significance of a mere handful of hash collisions?

1 Answers1