
I'd like to test how 3rd-party (including "closed source") tools, such as synchronization or de-duplication software, behave in the presence of files that have the same size and the same checksum or digest (popular ones: CRC32, MD5, SHA-1, etc.). Some of these hashing methods have known vulnerabilities, so there are ways of generating collisions.

Do you know of a source of such datasets (other than brute-forcing some myself :) ) or of generators for creating them?

To be clear: I am interested in sets of files with the same checksum and the same file size, but different contents!
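To make the requirement concrete, here is a minimal sketch of the check a candidate pair has to pass before it is useful as test data. The helper names `digest` and `is_true_collision` are made up for illustration; only the standard `hashlib` and `zlib` modules are assumed.

```python
import hashlib
import zlib
from pathlib import Path


def digest(data: bytes, algo: str) -> str:
    """Return the hex digest of data; 'crc32' is handled via zlib."""
    if algo == "crc32":
        return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    # hashlib.new accepts names such as "md5" or "sha1"
    return hashlib.new(algo, data).hexdigest()


def is_true_collision(path_a, path_b, algo: str = "md5") -> bool:
    """True only if both files have the same size, the same digest,
    and different contents."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    return len(a) == len(b) and a != b and digest(a, algo) == digest(b, algo)
```

Two identical files, or two files that merely have the same size, are rejected; only a genuine collision passes.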

Grzegorz Wierzowiecki
  • Dictionaries, logfiles, sourcecode, anything goes... – wildplasser Aug 02 '12 at 19:45
  • I don't ask for "anything". I ask for collisions, as they are hard to generate. – Grzegorz Wierzowiecki Aug 02 '12 at 20:38
  • They are hard to generate because the functions are near-optimal and the key space is large enough. Remember the birthday paradox: the chance of observing at least *one* collision becomes substantial (roughly one half) once you have hashed about sqrt(n) objects, where n is the size of the key space. For a 256-bit key that would be about 2^128 objects. There are two possibilities: 1) reduce the key space (to, say, 32 bits) or 2) have a solid mathematical foundation. That's all you can do. – wildplasser Aug 02 '12 at 20:52
  • Exactly. But some of them, like MD5 and SHA-1, are broken. What you've said, @wildplasser, is the reason why I am asking for a dataset. – Grzegorz Wierzowiecki Aug 03 '12 at 23:26
  • Are you testing tools that you've built, or 3rd-party programs? – Adam Liss Aug 03 '12 at 23:45
  • 3rd-party programs. My own tool has no such problems :). Thanks for pointing this out, I will stress it in the question. – Grzegorz Wierzowiecki Aug 04 '12 at 19:16

1 Answer


The weaknesses of MD5 are well known:

In 2005, researchers were able to create pairs of PostScript documents[24] and X.509 certificates[25] with the same hash. Later that year, MD5's designer Ron Rivest wrote, "md5 and sha1 are both clearly broken (in terms of collision-resistance)."[26]

source : http://en.wikipedia.org/wiki/MD5

Example pairs can be found there (on Wikipedia) and in the following SO topic:

Create your own MD5 collisions

The question remains open, though: I am still looking for good datasets with many more examples (or good generators).
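For CRC32, at least, a generator is easy to sketch, because the 32-bit output makes a birthday search cheap: on the order of 100,000 random candidates are usually enough to hit a repeat. The following is only an illustration under that assumption, not a tool for MD5 or SHA-1 (those need dedicated cryptanalytic attacks). The function name `find_crc32_collision` and its parameters are hypothetical.

```python
import random
import zlib
from typing import Tuple


def find_crc32_collision(size: int = 16, seed: int = 0) -> Tuple[bytes, bytes]:
    """Birthday search: return two different byte strings of the same
    length that share a CRC32 value."""
    if size < 8:
        raise ValueError("use size >= 8 so that enough distinct candidates exist")
    rng = random.Random(seed)   # seeded, so the result is reproducible
    seen = {}                   # CRC32 value -> first candidate that produced it
    while True:
        candidate = rng.getrandbits(size * 8).to_bytes(size, "big")
        crc = zlib.crc32(candidate)
        earlier = seen.get(crc)
        if earlier is not None and earlier != candidate:
            return earlier, candidate
        seen[crc] = candidate


if __name__ == "__main__":
    a, b = find_crc32_collision()
    # Write the pair out as two same-sized files with equal CRC32.
    with open("collision_a.bin", "wb") as fa, open("collision_b.bin", "wb") as fb:
        fa.write(a)
        fb.write(b)
    print(a.hex(), b.hex(), format(zlib.crc32(a), "08x"))
```

Random candidates are used deliberately: same-length inputs that differ only within one 32-bit window (for example a simple counter) cannot collide under CRC32, so a counter-based search would not find anything until the counter exceeds 2^32. Calling the function with different `seed` values yields different pairs, which is one way to build a larger dataset.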

Grzegorz Wierzowiecki