
There are many MD5 collision examples on the web, but as far as I can find, they all involve binary inputs.

Are there any two known plain-text ASCII strings that give the same MD5 hash?

Basically, I'm building a de-duplication system that stores plain-text files such as JSON and XML and skips any file detected to contain binary data. I need a way to test how the system copes with two plain-text (non-binary) files or strings that give the same MD5 hash.
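To make the scenario concrete, here is a minimal sketch of the kind of rejection logic that needs testing. It is an illustration under assumptions, not the actual system: `DedupStore`, `add`, and `weak_hash` are hypothetical names, and an in-memory dict stands in for the real storage. Because the hash function is injected, a test can swap in a deliberately weak hash to force a collision between two plain-text inputs, which exercises the rejection branch even without a real ASCII MD5 collision pair.

```python
import hashlib
from typing import Callable, Dict


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


class DedupStore:
    """Hypothetical in-memory stand-in for the real de-duplication store."""

    def __init__(self, hash_fn: Callable[[bytes], str] = md5_hex) -> None:
        self._hash_fn = hash_fn
        self._files: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        key = self._hash_fn(data)
        existing = self._files.get(key)
        if existing is None:
            self._files[key] = data
            return "stored"
        if existing == data:
            # Same hash and same content: a true duplicate.
            return "duplicate"
        # Same hash but different content: a collision, so reject it.
        return "rejected"


def weak_hash(data: bytes) -> str:
    """Test-only hash (an assumption for this sketch): digest of the first
    byte only, so any two inputs sharing a first byte collide."""
    return md5_hex(data[:1])


store = DedupStore(hash_fn=weak_hash)
results = [
    store.add(b'{"a": 1}'),  # first file with this key
    store.add(b'{"a": 1}'),  # identical content
    store.add(b'{"b": 2}'),  # different content, same weak hash
]
print(results)
```

This only simulates a collision; it does not replace a test against genuine colliding MD5 inputs if the real code path depends on `hashlib.md5` directly.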

LaVache
  • Out of curiosity, is there any reason you're using MD5 rather than a more secure hash function like SHA-256? – templatetypedef May 08 '20 at 01:13
  • @templatetypedef Yes. Initially I was actually doing that (a combination of CRC32 + MD5 + SHA512), but I changed to simply using MD5 for a few reasons: (1) I'm using Postgres with its efficient `UUID` type for primary/foreign keys. (2) Lots of other systems already generate MD5s for files. (3) Simplicity. (4) Performance. The combination of these benefits has been great since I made the change. The system is programmed to simply reject the second and any later colliding files, which is an acceptable compromise in my use case considering the upsides. I just need a way to test my rejection code using plain-text inputs. – LaVache May 08 '20 at 01:36
