
I'm having a heck of a time trying to understand this. How does, for example, MD5 or SHA-1 hash a string and return only alphanumeric characters? If I'm not entirely off, they both (or at least MD5) convert the string to binary and then pad it so it can be chopped up into blocks of 512 bits. Each block then goes through a series of operations, one of them being an XOR on 32-bit words (each 512-bit block is split into sixteen of them). I mean, it can't be pure luck that only alphanumeric characters come out at the end. Wouldn't the XOR produce something else?

Could someone explain this to me, and/or even provide a small example of XOR-ing a string in Java or PHP?
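Since the question asks for a small Java example of XOR-ing a string, here is a minimal sketch. It assumes the string is first turned into UTF-8 bytes and that each byte is XORed with a repeating key. The class name `XorStringDemo`, the method `xor`, and the key `"key"` are all hypothetical choices for illustration. The point it shows is that XOR yields raw byte values, not "characters": here every result is a non-printable control code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: XOR each byte of a string with a repeating key.
class XorStringDemo {

    // Returns data[i] XOR key[i % key.length] for every position i.
    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = "key".getBytes(StandardCharsets.UTF_8);
        byte[] scrambled = xor("hello".getBytes(StandardCharsets.UTF_8), key);

        // The XOR results are raw byte values; here all are non-printable control codes.
        System.out.println(Arrays.toString(scrambled)); // [3, 0, 21, 7, 10]

        // XOR-ing with the same key again restores the original bytes.
        byte[] restored = xor(scrambled, key);
        System.out.println(new String(restored, StandardCharsets.UTF_8)); // hello
    }
}
```

Nothing in the XOR step constrains the output to letters or digits. Any "alphanumeric" look comes later, from how the bytes are displayed.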

user1768788
  • http://tools.ietf.org/html/rfc1321 and http://stackoverflow.com/questions/997284/how-does-md5sum-algorithm-work – Mark Baker Sep 11 '13 at 14:29
  • Hopefully I didn't misunderstand, but MD5 always returns 16 bytes, which can be expanded to a 32-character string expressed in hex. – Dave Chen Sep 11 '13 at 14:31
  • They don't. What you usually see is the binary result encoded in hex (or sometimes base64). – kiheru Sep 11 '13 at 14:31
  • No matter how big an integer is, you can write it in decimal notation using just the digits 0–9, or in binary using just 0 and 1, or in hexadecimal using 0–9 and a–f. – Joshua Taylor Sep 11 '13 at 14:32
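To illustrate Joshua Taylor's point, here is a small sketch that writes one number in three bases using `java.math.BigInteger.toString(radix)`. The number chosen is 2^128 − 1, the largest value that fits in 128 bits (the size of an MD5 output). The class name `RadixDemo` is hypothetical. `String.repeat` needs Java 11 or later.

```java
import java.math.BigInteger;

// Hypothetical demo class: the same number written in decimal, hex, and binary.
class RadixDemo {
    // Largest value that fits in 128 bits: 2^128 - 1.
    static final BigInteger MAX_128 = BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);

    public static void main(String[] args) {
        System.out.println(MAX_128.toString(10)); // 340282366920938463463374607431768211455
        System.out.println(MAX_128.toString(16)); // 32 'f' digits
        System.out.println(MAX_128.toString(2));  // 128 '1' digits

        // A small value for comparison: 93 is "5d" in hex and "1011101" in binary.
        BigInteger small = BigInteger.valueOf(93);
        System.out.println(small.toString(16) + " " + small.toString(2));
    }
}
```

Whatever the base, the written form only ever uses that base's digit characters; the number itself is unchanged.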

2 Answers


When the output of a hash is displayed to a user, it is generally encoded as a hexadecimal string. Sometimes you might also see a Base64 string, but this is rarer.

The output of an MD5 hash is a 16-byte (128-bit) value. Every value a single byte can hold can be written as a two-digit hexadecimal value. These range from 00 to FF, which is 0 to 255 in decimal or, perhaps more clearly, 00000000 to 11111111 in binary (eight bits in a byte).

So 16 bytes can be represented as 32 hexadecimal digits without losing any information, and this form has the advantage of being easy to compare by visual inspection.
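As a rough illustration of this answer, the sketch below hashes a string with the JDK's `java.security.MessageDigest` ("MD5" is an algorithm every Java platform is required to support). It gets 16 raw bytes and then hex-encodes them by hand, one byte into two digits. The class name `Md5HexDemo` and the helper names are hypothetical. The hand-written encoder is only there to make the byte-to-two-digits step visible.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class: MD5 digest as raw bytes, then as a hex string.
class Md5HexDemo {
    static final char[] HEX = "0123456789abcdef".toCharArray();

    // Encode each byte as two hex digits: high nibble first, then low nibble.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0x0F]);
            sb.append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    // Hash the UTF-8 bytes of s with MD5 and return the raw 16-byte digest.
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] digest = md5("hello");
        System.out.println(digest.length); // 16 raw bytes
        System.out.println(toHex(digest)); // 5d41402abc4b2a76b9719d911017c592 (32 hex digits)
    }
}
```

The digest itself is just 16 arbitrary bytes. Only `toHex` turns it into the familiar 32-character string made of 0-9 and a-f.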

EDIT:

Another source of confusion may be your premise that hashes work on alphanumeric data, which is incorrect. Hashes such as MD5 do not operate on strings; they operate on arbitrary data.

When you hash a string, it is first converted to bytes using an encoding such as UTF-8, and those bytes are what gets hashed. For instance, the UTF-8 representation of hello is 01101000 01100101 01101100 01101100 01101111 in binary, or 68 65 6C 6C 6F in hexadecimal. That sequence of bytes is the actual input to the hash.
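A small sketch of the string-to-bytes step, assuming UTF-8 as in the answer. It prints the bytes of hello in binary and hex, and shows that non-ASCII text can produce more than one byte per character. The class name `Utf8BytesDemo` and its helper methods are hypothetical. `String.repeat` needs Java 11 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

// Hypothetical demo class: the bytes a hash function actually receives for a string.
class Utf8BytesDemo {
    // Space-separated uppercase hex for each UTF-8 byte of s.
    static String hexBytes(String s) {
        StringJoiner j = new StringJoiner(" ");
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            j.add(String.format("%02X", b & 0xFF));
        }
        return j.toString();
    }

    // Space-separated 8-digit binary for each UTF-8 byte of s.
    static String binaryBytes(String s) {
        StringJoiner j = new StringJoiner(" ");
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            String bits = Integer.toBinaryString(b & 0xFF);
            j.add("0".repeat(8 - bits.length()) + bits);
        }
        return j.toString();
    }

    public static void main(String[] args) {
        System.out.println(binaryBytes("hello")); // 01101000 01100101 01101100 01101100 01101111
        System.out.println(hexBytes("hello"));    // 68 65 6C 6C 6F
        // "\u00e9" (e with acute accent) is one character but two bytes in UTF-8.
        System.out.println(hexBytes("\u00e9"));   // C3 A9
    }
}
```

A different encoding (for example UTF-16) would produce different bytes for the same text, and therefore a different hash.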

Dev
  • Okay, but still: if the output is displayed in hexadecimal, then, going by what I read on the wiki, how does the algorithm make sure that whatever binary data it gets always becomes alphanumeric? – user1768788 Sep 11 '13 at 14:51
  • @user1768788 I updated my answer to clarify. To answer you directly, hashes work on arbitrary data, not necessarily alphanumeric. – Dev Sep 11 '13 at 14:59
  • I might've been unclear with my question, so let me demonstrate: you have a string you want to hash, for example a question mark (?), which in binary is 00111111. MD5 uses different values for its different operations; let's say your value is "b" (01100010) and you XOR them: 00111111 XOR 01100010 = 01011101. This new binary value is equal to ], so I don't see how the algorithm avoids getting a result like that somewhere. I couldn't get the spacing right in a comment, so look here for clarification: [HERE](http://pastebin.com/9WA4qy7h) – user1768788 Sep 11 '13 at 15:06
  • @user1768788 The output of a hash is not character data; it is just a number, a very large number (in particular, for MD5 the output is a 128-bit number). If that number happens to match some character encoding at certain byte positions, that is simply a coincidence. – Dev Sep 11 '13 at 15:22
  • It's a number, yes, which is then converted to a string, am I correct? How come this string is NEVER anything other than alphanumeric? – user1768788 Sep 11 '13 at 15:26
  • @user1768788 The number is not, and should not be, decoded as character data. Most programs will simply display the hexadecimal value of the number. – Dev Sep 11 '13 at 15:30
  • @user1768788 As Dev already mentioned, a single byte can represent 256 values. With the characters 0-9 and A-F you can represent 16 values. With two of these characters you can represent 16*16 = 256 values, the same number a single byte can. So you can go through each byte of your MD5 hash and represent it with two such characters; that's the hex representation, a kind of encoding. – martinstoeckli Sep 11 '13 at 17:39
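The XOR example from the comments can be checked directly. This sketch computes `'?' ^ 'b'`. It shows that the resulting byte (0x5D) does happen to be the ASCII code for `]`. Hex output, however, never prints that character: it writes the byte as the two characters `5d`, as martinstoeckli describes. The class name `XorByteDemo` and the `bits` helper are hypothetical. `String.repeat` needs Java 11 or later.

```java
// Hypothetical demo class: one XORed byte, viewed as bits, as a character, and as hex.
class XorByteDemo {
    // Unsigned 8-digit binary form of the low byte of value.
    static String bits(int value) {
        String s = Integer.toBinaryString(value & 0xFF);
        return "0".repeat(8 - s.length()) + s;
    }

    public static void main(String[] args) {
        int a = '?';            // 0x3F
        int b = 'b';            // 0x62
        int x = (a ^ b) & 0xFF; // 0x5D

        System.out.println(bits(a) + " XOR " + bits(b) + " = " + bits(x)); // 00111111 XOR 01100010 = 01011101
        // Decoded as an ASCII character, 0x5D happens to be ']'...
        System.out.println((char) x);                  // ]
        // ...but hex encoding writes that byte as the two characters '5' and 'd'.
        System.out.println(String.format("%02x", x)); // 5d
    }
}
```

So an intermediate or final byte can equal the code of any character. The hex display simply never decodes it as one.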

They're alphanumeric because the result is displayed in hexadecimal form, which makes it easier to visualize and compare.

Kayaman
  • I still don't understand. Sure, it can be in hexadecimal, but can't hexadecimal display more than just alphanumeric characters? If I'm not entirely wrong, ! in hex is 21? – user1768788 Sep 11 '13 at 14:40
  • No, hexadecimal representation uses only the characters 0-9 and a-f. Any binary value, including one resulting from a hash, can be displayed in hexadecimal form. – GriffeyDog Sep 11 '13 at 14:44
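To make GriffeyDog's answer concrete, here is a sketch that hex-encodes the byte for `!` (0x21) and then checks all 256 possible byte values. It confirms that every one of them comes out as exactly two characters from 0-9 and a-f. `!` is written as the digits `2` and `1`, not as `!`. The class name `HexAlphabetDemo` and its methods are hypothetical.

```java
// Hypothetical demo class: hex encoding of any byte uses only 0-9 and a-f.
class HexAlphabetDemo {
    // Encode one byte value (0-255) as two lowercase hex digits.
    static String hexOf(int value) {
        return String.format("%02x", value & 0xFF);
    }

    // True if every possible byte value encodes to two characters from 0-9 and a-f.
    static boolean allBytesUseHexAlphabet() {
        for (int v = 0; v <= 255; v++) {
            if (!hexOf(v).matches("[0-9a-f]{2}")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // '!' is the byte 0x21; its hex encoding is the two characters '2' and '1'.
        System.out.println(hexOf('!'));               // 21
        System.out.println(allBytesUseHexAlphabet()); // true
    }
}
```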