I am trying to generate an MD5 hash from PowerShell. I installed the PowerShell Community Extensions (Pscx) to get the Get-Hash command.

However, when I generate an MD5 hash using Get-Hash, it doesn't match the hash generated by md5sum on an Ubuntu machine.

Powershell:

PS U:\> "hello world" | get-hash -Algorithm MD5

Path Algorithm HashString                       Hash
---- --------- ----------                       ----
     MD5       E42B054623B3799CB71F0883900F2764 {228, 43, 5, 70...}

Ubuntu:

root@LT-A03433:~# echo "hello world" | md5sum
6f5902ac237024bdd0c176cb93063dc4  -

I know that the one generated by Ubuntu is correct as a couple of online sites show the same result.

What am I doing wrong with PowerShell Get-Hash?

picciano
Jigar

3 Answers

The difference is not obvious, but you are not hashing the same data. MD5 is a hashing algorithm, and it has no notion of text encoding – this is why you can create a hash of binary data just as easily as a hash of text. With that in mind, we can find out which bytes (or octets; strictly, a stream of 8-bit values) MD5 is calculating the hash of. For this, we can use xxd, or any other hex editor.

First, your Ubuntu example:

$ echo "hello world" | xxd
0000000: 6865 6c6c 6f20 776f 726c 640a            hello world.

Note the 0a, a Unix-style newline, at the end, displayed as . in the right-hand view. By default, echo appends a newline to what it prints. You could use printf instead, which does not, but that would produce a different hash.

$ echo "hello world" | md5
6f5902ac237024bdd0c176cb93063dc4
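
As a rough sketch of the printf point, assuming GNU coreutils md5sum is on the PATH (the variable names are only for illustration): printf writes exactly the bytes it is given, so the newline has to be added explicitly to reproduce the echo result.

```shell
# printf emits exactly the bytes given; no newline is appended.
without_newline=$(printf 'hello world' | md5sum | cut -d' ' -f1)

# An explicit \n reproduces what echo sends down the pipe.
with_newline=$(printf 'hello world\n' | md5sum | cut -d' ' -f1)

echo "no newline:   $without_newline"   # 5eb63bbbe01eeed093cb22bb8f5acdc3
echo "with newline: $with_newline"      # 6f5902ac237024bdd0c176cb93063dc4
```

The second value is the one echo gives; the first is the hash of the 11 characters alone.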

Now let's consider what PowerShell is doing. It is passing a string of its own directly to the Get-Hash cmdlet. As it turns out, the natural representation of string data in much of Windows is not the same as on Unix – Windows uses wide strings, where each character is represented (in memory) as two bytes. More specifically, we can open a text editor, paste in:

hello world

with no trailing newline, and save it as UTF-16, little-endian, without a byte-order mark. If we examine the actual bytes this produces, we see the difference:

$ xxd < test.txt
0000000: 6800 6500 6c00 6c00 6f00 2000 7700 6f00  h.e.l.l.o. .w.o.
0000010: 7200 6c00 6400                           r.l.d.

Each character now takes two bytes, with the second byte being 00. This is expected, since the Unicode code points for basic ASCII characters equal their ASCII values, so the high byte is always zero (this overhead is one reason why UTF-8 rather than UTF-16 is used across the Internet). Now let's see the hash:

$ md5 < test.txt
e42b054623b3799cb71f0883900f2764

This matches what PowerShell is producing for you (Get-Hash simply prints the hex digits in upper case).
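
If you'd rather not go through a text editor, the same bytes can probably be produced on the command line. This is only a sketch, assuming iconv, od and GNU md5sum are available; od stands in for xxd here, and the variable names are made up for the example.

```shell
# Re-encode the 11 ASCII characters as UTF-16LE (iconv writes no
# byte-order mark when the endianness is given explicitly).
utf16_hex=$(printf 'hello world' | iconv -f UTF-8 -t UTF-16LE | od -An -v -tx1 | tr -d ' \n')

# Hash the same re-encoded byte stream.
utf16_md5=$(printf 'hello world' | iconv -f UTF-8 -t UTF-16LE | md5sum | cut -d' ' -f1)

echo "$utf16_hex"   # 680065006c006c006f00200077006f0072006c006400
echo "$utf16_md5"   # e42b054623b3799cb71f0883900f2764
```

That is 22 bytes instead of 11, and the hash is the one Get-Hash printed.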

So, to answer your question – you're not doing anything wrong. You just need to encode your string the same way to get the same hash. Unfortunately I don't have access to PS, but this should be a step in the right direction: UTF8Encoding class.

Aurel Bílý
  • After your explanation and with help from a colleague, using Get-Hash with "-StringEncoding ascii" gets me the correct result.

    PS U:\> echo "hello world" | Get-Hash -Algorithm md5 -StringEncoding ascii

    Path Algorithm HashString                       Hash
    ---- --------- ----------                       ----
         MD5       5EB63BBBE01EEED093CB22BB8F5ACDC3 {94, 182, 59, 187...}

    – Jigar Apr 24 '18 at 01:06

This question is surely related to How to get an MD5 checksum in PowerShell, but it’s different and makes an important point.

Md5sums are computed from bytes. In fact, your Ubuntu result is, in a sense, wrong:

$ echo "hello world" | md5sum
6f5902ac237024bdd0c176cb93063dc4  -

$ echo -n "hello world" | md5sum
5eb63bbbe01eeed093cb22bb8f5acdc3  -

In the first case you sum the 11 bytes which make up the ASCII representation of your string, plus a final line feed (12 bytes in total). In the second case, you don't include the line feed.
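
A quick way to check what actually goes down the pipe is to count the bytes. This is a minimal sketch, assuming a bash-like shell whose echo honours -n, with illustrative variable names:

```shell
# Count the bytes each command writes (tr strips the padding some wc builds add).
with_lf=$(echo "hello world" | wc -c | tr -d ' ')
without_lf=$(echo -n "hello world" | wc -c | tr -d ' ')

# Show the last byte of the plain echo output as hex.
last_byte=$(echo "hello world" | tail -c 1 | od -An -tx1 | tr -d ' \n')

echo "echo:    $with_lf bytes, last byte 0x$last_byte"   # 12 bytes, last byte 0x0a
echo "echo -n: $without_lf bytes"                         # 11 bytes
```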

(As an aside, it is interesting to note that a here string includes a trailing line feed:)

$ md5sum <<<"hello world"
6f5902ac237024bdd0c176cb93063dc4  -

In Windows PowerShell, your string is represented in UTF-16LE, 2 bytes per character. To get the same result on Ubuntu and on Windows, you have to re-encode the input. A good choice on Ubuntu is iconv:

$ echo -n "hello world" | iconv -f UTF-8 -t UTF-16LE | md5sum
e42b054623b3799cb71f0883900f2764  -
Dario

The md5sum result is wrong-ish, in spite of other people agreeing with it. The pipeline that feeds md5sum adds a platform-specific end-of-line sequence to the input string: LF on Unix, CR-LF on Windows.

Verify this on a machine with powershell and bash and e.g. postgres installed for comparison:

'A string with no CR or LF at the end' | %{  psql -c "select md5('$_' || Chr(13) || Chr(10) )"   }
echo 'A string with no CR or LF at the end' | md5sum.exe
'A string with no CR or LF at the end' | %{  psql -c "select md5('$_' || Chr(10) )"   }
bash -c "echo 'A string with no CR or LF at the end' | md5sum.exe"

Output of the first two lines:

PS> 'A string with no CR or LF at the end' | %{  psql -c "select md5('$_' || Chr(13) || Chr(10) )"   }
               md5
----------------------------------
 1b16276b75aba6ebb88512b957d2a198

PS> echo 'A string with no CR or LF at the end' | md5sum.exe

1b16276b75aba6ebb88512b957d2a198 *-

Output of the last two lines:

PS> 'A string with no CR or LF at the end' | %{  psql -c "select md5('$_' || Chr(10) )"   }
               md5
----------------------------------
 68a1fcb16b4cc10bce98c5f48df427d4

PS> bash -c "echo 'A string with no CR or LF at the end' | md5sum.exe"

68a1fcb16b4cc10bce98c5f48df427d4 *-
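
If you don't have postgres to hand, the same comparison can be sketched with printf alone, which lets you spell out the line ending yourself. This assumes GNU md5sum; the variable names are just for illustration, and I haven't asserted the exact digests here, only that the two inputs differ by one byte and hash differently.

```shell
s='A string with no CR or LF at the end'

# Same text, two explicit line endings: LF (Unix) and CR-LF (Windows).
lf_md5=$(printf '%s\n' "$s" | md5sum | cut -d' ' -f1)
crlf_md5=$(printf '%s\r\n' "$s" | md5sum | cut -d' ' -f1)

# Byte counts show the single extra CR.
lf_len=$(printf '%s\n' "$s" | wc -c | tr -d ' ')
crlf_len=$(printf '%s\r\n' "$s" | wc -c | tr -d ' ')

echo "LF:    $lf_len bytes  $lf_md5"
echo "CR-LF: $crlf_len bytes  $crlf_md5"
```
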
Chris F Carroll