same hash, different behavior

Question

I have two files that give the same hash, and even the same hexdump. File A and File B start on Linux Box 1 and Linux Box 2, respectively. I then copy both files to a Windows share and read them from a Windows machine. The files still seem to be byte-for-byte identical according to the Windows utility fc (with the /b option -- binary mode). However, when I open the two files, they appear to have different encodings (the newlines and line wrapping differ). Why wasn't this uncovered by the hashes/hexdump/fc?

What am I overlooking here?

How do you determine that they have different encodings and/or newlines? — deceze, Apr 30 '12 at 22:39

score 0 · Answer 1 · answered Aug 25 '12 at 15:43

Don't use WordPad for that. Actually, don't use WordPad at all. Note that Microsoft often does not keep to standards, and in many cases (e.g. the browser) simply takes an informed guess at file or stream content, using the header as some kind of magic. Sometimes it guesses wrong, sometimes it doesn't.

You could calculate the hash on the Windows machine as well; there are plenty of lightweight utilities that calculate secure hashes from within Windows Explorer. You could also install command-line utilities such as OpenSSL on Windows (or take it a step further and install Cygwin, which I always have running on my Windows machine).
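If Python happens to be installed on the Windows machine, a minimal sketch like the following would also do for re-checking the hashes there. The function names `sha256_of` and `same_content` are made up for illustration, and SHA-256 is only an assumption, since the question doesn't say which hash was used. The files are opened in binary mode so that no newline translation happens while reading.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the raw bytes in a file."""
    digest = hashlib.sha256()
    # "rb" matters: text mode could translate line endings on Windows.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Calling `same_content` on the two copies on the share should agree with what `fc /b` reports.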

Windows has never had a real strategy regarding line endings, except keeping to its own two-character standard (CR LF). In later versions of Windows you may use Notepad, which does (finally) understand Unix newlines, if you must (because maybe it screws up UTF-16 this time around).
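Rather than trusting what an editor displays, you could count the line endings yourself. The sketch below is only an illustration (the name `describe_bytes` is hypothetical): it counts CR LF, bare LF and bare CR sequences and looks for a UTF-8 or UTF-16 byte order mark. It does not try to recognise UTF-32. If both files report the same numbers, the difference you see most likely comes from how the viewer interprets them, not from the files.

```python
import codecs


def describe_bytes(data):
    """Summarise line endings and a leading BOM in raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF

    bom = None
    # Only UTF-8 and UTF-16 marks are checked in this sketch.
    for name, mark in (
        ("UTF-8", codecs.BOM_UTF8),
        ("UTF-16 LE", codecs.BOM_UTF16_LE),
        ("UTF-16 BE", codecs.BOM_UTF16_BE),
    ):
        if data.startswith(mark):
            bom = name
            break

    return {"crlf": crlf, "lf": lf, "cr": cr, "bom": bom}
```

To use it on a file, read it in binary mode first, for example `describe_bytes(open(path, "rb").read())`, and compare the result for both files.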
