
I'm trying to generate an MD5 checksum with PowerShell for a whole directory. On Linux there is a one-liner that works just great, like this one:

$ tar -cf - somedir | md5sum

I learned that tar is now part of Windows 10 and that it can be called from PowerShell. So I tried this:

tar -cf - C:\data | Get-FileHash -Algorithm MD5

What I get from PowerShell is this:


tar.exe: Removing leading drive letter from member names
Get-FileHash : The input object cannot be bound to any parameters for the command, either because the command does not take pipeline input or because the input and its properties do not match any of the parameters that take pipeline input


My shell is set to German, so I ran the German error text through a translation machine (https://www.translator.eu/#).

I wondered why I got this particular error message, because Get-FileHash IS able to process pipeline input, e.g.:

ls | Get-FileHash -Algorithm MD5

This command works like a charm, but it gives me checksums for each and every file. What I want is one checksum for a whole given directory.

So I probably messed something up… any ideas?

LightningJack
  • PowerShell is object-based; running arbitrary command-line programs like tar.exe doesn't return objects, only strings, so piping from those generally won't give you the results you want. The reason piping from ls works is that it's just an alias for a PowerShell cmdlet, Get-ChildItem. What is the use case, checking whether something has changed in a folder? – PMental Oct 21 '20 at 17:14
  • `Get-FileHash` accepts file info references via the pipeline, not raw input data. `tar -cf "$env:temp\data.tar" C:\data; Get-FileHash "$env:temp\data.tar" -Algorithm MD5; Remove-Item "$env:temp\data.tar" -Force` should do – Mathias R. Jessen Oct 21 '20 at 17:19
  • @MathiasR.Jessen: Your suggestion works like a charm, and it is VERY fast! Thanks a lot. How can I mark your comment as a great solution? :-) – LightningJack Oct 22 '20 at 09:28

1 Answer


EDIT: Here's an alternate method that stays consistent even if all the files are moved or copied to another location. This one uses the hashes of all the files to create a "master hash". It takes longer to run, since every file has to be read and hashed in full, but it is more reliable.

$HashString = (Get-ChildItem C:\Temp -Recurse -File | Get-FileHash -Algorithm MD5).Hash | Out-String
Get-FileHash -Algorithm MD5 -InputStream ([IO.MemoryStream]::new([char[]]$HashString))
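
One refinement worth considering (my suggestion, not part of the original answer): the combined hash above still depends on the order in which Get-ChildItem enumerates the files. Sorting the per-file hashes before combining them makes the result order-independent:

$HashString = (Get-ChildItem C:\Temp -Recurse -File | Get-FileHash -Algorithm MD5).Hash | Sort-Object | Out-String
Get-FileHash -Algorithm MD5 -InputStream ([IO.MemoryStream]::new([char[]]$HashString))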

The original, faster but less robust, method:

$HashString = Get-ChildItem C:\script\test\TestFolders -Recurse | Out-String
Get-FileHash -Algorithm MD5 -InputStream ([IO.MemoryStream]::new([char[]]$HashString))

This could be condensed into one line if desired, although it starts to get harder to read:

Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]"$(Get-ChildItem C:\script\test\TestFolders -Recurse|Out-String)"))

Basically, it creates a memory stream containing the text output of Get-ChildItem and passes that to Get-FileHash.
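
To see that mechanism in isolation (my illustration, not part of the original answer), the same pattern hashes any in-memory string:

# Illustration: hash an arbitrary string via a MemoryStream.
# The [char[]] cast lets PowerShell convert each character to a byte for the
# MemoryStream constructor, so this assumes ASCII-range text.
$stream = [IO.MemoryStream]::new([char[]]'hello world')
Get-FileHash -Algorithm MD5 -InputStream $stream
$stream.Dispose()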

Not sure if this is a great way of doing it, but it's one way :-)
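
For completeness, the archive-then-hash approach Mathias R. Jessen suggested in the comments (which the asker reported as very fast) can be spelled out like this; the temp-file name is just an example:

# Pack the directory into a temporary tar archive, hash the archive, then
# remove it: the PowerShell analogue of `tar -cf - somedir | md5sum`.
$tarFile = Join-Path $env:TEMP 'data.tar'
tar -cf $tarFile C:\data
Get-FileHash $tarFile -Algorithm MD5
Remove-Item $tarFile -Force

Note that tar stores file timestamps and paths in the archive, so two copies of the same data can still produce different archive hashes.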

PMental
  • This one-liner works, and it would be great if a checksum of a directory listing were enough, but this hash would most likely stay the same if one character in one file of the directory were exchanged for another, right? – LightningJack Oct 22 '20 at 09:13
  • OK, just checked again, it DOES "see" an exchanged character… because of the exchange of one character (or any characters, for that matter) the changed file gets a new timestamp and therefore a different hash. So far, so good… but would this hash method also "see" it if the directory were copied to another location and one file in the directory were copied with an error in it (a copy error)? – LightningJack Oct 22 '20 at 09:41
  • @LightningJack …but that also means you can't trust it: if the contents of two sets of files are identical, it will still produce different hashes because of the differing dates – Mathias R. Jessen Oct 22 '20 at 09:42
  • @MathiasR.Jessen… you are right, in that case this method would be flawed. But if someone wanted to make sure that absolutely NO CHANGE has been made to any file in the whole directory, this method seems viable to me, so I will mark this answer as a possible solution for that scenario as well. – LightningJack Oct 22 '20 at 09:59
  • @LightningJack It's quite easy to use a similar method where, instead of the Get-ChildItem output, we use the hashes of all the files as our base. It'll take slightly longer, but if performance isn't critical that shouldn't matter. I'll update my answer with that option later today. – PMental Oct 22 '20 at 10:40
  • @LightningJack OK, I've edited my answer with an updated method. It should be much more robust and consistent if the files are copied somewhere else, but it is also slower since each file is individually hashed. – PMental Oct 22 '20 at 11:04