
I have a directory with 80 files, and I want to compute a checksum of the directory.

I want to verify that the content of the directory matches the latest version.

I am working with a real-time system, so it needs to be efficient.

I have found a file checksum function.

My question is: can I compute a checksum of a directory without looping over all the files?

LIOR
  • "My question is: can I do a checksum of a directory without looping over all the files?" It depends on what you mean by "checksum of the directory". – Slava Dec 20 '17 at 17:51
  • I want to read the directory at run time and verify that the content is valid. – LIOR Dec 20 '17 at 17:55
  • Do you need to validate the content of each file? If yes, then you will have to calculate a checksum for each file. – Slava Dec 20 '17 at 17:58
  • In general I need to validate every file; this is why I am trying to understand whether I can do it at the directory level. – LIOR Dec 20 '17 at 18:16
  • ***Can I do a checksum of a directory without looping over all the files?*** No. You have to read each file, calculate a checksum, then store the checksum in some file and compare on subsequent executions. – drescherjm Dec 20 '17 at 18:19
  • What kind of system are you on? Linux? Windows? macOS? – Mark Adler Dec 22 '17 at 13:37
  • I am working on Linux. – LIOR Dec 22 '17 at 13:39

2 Answers


One possible solution is to maintain a special (hidden) file in which you keep the modification time and hash of each file, and update it when files change. Then use the hash of that file as the hash of the directory. This does not avoid calculating checksums for all files, but it reduces the amount of work when only some files have changed. If you need to make a snapshot of a directory, you have no choice but to checksum every file, in a well-defined order.

Note: you may want to avoid cryptographic hashes unless you have a strong requirement that they cannot be forged. Cryptographic hashes are comparatively slow and expensive, and that is on purpose.
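The manifest idea above can be sketched as follows (a minimal sketch, assuming C++17 `std::filesystem`; `std::hash` over the file contents stands in for whatever fast non-cryptographic hash you choose, and the manifest lives in memory here rather than in the hidden file):

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <map>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Per-file cache entry: last seen modification time and hash.
struct Entry {
    fs::file_time_type mtime{};
    std::size_t hash = 0;
};

// Placeholder hash: std::hash over the whole file contents.
// Substitute any fast non-cryptographic hash (CRC32, xxHash, ...).
std::size_t hash_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();
    return std::hash<std::string>{}(buf.str());
}

// Updates `manifest` in place, rehashing only files whose
// modification time changed, then combines the per-file hashes
// into one directory hash. std::map iterates in sorted filename
// order, so the combined hash is order-stable.
std::size_t directory_hash(const fs::path& dir,
                           std::map<std::string, Entry>& manifest) {
    for (const auto& e : fs::directory_iterator(dir)) {
        Entry& slot = manifest[e.path().filename().string()];
        auto mtime = fs::last_write_time(e);
        if (slot.mtime != mtime) {  // rehash only changed/new files
            slot.mtime = mtime;
            slot.hash = hash_file(e.path());
        }
    }
    std::size_t combined = 0;
    for (const auto& kv : manifest)
        combined ^= kv.second.hash + 0x9e3779b9 + (combined << 6) + (combined >> 2);
    return combined;
}
```

Note that this sketch does not prune manifest entries for deleted files, and modification-time granularity on some filesystems is coarse, so a change made within the same timestamp tick could be missed.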

Slava

I'm not certain of your application. An md5 checksum is generally used in situations where accuracy is favored over speed (or to present an additional barrier to foul play).

In build systems, where speed is king, file dating is generally used to evaluate directory equality. For example, given that the latest directory is filesystem::path foo and the directory in question is filesystem::path bar, you could simply evaluate:

    equal(filesystem::directory_iterator(foo), filesystem::directory_iterator(),
          filesystem::directory_iterator(bar), filesystem::directory_iterator(),
          [](const auto& lhs, const auto& rhs) {
              return filesystem::last_write_time(lhs) == filesystem::last_write_time(rhs);
          })


A few notes about this one-liner:

  1. This is not a recursive comparison; it evaluates only the dates of the entries immediately within foo and bar.
  2. Visual Studio's implementation of last_write_time seems to have an issue with directories.
  3. In a build system, for example, this has a tremendous advantage over an md5 checksum: if you change the == in the lambda to <, the one-liner will still respond affirmatively when you have manually updated one of the files to a newer test version.
Jonathan Mee