
Is it possible to get a single md5 string for the files listed by find?

This command produces an md5 hash string, but it seems to use only the file names. I need the hash to cover the file contents as well:

find my_dir -name "*.jpg" | md5
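That observation is correct: through the pipe, md5 only sees find's standard output, which is the newline-separated list of paths. A minimal sketch to confirm this, assuming GNU coreutils `md5sum` (on macOS, `md5` reading stdin plays the same role) and a hypothetical throwaway directory created with `mktemp`:

```shell
#!/bin/sh
set -e
# Throwaway directory; the names below are hypothetical
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir "$demo/my_dir"
printf 'first\n' > "$demo/my_dir/a.jpg"
cd "$demo"

# Hash of find's output, i.e. of the text "my_dir/a.jpg\n"
before=$(find my_dir -name "*.jpg" | md5sum | cut -d' ' -f1)

# Change the file's contents but keep its name
printf 'changed\n' > my_dir/a.jpg
after=$(find my_dir -name "*.jpg" | md5sum | cut -d' ' -f1)

# Hashing the path string directly gives the same value
listing=$(printf 'my_dir/a.jpg\n' | md5sum | cut -d' ' -f1)

echo "before:  $before"
echo "after:   $after"
echo "listing: $listing"
```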

Test on JSON files:

tree temp_dir

temp_dir
├── temp_1
│   ├── 071-FBA-227597_custom_faceboxes_face_bbox.json
│   └── 083-FBA-228758_custom_faceboxes_face_bbox.json
└── temp_2
    ├── 071-FBA-227597_custom_faceboxes_face_bbox.json
    └── 083-FBA-228758_custom_faceboxes_face_bbox.json

One JSON file in temp_2 is modified:

md5 temp_dir/temp_1/071-FBA-227597_custom_faceboxes_face_bbox.json
MD5 (temp_dir/temp_1/071-FBA-227597_custom_faceboxes_face_bbox.json) = 8da7666a1cf7f68b102a2ebb2ce01eae

md5 temp_dir/temp_1/083-FBA-228758_custom_faceboxes_face_bbox.json
MD5 (temp_dir/temp_1/083-FBA-228758_custom_faceboxes_face_bbox.json) = 93afe3b2b627948ff870496bf8302b85

md5 temp_dir/temp_2/071-FBA-227597_custom_faceboxes_face_bbox.json
MD5 (temp_dir/temp_2/071-FBA-227597_custom_faceboxes_face_bbox.json) = 8da7666a1cf7f68b102a2ebb2ce01eae

md5 temp_dir/temp_2/083-FBA-228758_custom_faceboxes_face_bbox.json
MD5 (temp_dir/temp_2/083-FBA-228758_custom_faceboxes_face_bbox.json) = 6308ef748f5c9a895d36bc8a71b37112

For some reason the md5 of the file path lists is different. Is this expected?

find temp_1 -name "*.json"
temp_1/071-FBA-227597_custom_faceboxes_face_bbox.json
temp_1/083-FBA-228758_custom_faceboxes_face_bbox.json

find temp_2 -name "*.json"
temp_2/071-FBA-227597_custom_faceboxes_face_bbox.json
temp_2/083-FBA-228758_custom_faceboxes_face_bbox.json

find temp_1 -name "*.json" | md5
ed0b14613ce97542a4e5531ff196378f

find temp_2 -name "*.json" | md5
50d0ded6eb3bf396a0b1c091c9067fdc

I also tried simply copying temp_1 to a new temp_3, but that also gives a different hash. Is this expected?

find temp_3 -name "*.json"
temp_3/071-FBA-227597_custom_faceboxes_face_bbox.json
temp_3/083-FBA-228758_custom_faceboxes_face_bbox.json

find temp_3 -name "*.json" | md5
f62473085a4b32b287ead4f8f9e67e15

md5 temp_3/071-FBA-227597_custom_faceboxes_face_bbox.json
MD5 (temp_3/071-FBA-227597_custom_faceboxes_face_bbox.json) = 8da7666a1cf7f68b102a2ebb2ce01eae

md5 temp_3/083-FBA-228758_custom_faceboxes_face_bbox.json
MD5 (temp_3/083-FBA-228758_custom_faceboxes_face_bbox.json) = 93afe3b2b627948ff870496bf8302b85
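This is expected: find prints each path including the starting directory, so `temp_1/...` and `temp_3/...` are different strings even when the files are identical copies. A sketch reproducing it with hypothetical file contents, assuming GNU `md5sum`; `sort` is added because find does not guarantee any particular output order:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# temp_1 with two small JSON files (hypothetical contents); temp_3 is an exact copy
mkdir temp_1
printf '{"id": 1}\n' > temp_1/a.json
printf '{"id": 2}\n' > temp_1/b.json
cp -R temp_1 temp_3

# Every listed path starts with the directory name, so the listings differ
h1=$(find temp_1 -name "*.json" | sort | md5sum | cut -d' ' -f1)
h3=$(find temp_3 -name "*.json" | sort | md5sum | cut -d' ' -f1)

# Listing from inside each directory drops the prefix ("./a.json", ...)
r1=$(cd temp_1 && find . -name "*.json" | sort | md5sum | cut -d' ' -f1)
r3=$(cd temp_3 && find . -name "*.json" | sort | md5sum | cut -d' ' -f1)

echo "with prefix:    $h1 vs $h3"
echo "without prefix: $r1 vs $r3"
```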

The method with cat produces the expected results:

find temp_1 -name "*.json" -exec cat {} \; | md5
b2abfe623e93153598d6625930f934f2

find temp_2 -name "*.json" -exec cat {} \; | md5
c64eb7a0a8749b11aa11a0312d37f81f

find temp_3 -name "*.json" -exec cat {} \; | md5
b2abfe623e93153598d6625930f934f2
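One caveat with hashing concatenated contents: the hash only covers the bytes in sequence, so it cannot see file boundaries or names, and it depends on the order in which find emits the files. A small sketch showing two different directories that produce the same hash; it swaps `-exec cat {} \;` for a sorted `-print0 | sort -z | xargs -0 cat` pipeline and assumes GNU `md5sum` and `sort -z` (file names are hypothetical):

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/d1" "$work/d2"

# Different files, same bytes once concatenated: "ab"+"c" vs "a"+"bc"
printf 'ab' > "$work/d1/1.json"; printf 'c'  > "$work/d1/2.json"
printf 'a'  > "$work/d2/1.json"; printf 'bc' > "$work/d2/2.json"

# Concatenate the files in byte-wise sorted path order, then hash the result
cat_hash() {
  find "$1" -name "*.json" -print0 | LC_ALL=C sort -z | xargs -0 cat |
    md5sum | cut -d' ' -f1
}

h1=$(cat_hash "$work/d1")
h2=$(cat_hash "$work/d2")
echo "d1: $h1"
echo "d2: $h2"   # same value: both hash the bytes "abc"
```

Hashing the per-file checksums instead, as the second answer does, keeps each file's boundaries in the input to the final hash.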
mrgloom

2 Answers

cat $(find my_dir -name "*.jpg") | md5

In case there are spaces in the file names:

find my_dir -name "*.jpg" -exec cat {} \; | md5
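Why the space matters: in `cat $(find ...)` the shell splits find's output on whitespace before cat runs, whereas `-exec` passes each path to cat as a single argument. A sketch with a hypothetical file named `a file.jpg`, assuming GNU `md5sum`; it also shows `-exec cat {} +` (one cat call for many files instead of one per file) and `-print0 | xargs -0`, both standard find/xargs options:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/my_dir"
printf 'hello' > "$work/my_dir/a file.jpg"
cd "$work"

# Unquoted $(...) is split on whitespace: one path becomes two words
set -- $(find my_dir -name "*.jpg")
words=$#

# -exec ... {} + hands each path to cat as one argument
h_exec=$(find my_dir -name "*.jpg" -exec cat {} + | md5sum | cut -d' ' -f1)

# NUL-separated names survive spaces (and even newlines) through xargs
h_xargs=$(find my_dir -name "*.jpg" -print0 | xargs -0 cat | md5sum | cut -d' ' -f1)

expected=$(printf 'hello' | md5sum | cut -d' ' -f1)
echo "words: $words, exec: $h_exec, xargs: $h_xargs"
```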
  • This will produce an md5 hash for each file, but I need a single md5 hash for all files. – mrgloom May 14 '19 at 07:14
  • Produces the same result as `find my_dir -name "*.jpg" -exec cat {} \; | md5` by @Mark Setchell – mrgloom May 14 '19 at 09:43
  • @mrgloom what exactly do you want? – Cloud Ace Wenyuan Jiang May 14 '19 at 10:59
  • Your code works fine, but I wonder why `find temp_1 -name "*.json" | md5` and `find temp_3 -name "*.json" | md5` (see my update) produce different results. – mrgloom May 14 '19 at 11:14
  • Because `find temp_1 -name "*.json"` and `find temp_3 -name "*.json"` produce different output. – Cloud Ace Wenyuan Jiang May 14 '19 at 12:13
  • Yes, I missed that the folder name is included in the path. – mrgloom May 14 '19 at 12:51
  • @mrgloom The output from this is only the same as my comment when none of your files have spaces in the name. If any do have spaces, this answer will not work. – Mark Setchell May 14 '19 at 13:37
  • I wonder whether this solution produces the same result across platforms. On macOS, `cat $(find images -name "*.jpg") | md5` produces `bb54374d091a1d20c7eadf3d6e89bd1e`, but after I copy the `images` folder to a Linux server, `cat $(find images -name "*.jpg") | md5sum` produces `c2ad04c07578100c48da8a4aba420bbc -`. For a single file the hashes do match: `MD5 (images/001-FBA-223652.jpg) = 0e4ff040dc0a856876ecad6a20493c76` and `0e4ff040dc0a856876ecad6a20493c76 images/001-FBA-223652.jpg` – mrgloom May 14 '19 at 15:51
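A plausible cause of the cross-platform difference (not confirmed in the thread) is ordering: find lists entries in the order the filesystem returns them, which can differ between macOS and Linux, so cat concatenates the same files in a different order. Sorting the paths with a fixed collation before concatenating should make the result reproducible. A minimal sketch under that assumption, using a hypothetical helper name `content_hash` and GNU `md5sum` (on macOS, `md5` would replace `md5sum | cut -d' ' -f1`):

```shell
#!/bin/sh
set -e

# Hypothetical helper: hash the contents of all files under DIR ($1) whose
# names match PATTERN ($2), concatenated in byte-wise sorted path order.
# Paths are taken relative to DIR, so the directory name does not matter.
content_hash() {
  (
    cd "$1" &&
    find . -type f -name "$2" -print0 |
      LC_ALL=C sort -z |
      xargs -0 cat |
      md5sum | cut -d' ' -f1
  )
}

# Usage: two directories with the same files, written in a different order
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/a" "$work/b"
printf 'one' > "$work/a/1.jpg"; printf 'two' > "$work/a/2.jpg"
printf 'two' > "$work/b/2.jpg"; printf 'one' > "$work/b/1.jpg"

ha=$(content_hash "$work/a" '*.jpg')
hb=$(content_hash "$work/b" '*.jpg')
echo "a: $ha"
echo "b: $hb"
```

If the hashes still differ between machines, comparing the sorted file lists on both sides would show whether the pattern matches a different set of files.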

What you can do is take the md5sum of a list of md5sums:

including filename/dirname/md5sum:

find . -type f -iname '*.jpg' -exec md5sum {} \; | md5sum

This returns a single md5sum from a list looking like:

d9a881340010ad5df0b5cd99aadb327f   ./path/to/file1.jpg
8b3b2a7b974af9eea72da94c1ca02b8a   ./path/file2.jpg
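As written, these lines reach the outer md5sum in find's traversal order, which is not guaranteed to be the same on every machine. A sketch that sorts the paths first and runs from inside the directory, using a hypothetical helper name `tree_hash` and assuming GNU `md5sum`, `sort -z` and `xargs -0`:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/a/sub" "$work/b/sub"
printf 'x' > "$work/a/sub/file1.jpg"; printf 'y' > "$work/a/file2.jpg"
printf 'y' > "$work/b/file2.jpg";     printf 'x' > "$work/b/sub/file1.jpg"

# md5sum of per-file md5sums; the path list is sorted first so the result
# does not depend on find's traversal order or on the directory's own name
tree_hash() {
  (
    cd "$1" &&
    find . -type f -iname '*.jpg' -print0 |
      LC_ALL=C sort -z |
      xargs -0 md5sum |
      md5sum | cut -d' ' -f1
  )
}

ha=$(tree_hash "$work/a")
hb=$(tree_hash "$work/b")
printf 'z' > "$work/b/file2.jpg"   # change one file's contents
hc=$(tree_hash "$work/b")
echo "a: $ha  b: $hb  b (modified): $hc"
```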

including filename/md5sum:

find . -type f -iname '*.jpg' -exec md5sum {} \; | awk -F'/' '{print substr($0,1,32),$NF}' | md5sum

This returns a single md5sum from a list looking like:

d9a881340010ad5df0b5cd99aadb327f file1.jpg
8b3b2a7b974af9eea72da94c1ca02b8a file2.jpg

including md5sum only:

find . -type f -iname '*.jpg' -exec md5sum {} \; | cut -d" " -f1 | md5sum

This returns a single md5sum from a list looking like:

d9a881340010ad5df0b5cd99aadb327f
8b3b2a7b974af9eea72da94c1ca02b8a
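If only the contents should count, the per-file hashes can also be sorted before the final md5sum, so that neither the file names nor find's traversal order affect the result. A sketch using a hypothetical helper `contents_only_hash`, assuming GNU `md5sum` and file names without backslashes or newlines (md5sum escapes such names, which would alter the first field). Note that sorting the hashes means the same set of contents under different names gives the same value; whether that is desirable depends on the use case.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/a" "$work/b"
printf 'x' > "$work/a/file1.jpg"; printf 'y' > "$work/a/file2.jpg"
# Same contents under different names
printf 'x' > "$work/b/zz.jpg";    printf 'y' > "$work/b/aa.jpg"

# Keep only each file's hash, sort the hashes, then hash the sorted list
contents_only_hash() {
  find "$1" -type f -iname '*.jpg' -exec md5sum {} + |
    cut -d' ' -f1 | LC_ALL=C sort | md5sum | cut -d' ' -f1
}

ha=$(contents_only_hash "$work/a")
hb=$(contents_only_hash "$work/b")
echo "a: $ha"
echo "b: $hb"
```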
kvantour
  • I mean a single md5 hash for a list of file paths; see the update for clarification. – mrgloom May 14 '19 at 09:43
  • @mrgloom Your original question states: _This code is produce some md5 hash string but seems like it just use names of files, **but I need to use content also**._ Each of the examples above produces a single md5sum for the selected files, including their content. I gave 3 options (including the full path, including the file name only, no file name). The md5 outputs shown are examples of what is passed to md5sum **again** to produce the final result. – kvantour May 14 '19 at 09:53