3

I need your help guys! I'll try to be as specific as possible.

Scenario: I have a MOUNTED image of a Linux distro. I copied all the files from this mounted distro to a folder on my Linux system called "raw" (a subdirectory of Files, so Files/Raw). I created a HASH LIST (MD5, using md5sum) of all the files in this Raw folder in a text file. I then deduped this HASH LIST (got rid of redundant hashes) into a new text file called "UniqueHashes.txt".
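(For reference, the hash-list-plus-dedupe step can be done in one pipeline. This is just a sketch assuming the Files/Raw and UniqueHashes.txt names above and GNU coreutils; the function name is mine.)

```shell
# Hash every file under a directory tree, keep only the MD5 column,
# and de-duplicate the hashes into an output file.
build_unique_hashes() {
  find "$1" -type f -exec md5sum {} + |
    cut -d' ' -f1 |
    sort -u > "$2"
}

# e.g.: build_unique_hashes Files/Raw UniqueHashes.txt
```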

Task: Essentially, what I need to do now is go through the entire Raw folder and copy EACH file whose MD5 hash matches one of the hashes in UniqueHashes.txt.

What I was thinking of doing was: looping through Raw using find . -type f, hashing each file, and comparing that hash to every line in the unique hash list I created. If it exists in the unique hash list, copy that file (preserving its timestamp) into DD; otherwise, ignore it.

It needs to be in BASH. Your help is greatly appreciated. I don't expect you to hand me the answer in code, but if you do, that would be awesome. However, any guidance you can give me on how to approach this problem would be amazing!!!

Thanks in advance!

jww

2 Answers

3

Use fdupes, a nifty third party tool available from your package manager:

fdupes -d -r files/raw

will prompt you for which of the duplicate files you want to keep, for each set of identical files.

Other options include

fdupes -d -r -N files/raw 

to automatically keep the first file in each set without prompting, or

fdupes -L -r files/raw

to hard link duplicates, making the directory tree appear the same while using less space.
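Hard-linked duplicates are the same inode reachable under two names, which is why the data is only stored once. A quick way to see the effect (using plain ln here rather than fdupes, and GNU stat):

```shell
printf 'same bytes\n' > original.txt
ln original.txt link.txt      # hard link, like the ones fdupes -L creates

# Both names resolve to one inode, so the file content exists once on disk;
# the link count of each name is now 2.
stat -c '%i %h %n' original.txt link.txt
```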

that other guy
    According to the man page, `fdupes -d -r -N files/raw` should accomplish the same as the second example with the yes command piped in. – pendor Nov 05 '13 at 14:54
0
(
# Walk Files/Raw recursively; -print0 / read -d '' survives spaces in names
find Files/Raw -type f -print0 | while IFS= read -r -d '' file; do
  md5=$(md5sum "$file" | cut -d' ' -f1)
  dest="DD/${file#Files/Raw/}"
  # -F: literal string, -x: match the whole line, -q: exit status only
  if grep -qFx "$md5" UniqueHashes.txt # && test ! -f $dest
  then
    mkdir -p "$(dirname "$dest")"   # create the directory at the dest
    cp -p "$file" "$dest"           # -p preserves the timestamp
  fi
done
)
perreal
  • Thanks so much for your help! I'll test it out and let you know the result. Btw, what's the # && test ! -f $dest all about? Thanks again! – user2175914 Mar 17 '13 at 19:14
  • I commented that out but it checks if one of the dupes is already copied to the destination. You need a similar test but not using the filename I think. – perreal Mar 17 '13 at 21:45
  • I keep getting this: md5sum: AHCache: is a directory and then it halts there. – user2175914 Mar 17 '13 at 23:27
  • You're amazing! That update works like a charm, the only thing is, it's not recursive. I can't walk through directories. It only returns the files in directory I'm sitting in, it doesn't go into any sub directories and check those files. – user2175914 Mar 17 '13 at 23:37
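One hash-based (rather than filename-based) way to skip already-copied duplicates, as the comment above suggests, is to remember each hash the first time it is copied. A hypothetical sketch using a bash 4 associative array; the function name is mine:

```shell
# Returns 0 (true) if this hash was seen before, else records it
# and returns 1 — so the first file with a given hash gets copied
# and later duplicates are skipped.
declare -A copied

already_copied() {
  local md5=$1
  if [ -n "${copied[$md5]}" ]; then
    return 0            # seen before: a duplicate
  fi
  copied[$md5]=1        # first sighting: remember it
  return 1
}
```

Inside the loop above you would guard the copy with `already_copied "$md5" || cp -p ...`; note this only works because the piped `while` loop runs in a single subshell, so the array persists across iterations.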