file=$3
# Using $3 because $1 and $2 are used in the rest of the script (that part works)
file_hash=$(md5sum "$file" | cut -d ' ' -f 1)
# generates the hash for the file

for a in /path/to/source/* # loop over all files in the directory
do
    if [ "$file_hash" = "$(md5sum "$a" | cut -d ' ' -f 1)" ]
    # if the file hash is equal to the hash generated, the file is copied to /path/to/source
    then
        cp "$file" /path/to/source/
    else
        # otherwise the file is renamed to file.JPG so it is not overwritten
        cp "$file" "file.JPG"
        mv "file.JPG" "/path/to/source/$file"
    fi
done

Can anyone help me with this code? I'm trying to write a Bash script that generates hashes for all the files in a directory. If two files have the same hash, only one of the images should be copied to the destination directory. Can anyone see where I am going wrong here?

I have to use md5sum, so no sha1sum, fdupes, or anything like that, unfortunately.

Robin Green

1 Answer


Assuming it doesn't matter which of the duplicate files is copied, a simple way is to use bash's support for associative arrays (available since bash 4.0):

declare -A files

while read -r hash name
do
    files[$hash]=$name
done < <(md5sum /path/to/source/*)

cp "${files[@]}" /path/to/dest

Any file with an identical hash will simply overwrite the record of the previous one, leaving you with only unique files in the array.
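
If you would rather keep the first file seen for each hash and skip anything that is not a regular file (such as subdirectories), the same idea can be sketched as a loop. This is only a minimal sketch: `copy_unique` is a hypothetical helper name, and it assumes bash 4.0 or later and an `md5sum` that reads from standard input.

```shell
#!/bin/bash
# Requires bash 4.0+ for associative arrays.

# copy_unique SRC DEST (hypothetical helper, not part of the answer above):
# copy one file per distinct MD5 hash from SRC into DEST.
copy_unique() {
    local src=$1 dest=$2 f hash
    local -A seen=()
    for f in "$src"/*; do
        [ -f "$f" ] || continue              # skip subdirectories and unmatched globs
        # Hash via stdin so the file name never appears in md5sum's output
        hash=$(md5sum < "$f" | cut -d ' ' -f 1)
        if [ -z "${seen[$hash]:-}" ]; then
            seen[$hash]=$f                   # first file with this hash wins
            cp -- "$f" "$dest"/
        fi
    done
}
```

Call it as `copy_unique /path/to/source /path/to/dest`. Because the glob expands in sorted order, the file that is kept for each hash is the one whose name sorts first.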

FatalError
  • Well it is specifically JPGs, but I am sure that won't be a problem. I take it the method I have used is just totally unworthy? I'll try out your code just now. – user3038305 Nov 26 '13 at 22:11
  • @user3038305: Nothing against your approach, I just wasn't able to follow exactly what you were doing. I suppose it's possible that I didn't understand what you were asking, too ;). – FatalError Nov 26 '13 at 22:12
  • Oh okay, I'm open to simpler and more readable approaches anyway. I may be able to explain better: I have JPGs within my directory, and I have to copy the ones that match IMG_*.JPG. Some files in there have the same names but different images, and there are also duplicate images with different names. I am trying to create a script that will sift through my photography directories doing general housekeeping, which I can use every now and again. – user3038305 Nov 26 '13 at 22:17
  • I had a few errors running your code: `declare: -A: invalid option`, `declare: usage: declare [-afFirtx [-p] [name[=value]...]`, and `line24: syntax error near unexpected token '<'`, `'done < < md5sum /mnt/sdb1/flashmem/*)'` – user3038305 Nov 26 '13 at 22:20
  • @user3038305: What version of `bash`? Also, is your shebang for the script `#!/bin/bash` (and *not* `#!/bin/sh`)? – FatalError Nov 26 '13 at 22:22
  • It is indeed #!/bin/sh, using Puppy Linux's URXVT console – user3038305 Nov 26 '13 at 22:27
  • @user3038305: Then you're actually invoking the Bourne shell, not `bash`. On many linux systems, `bash` is `sh`, but it behaves differently when invoked as `sh`. The features I've used are available only in `bash` mode, so you'd need `#!/bin/bash` for it to work. – FatalError Nov 26 '13 at 22:35
  • Right, it has spat out a hash for one of my files, along with a couple of subdirectories in my source folder, it still says -A is an invalid option... and [@] is not a file nor directory – user3038305 Nov 26 '13 at 22:41
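
The `declare: -A: invalid option` error above suggests a bash older than 4.0, which has no associative arrays. A minimal sketch that avoids both `declare -A` and process substitution, and so should also run under plain `sh`, keeps the hashes already seen in a space-separated string. `copy_unique_sh` is a hypothetical helper name, and the sketch assumes `md5sum` and `cut` are available.

```shell
#!/bin/sh
# copy_unique_sh SRC DEST (hypothetical helper):
# copy one file per distinct MD5 hash from SRC into DEST, without bash-only features.
copy_unique_sh() {
    src=$1
    dest=$2
    seen=""
    for f in "$src"/*; do
        [ -f "$f" ] || continue              # skip subdirectories and unmatched globs
        hash=$(md5sum < "$f" | cut -d ' ' -f 1)
        case " $seen " in
            *" $hash "*)
                ;;                           # same content already copied, skip
            *)
                seen="$seen $hash"           # remember this hash
                cp -- "$f" "$dest"/
                ;;
        esac
    done
}
```

Call it as `copy_unique_sh /path/to/source /path/to/dest`. The string lookup is linear in the number of distinct hashes, which is fine for a directory of photos but slower than an associative array on very large sets.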