#!/bin/sh
LASTBASE=""
find "$1" -type f -print | rev | sort | rev | while read FILE
do
    BASE=$(basename "$FILE")
    if [ "$BASE" = "$LASTBASE" ]; then
        rm "$FILE"
    fi
    LASTBASE="$BASE"
done
stefcud

3 Answers


If you pipe the output of find into a while read loop you can deal with the filenames line by line:

find nnn/ -type f -print | rev | sort | rev | while read FILE; do
    ...
done

Edit: This method does break if filenames contain double (consecutive) spaces, because read splits the line up according to $IFS and then joins the fields with single spaces when storing them in the last variable. To address this you can temporarily change $IFS to disable splitting:

OIFS="$IFS"
IFS=""
find | while read...
IFS="$OIFS"

Edit: test (which is the same as [) doesn't have a == operator, you just want =.
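Putting the pieces together, here is a sketch of the whole script with those fixes applied (wrapped in a function for illustration; the function name is mine, not from the question):

```shell
#!/bin/sh
# dedupe_by_name DIR: keep one file per basename under DIR, remove
# the rest. Clearing IFS for read preserves consecutive spaces, and
# read -r stops backslashes from being interpreted.
dedupe_by_name() {
    LASTBASE=""
    find "$1" -type f -print | rev | sort | rev | while IFS= read -r FILE
    do
        BASE=$(basename "$FILE")
        if [ "$BASE" = "$LASTBASE" ]; then
            rm "$FILE"
        fi
        LASTBASE="$BASE"
    done
}
```

Note that filenames containing newlines would still confuse the line-oriented rev | sort | rev pipeline; for those only a NUL-delimited approach is safe.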

mgorven

I just found this "gem" in an old bash history and it, well, actually works without stumbling over whitespace in filenames.

Content-wise Comparison

for hash in $(find . -exec md5sum {} \; 2>/dev/null | sort | awk '{ print $1 }' | uniq -d); do
    find . -exec md5sum {} \; 2>/dev/null | grep "$hash" | awk '{ print $2 }'
done

informal:

  • First line: traverse the directory tree, calculate the md5sum of every file below, sort this output (format: hash filename), grab the hash column and reduce it to the values that occur more than once (meaning there are duplicates).
  • Second line: for each of the double-occurring hashes, traverse again and print the filename if the current file has the current hash (meaning the file is one of several).

example output:

./aFile
./aFolder/aFile
./1000digitsOfPI
./a/b/c/thousanddigitsofPI
./b File
./bFolder/cFolder/b File

Removing is not implemented here because it might be hard to decide which version of the doubled files you want to keep.
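A related one-liner for the content-wise comparison, assuming GNU tools (xargs -0 and the -w/--all-repeated options of uniq are GNU extensions), hashes each file only once and lets uniq group lines that share the same first 32 characters, i.e. the same MD5 hash:

```shell
# Hash every file once, sort so equal hashes are adjacent, then print
# all lines whose first 32 characters (the MD5 hash) repeat.
find . -type f -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
```

This also avoids re-running find once per duplicate hash, which matters on big trees.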


Filename-wise Comparison

If you just want to look at filenames and not at contents, it gets even easier:

for name in $(find . -type f -printf "%f\n" | sort | uniq -d); do
    find . -name "$name"
done

Update: Unfortunately this version breaks on whitespace in filenames again.
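A whitespace-safe variant of the filename comparison is possible with NUL-delimited records, assuming GNU find/sort/uniq (the -z options are GNU extensions) and bash for read -d '':

```shell
# Emit basenames NUL-terminated, keep the ones that occur more than
# once, then locate every file carrying such a basename. Breaks only
# if a basename contains glob characters like * or ?.
find . -type f -printf '%f\0' | sort -z | uniq -zd |
while IFS= read -r -d '' name; do
    find . -type f -name "$name"
done
```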

Karma Fusebox
  • this code is very interesting, but unfortunately I cannot run md5 because the files are very large and the server's resources are tiny. In my case I know that files with the same name also have the same content; how can I modify your code to check based on names only? – stefcud Feb 07 '13 at 01:17
  • Oh, in that case it's not a wtf-gem anymore, just an ordinary *find*. This textbox doesn't like the long line, I'll edit it into the answer. – Karma Fusebox Feb 07 '13 at 01:32
  • There you go... – Karma Fusebox Feb 07 '13 at 01:40
  • in the title I wrote "duplicate filenames", not "duplicate files"; anyway thanks, your code is very useful all the same – stefcud Feb 07 '13 at 02:05
  • I know, all I wanted was to paste some quirky old code that luckily might work for you, even if it does not match your exact request. ;) I have other bad news though. As I'm playing around with it, I see that the filename-comparison suffers from the whitespaces in names again. ARGH. Sorry, don't think it can be done this way. – Karma Fusebox Feb 07 '13 at 02:20
  • view my last edit, I found a complete solution (also for white spaces in names) by trying the suggestions of @mgorven – stefcud Feb 07 '13 at 02:25

The problem lies in this line of code: for FILE in $FILES; do. The for loop assigns the FILE variable by splitting $FILES on whitespace, so any filename containing one or more spaces won't survive. Simply change the default IFS from whitespace to a newline (or tab). If I remember correctly, you can set IFS in bash using something like this:

IFS=$'\n'
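For example, a sketch of the fix (the FILES variable and the loop body are illustrative; they are not taken from the original question):

```shell
#!/bin/bash
# Illustrative only: with IFS set to a newline, the unquoted $FILES
# expansion splits per line instead of per word, so names containing
# spaces stay intact.
IFS=$'\n'
FILES=$(find . -type f)
for FILE in $FILES; do
    echo "found: $FILE"
done
unset IFS   # restore default word splitting afterwards
```

Filenames containing embedded newlines will still break; only a NUL-delimited pipeline handles those.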

Daniel t.