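#!/bin/sh
# Delete every file whose basename equals that of the previous line.
LASTBASE=""
# rev | sort | rev sorts paths by their reversed text, so files with the
# same basename should end up on adjacent lines.
find "$1" -type f -print | rev | sort | rev | while IFS= read -r FILE
do
    BASE=$(basename "$FILE")
    if [ "$BASE" = "$LASTBASE" ]; then
        rm "$FILE"
    fi
    LASTBASE="$BASE"
done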

- You could insert the `uniq` command into your script/process... – ewwhite Feb 06 '13 at 19:07
- I was going to suggest `uniq -d` and an incantation of `sed` to remove every other line. But not sure how that will fix the white space problem. – Aaron Copley Feb 06 '13 at 19:17
- Add `-x` to `#!/bin/sh` to see if it provides any insight? – Aaron Copley Feb 06 '13 at 19:19
- @ewwhite uniq isn't useful because I need to compare the basename, not the full path – stefcud Feb 07 '13 at 02:27
3 Answers
If you pipe the output of `find` into a `while read` loop you can deal with them line by line:
find nnn/ -type f -print | rev | sort | rev | while read FILE; do
...
done
Edit: So this method does break if filenames contain double (consecutive) spaces, because `read` actually splits the line up according to `$IFS` and then joins it again when storing the last variable. To address this you could temporarily change `$IFS` to disable splitting:
OIFS="$IFS"
IFS=""
find | while read...
IFS="$OIFS"
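A more compact variant of the same idea is to clear `IFS` for the `read` command only, so nothing has to be saved and restored, and to add `-r` so backslashes in names are kept literally. The sketch below is an illustration, not the poster's code: `list_basenames` and the scratch directory are made-up names, and it assumes filenames contain no newline characters.

```shell
# list_basenames is a hypothetical helper used only for illustration.
# It prints the basename of every file under the directory given as $1,
# wrapped in brackets so that any surrounding whitespace stays visible.
list_basenames() {
    find "$1" -type f -print | while IFS= read -r FILE; do
        # IFS= (empty, for this read only) stops read from trimming
        # whitespace; -r stops it from interpreting backslashes.
        printf '[%s]\n' "$(basename "$FILE")"
    done
}

# Demonstration in a throwaway directory (assumed names).
demo_dir=$(mktemp -d)
touch "$demo_dir/two  spaces.txt" "$demo_dir/ends with space "
list_basenames "$demo_dir" | sort
rm -r "$demo_dir"
```

Because the assignment is a prefix of `read`, the rest of the script keeps its normal `IFS`.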
Edit: `test` (which is the same as `[`) doesn't have a `==` operator, you just want `=`.
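As a small illustration of the portable comparison, here is a sketch of a basename check written with `=`. The helper name `same_basename` is an assumption made for this example; the paths are taken from the example output elsewhere in this thread. Some `/bin/sh` implementations reject `==` inside `[ ]`, which would explain an "unexpected operator" message.

```shell
# same_basename is a hypothetical helper: it succeeds (exit status 0)
# when both paths end in the same file name.
same_basename() {
    a=$(basename "$1")
    b=$(basename "$2")
    # POSIX test only guarantees "=" for string equality.
    [ "$a" = "$b" ]
}

if same_basename "./aFile" "./aFolder/aFile"; then
    echo "duplicate name"
fi
```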

- I don't think that this will work with spaces in filenames as was in the question. – mdpc Feb 06 '13 at 19:12
- @mgorven look at my edit! I'm using while now and $BASE has the right value, but I receive "unexpected operator" for each item – stefcud Feb 06 '13 at 20:17
- OK, I have corrected the == error. Now bash reports no errors, but the comparison in the if condition is always false!! – stefcud Feb 06 '13 at 21:34
I just found this "gem" in an old bash history and it, well, actually works without stumbling over whitespaces in filenames.
Content-wise Comparison
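for hash in $(find . -type f -exec md5sum {} \; 2>/dev/null | sort | awk '{ print $1 }' | uniq -d); do
    # md5sum prints a 32-character hash, two separator characters, then the
    # name, so cut -c35- keeps the whole filename even if it contains spaces
    find . -type f -exec md5sum {} \; 2>/dev/null | grep "^$hash" | cut -c35-
done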
Informally:
- First line: traverse the directory tree and calculate the md5sum of all files below it, sort this output (format: hash filename), grab the hash column, and reduce it to the hashes that occur more than once (meaning there are duplicates).
- Second line: for each of those repeated hashes, traverse again and print the filename if the current file has the current hash (meaning the file is one of several copies).
Example output:
./aFile
./aFolder/aFile
./1000digitsOfPI
./a/b/c/thousanddigitsofPI
./b File
./bFolder/cFolder/b File
Removing is not implemented here because it might be hard to decide which version of the doubled files you want to keep.
Filename-wise Comparison
If you just want to look at filenames and not at contents, it gets even easier:
for name in `find . -type f -printf "%f\n" | sort | uniq -d`; do
find . -name $name;
done;
Update: Unfortunately this version breaks on whitespace in filenames again.
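One way around the word splitting is to avoid the `for` loop entirely and let `awk` group the paths by their last component in a single pass, with no hashing. This is only a sketch under stated assumptions: `list_duplicate_names` is a made-up helper name, and filenames are assumed not to contain newline characters.

```shell
# list_duplicate_names is a hypothetical helper used for illustration.
# It prints every path under $1 whose basename occurs more than once.
list_duplicate_names() {
    find "$1" -type f -print | awk -F/ '
        {
            name = $NF                          # last path component
            count[name]++
            paths[name] = paths[name] $0 "\n"   # remember the full path
        }
        END {
            for (name in count)
                if (count[name] > 1)
                    printf "%s", paths[name]
        }'
}

# Example call (the order of awk groups is unspecified, hence the sort):
#   list_duplicate_names . | sort
```

Splitting on `/` instead of whitespace means spaces inside names are never treated as separators, and it does not rely on the GNU-only `-printf` option of `find`.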

- This code is very interesting, but unfortunately I cannot run md5 because the files are very large and the server's resources are tiny. In my case I know that files with the same name also have the same content. How can I modify your code to check on name only? – stefcud Feb 07 '13 at 01:17
- Oh, in that case it's not a wtf-gem anymore, just an ordinary *find*. This textbox doesn't like the long line, I'll edit it into the answer. – Karma Fusebox Feb 07 '13 at 01:32
- In the title I wrote "duplicates filenames", not "duplicates files". Anyway, thanks for your code, it is very useful all the same – stefcud Feb 07 '13 at 02:05
- I know, all I wanted was to paste some quirky old code that luckily might work for you, even if it does not match your exact request. ;) I have other bad news though. As I'm playing around with it, I see that the filename-comparison suffers from the whitespaces in names again. ARGH. Sorry, don't think it can be done this way. – Karma Fusebox Feb 07 '13 at 02:20
- See my last edit, I found a complete solution (also for white space in names). Try the suggestions of @mgorven – stefcud Feb 07 '13 at 02:25
The problem lies in this line of code: `for FILE in $FILES; do`. The for loop splits the value of $FILES on whitespace before assigning each piece to FILE, so a filename containing one or more spaces is broken into several words and it won't work. Simply change IFS from its default (space, tab, newline) to a newline only. If I remember correctly you can set IFS in bash using something like this:
IFS=$'\n'
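To show the setting in context, here is a sketch of a `for` loop over `find` output that splits on newlines only. It is an illustration under assumptions: `print_each_file` is a made-up helper name, filenames are assumed to contain no newlines, and the newline is written literally so the snippet also runs in shells that lack the `$'\n'` syntax.

```shell
# print_each_file is a hypothetical helper used for illustration.
# It prints each file under $1 in brackets, one per line.
print_each_file() {
    old_ifs=$IFS
    # A literal newline; in bash this can also be written IFS=$'\n'.
    IFS='
'
    set -f    # turn off globbing so * or ? in a name is not expanded
    for FILE in $(find "$1" -type f -print); do
        printf '[%s]\n' "$FILE"
    done
    set +f
    IFS=$old_ifs
}

# Example call:
#   print_each_file /some/directory
```

Restoring `IFS` and globbing afterwards matters because both settings are global to the shell and would otherwise affect the rest of the script.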
