I have a convoluted directory structure where there are many copies of foo.txt. I'd like to find all copies of this file and group them by their diff result, e.g.:

[type 1]
/bar2/sub1/foo.txt
/foobar/foo.txt

[type 2]
/sub3/sub4/sub5/foo.txt
...
– Hooked

1 Answer

How about something simple: calculate the md5sum of each file and sort on the hash?

md5deep -r . | sort

d921223ccbe759a632973962bc15a497  /root/.bash_history
dcac40478a92e87cd08a42a6425acea6  /root/testsrv/keys/04.pem
dcac40478a92e87cd08a42a6425acea6  /root/testsrv/keys/client2.crt
e12f5739f81b08c470f20890304bf53e  /root/.bashrc
e1b23db3d2293b142938c74649d9fa6a  /root/testsrv/list-crl
e4e2818e1ed11a951ed5da4e1a86885a  /root/testsrv/keys/revoke-test.pem
ee8bd2ea88220c877a62e22e36a02d20  /root/testsrv/keys/index.txt.attr
ee8bd2ea88220c877a62e22e36a02d20  /root/testsrv/keys/index.txt.attr.old
– Zoredache
  • Good idea! Any difference between `md5sum` and `md5deep`? Also, is there a trivial way to add a newline when the first column differs? (I feel like this may be a job for `awk`, but I lack the expertise.) – Hooked Jan 09 '13 at 18:40
  • Last time I looked, `md5sum` didn't have any way to recursively process a directory. You could probably use `find` and `xargs` to overcome this, but `md5deep` is the method I generally use. – Zoredache Jan 09 '13 at 18:44
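
On the comments above: a minimal sketch of `awk` inserting a blank line whenever the hash column changes, applied to the answer's pipeline (any POSIX awk should do; this is an illustration, not part of the original answer):

md5deep -r . | sort | awk 'prev != "" && $1 != prev { print "" } { print; prev = $1 }'

And the `find`/`xargs` route Zoredache mentions for plain `md5sum` (`-print0`/`-0` keeps paths containing spaces intact):

find . -type f -print0 | xargs -0 md5sum | sort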
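
Finally, since the question is only about copies of foo.txt and `md5deep -r` hashes every file under the tree, the scan can be narrowed to that one name (a sketch, assuming GNU `find` and coreutils `md5sum`):

find . -type f -name foo.txt -exec md5sum {} + | sort

Files that produce the same hash are, for all practical purposes, byte-for-byte identical, which is exactly the grouping the question's diff-based "types" describe.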