I have a convoluted directory structure where there are many copies of foo.txt. I'd like to find all copies of this file and group them by their diff result, e.g.:

[type 1]
/bar2/sub1/foo.txt
/foobar/foo.txt

[type 2]
/sub3/sub4/sub5/foo.txt
...
– Hooked

1 Answer

How about something simple: calculate the md5sum of each file and sort on the hash?

md5deep -r . | sort

d921223ccbe759a632973962bc15a497  /root/.bash_history
dcac40478a92e87cd08a42a6425acea6  /root/testsrv/keys/04.pem
dcac40478a92e87cd08a42a6425acea6  /root/testsrv/keys/client2.crt
e12f5739f81b08c470f20890304bf53e  /root/.bashrc
e1b23db3d2293b142938c74649d9fa6a  /root/testsrv/list-crl
e4e2818e1ed11a951ed5da4e1a86885a  /root/testsrv/keys/revoke-test.pem
ee8bd2ea88220c877a62e22e36a02d20  /root/testsrv/keys/index.txt.attr
ee8bd2ea88220c877a62e22e36a02d20  /root/testsrv/keys/index.txt.attr.old
– Zoredache
  • Good idea! Any difference between `md5sum` and `md5deep`? Also, is there a trivial way to add a newline when the first column differs? (I feel like this may be a job for `awk`, but I lack the expertise.) – Hooked Jan 09 '13 at 18:40
  • Last time I looked, `md5sum` didn't have any way to recursively process a directory. You could probably use `find` and `xargs` to overcome this, but `md5deep` is the method I generally use. – Zoredache Jan 09 '13 at 18:44
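
On the comments above: a minimal sketch of `awk` inserting a blank line whenever the hash column changes, applied to the answer's pipeline (any POSIX awk should do; this is an illustration, not part of the original answer):

md5deep -r . | sort | awk 'prev != "" && $1 != prev { print "" } { print; prev = $1 }'

And the `find`/`xargs` route Zoredache mentions for plain `md5sum` (`-print0`/`-0` keeps paths containing spaces intact):

find . -type f -print0 | xargs -0 md5sum | sort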
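
Finally, since the question is only about copies of foo.txt and `md5deep -r` hashes every file under the tree, the scan can be narrowed to that one name (a sketch, assuming GNU `find` and coreutils `md5sum`):

find . -type f -name foo.txt -exec md5sum {} + | sort

Files that produce the same hash are, for all practical purposes, byte-for-byte identical, which is exactly the grouping the question's diff-based "types" describe.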