
How can I find potentially duplicate files in a given directory? Are there existing tools for this sort of thing? Some heuristics I can think of (a rough sketch follows the list):

  • it should be recursive.
  • it should compare only file names and sizes, nothing else
  • it should be able to find duplicates even when the file names are trivially different, such as "foobar.txt" and "foobar.txt (2)"
  • I have the files on a drive that can be mounted to Linux, Mac OS X, or Windows, as desired
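In case it helps frame the problem, here is a rough sketch of that heuristic in shell (assuming GNU find for "-printf"; the trailing " (N)" suffix rule is my guess at what "trivially different" means):

    # Print "size<TAB>basename" for every file, strip a trailing " (N)"
    # copy suffix so "foobar.txt" and "foobar.txt (2)" compare equal,
    # then report any size+name pair that occurs more than once.
    find . -type f -printf '%s\t%f\n' \
        | sed -E 's/ \([0-9]+\)$//' \
        | sort \
        | uniq -d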
qazwsx

2 Answers


Under Linux/UNIX, you can use the "sum" or "md5sum" command to generate a checksum for each file. Then just look for files with the same checksum.

The superquick way of doing this would be to sort the output of the sum command with the "sort" command and look for consecutive entries with the same checksum.
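Concretely, that sort-and-scan pass might look like this (a sketch assuming GNU coreutils, since "-w" and "--all-repeated" are GNU extensions to uniq; the MD5 digest is the first 32 characters of each md5sum output line):

    # Hash every file under the current directory, sort so equal digests
    # become adjacent, then print each group of repeated digests,
    # with groups separated by blank lines.
    find . -type f -exec md5sum {} + | sort \
        | uniq -w 32 --all-repeated=separate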

The superquick way to get a list of the duplicate files would be to sort the output of the sum command into a file, then sort it again with the "-u" parameter to make a unique list, and diff the two files. The difference will be the duplicate files.
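Spelled out, that two-pass variant might look like this (a sketch; "sort -u -k1,1" builds the unique list by keeping one line per distinct checksum):

    # Keep the full sorted list, reduce it to one line per checksum,
    # and diff the two; lines marked "<" are the extra duplicate copies.
    find . -type f -exec md5sum {} + | sort > all.txt
    sort -u -k1,1 all.txt > unique.txt
    diff all.txt unique.txt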

dj_segfault

If you're looking for a duplicate finder on a Mac, check out Gemini. It lets you drag and drop particular folders for scanning and immediately preview the duplicates it finds, so you can decide which copies of a file should be left untouched.

I don't know whether it meets all your requirements, but you can find Gemini in the Mac App Store if you're interested.

Ksumelie