
I currently have 2 top-level directories with a lot of subdirectories and files. Many files are duplicated between the two directories, but the names of the files and/or directories can differ. I'm trying to find a way to learn which files don't exist on one of the two sides. Normally a tool like kdiff3 or fslint would find the duplicates, but in this case I also want to see which files occur on only one side. Right now I'm creating 1 database with 2 tables, one per directory, each holding the file names (including full path) together with the MD5 hash of each file. Based on this I can write queries that show which files occur on both sides and which don't. But this is currently a very time-consuming exercise (I'm talking about 100,000+ files, with sizes ranging from 500 KB to 1 GB).
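
The two-table approach can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the asker's actual schema: the table names `left_files`/`right_files` and the function `compare_tables` are hypothetical, and it assumes the (path, md5) rows have already been computed. Matching is done on the hash column only, so renamed files still pair up.

```python
import sqlite3


def compare_tables(left_rows, right_rows):
    """left_rows / right_rows: iterables of (path, md5) tuples (hypothetical input format)."""
    con = sqlite3.connect(":memory:")
    # "left" and "right" are SQL keywords, hence the _files suffix
    con.execute("CREATE TABLE left_files (path TEXT, md5 TEXT)")
    con.execute("CREATE TABLE right_files (path TEXT, md5 TEXT)")
    con.executemany("INSERT INTO left_files VALUES (?, ?)", left_rows)
    con.executemany("INSERT INTO right_files VALUES (?, ?)", right_rows)
    # Indexes on the hash column keep the lookups fast for large row counts
    con.execute("CREATE INDEX idx_left_md5 ON left_files(md5)")
    con.execute("CREATE INDEX idx_right_md5 ON right_files(md5)")

    # Files whose content hash never appears on the right side
    only_left = [row[0] for row in con.execute(
        "SELECT path FROM left_files l WHERE NOT EXISTS "
        "(SELECT 1 FROM right_files r WHERE r.md5 = l.md5) ORDER BY path")]
    # Files whose content hash never appears on the left side
    only_right = [row[0] for row in con.execute(
        "SELECT path FROM right_files r WHERE NOT EXISTS "
        "(SELECT 1 FROM left_files l WHERE l.md5 = r.md5) ORDER BY path")]
    # Pairs of paths with identical content, regardless of name
    both = [(l, r) for l, r in con.execute(
        "SELECT l.path, r.path FROM left_files l "
        "JOIN right_files r ON l.md5 = r.md5 ORDER BY l.path, r.path")]
    con.close()
    return only_left, only_right, both
```

Note that the slow part is usually computing the hashes, not running these queries.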

Does anyone have any tips or tools I can use for this problem?

grezly

1 Answer


If I had this problem, I would keep the solution simple and use console tools. I would create a file for each directory containing the path of every file and its MD5 hash, then use grep and awk to compare the hashes and find both the duplicate and the non-duplicate files.
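
The same hash comparison can be sketched in Python instead of grep/awk. One addition here is not part of the answer above and is an assumption about what helps: files are first grouped by size, because a file whose size never occurs on the other side cannot have a duplicate there, so only files with a size present on both sides are hashed. The function names `list_sizes`, `md5_of` and `compare_dirs` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def list_sizes(root):
    """Map file size -> list of paths under root (walks all subdirectories)."""
    sizes = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes[os.path.getsize(path)].append(path)
    return sizes


def md5_of(path, chunk_size=1 << 20):
    """MD5 of the file contents, read in 1 MiB chunks so large files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_dirs(left_root, right_root):
    """Return (only_left, only_right): paths whose content has no match on the other side."""
    left_sizes = list_sizes(left_root)
    right_sizes = list_sizes(right_root)
    only_left, only_right = [], []
    # A size present on only one side cannot have a duplicate on the other side
    for size, paths in left_sizes.items():
        if size not in right_sizes:
            only_left.extend(paths)
    for size, paths in right_sizes.items():
        if size not in left_sizes:
            only_right.extend(paths)
    # Hash only the files whose size appears on both sides
    for size in left_sizes.keys() & right_sizes.keys():
        left_hashes = {p: md5_of(p) for p in left_sizes[size]}
        right_hashes = {p: md5_of(p) for p in right_sizes[size]}
        left_set = set(left_hashes.values())
        right_set = set(right_hashes.values())
        only_left.extend(p for p, h in left_hashes.items() if h not in right_set)
        only_right.extend(p for p, h in right_hashes.items() if h not in left_set)
    return sorted(only_left), sorted(only_right)
```

Usage would be something like `compare_dirs("/data/dirA", "/data/dirB")` (paths illustrative). Files of equal size are still hashed in full; in collections where many files share a size, the saving will be smaller.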

Alexander Tolkachev