I currently have two top-level directories, each containing many subdirectories and files. Many files are duplicated between the two directories, but there is one problem: the names of the files and/or directories can differ. I'm trying to find a way to identify the files that exist on only one of the two sides. Normally a tool such as kdiff3 or fslint would find the duplicates, but in this case I also want to see which files do not occur on one of the two sides. Right now I'm building one database with two tables that hold each file's name (including the full path) and the MD5 hash of that file. Based on this I can write queries that show which files occur on both sides and which do not. But this is currently a very time-consuming exercise (I'm talking about 100,000+ files with sizes ranging from 500 KB to 1 GB).
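As a rough illustration of the hash-based comparison described above, here is a minimal Python sketch. It is not the actual setup: it assumes the MD5 hash is taken over each file's content, it uses in-memory dictionaries in place of the two database tables, and the function names `hash_file`, `index_tree`, and `compare_trees` are hypothetical.

```python
import hashlib
import os


def hash_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file's content, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(root):
    """Map content hash -> list of full paths for every file under root."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index.setdefault(hash_file(path), []).append(path)
    return index


def compare_trees(left_root, right_root):
    """Return (only_left, only_right, both), each keyed by content hash."""
    left = index_tree(left_root)
    right = index_tree(right_root)
    # Names and locations are ignored; only the content hash is compared.
    only_left = {h: paths for h, paths in left.items() if h not in right}
    only_right = {h: paths for h, paths in right.items() if h not in left}
    both = {h: (left[h], right[h]) for h in left if h in right}
    return only_left, only_right, both
```

Calling `compare_trees` with the two top-level directories returns the files found only on the left, only on the right, and on both sides, regardless of how they are named. Every file is still read in full, so this sketch shows the logic but does not make the hashing itself any faster.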
Does anyone have any tips or tools that I can use for this 'problem'?