We have one rather large table containing document info together with file paths pointing to files on the file system. After a couple of years we noticed that we have files on disk that are not referenced in the DB table, and vice versa.
Since I'm currently learning Clojure, I thought it would be nice to write a small utility that finds the diff between the DB and the file system. Naturally, being a beginner, I got stuck: there are more than 600 000 documents, so I obviously need a more performant and less memory-consuming solution :)
My first idea was to generate a flattened list of all files in the filesystem tree and compare it with the list from the DB: if a file referenced in the DB doesn't exist on disk, put it in a separate "non-existing" list, and if a file exists on disk but not in the DB, move it to a dump directory.
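To make the idea concrete, here is a rough sketch of what I have in mind, not working code from our system. The function names `files-on-disk` and `diff-paths` are made up for illustration, loading the paths from the DB is left out, and I'm assuming the DB stores absolute paths in the same form that `getAbsolutePath` returns. It uses sets instead of lists so that each lookup is a hash lookup rather than a scan.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.set :as set])

(defn files-on-disk
  "Returns a set of absolute paths of all regular files under root.
  file-seq walks the directory tree lazily; directories are filtered out."
  [root]
  (into #{}
        (comp (filter #(.isFile ^java.io.File %))
              (map #(.getAbsolutePath ^java.io.File %)))
        (file-seq (io/file root))))

(defn diff-paths
  "Compares the paths known to the DB with the paths found on disk.
  Assumes both collections use the same path format (hypothetical helper)."
  [db-paths disk-paths]
  (let [db   (set db-paths)
        disk (set disk-paths)]
    {:missing-on-disk  (set/difference db disk)     ; in DB, not on disk
     :orphaned-on-disk (set/difference disk db)}))  ; on disk, not in DB
```

With toy data, `(diff-paths ["/docs/a.pdf" "/docs/b.pdf"] ["/docs/b.pdf" "/docs/c.pdf"])` would report `/docs/a.pdf` as missing on disk and `/docs/c.pdf` as orphaned. I have not measured whether two sets of 600 000 path strings fit comfortably in the heap, which is part of what I'm asking.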
Any ideas?