I have a huge list of file paths on a remote share that I need to check for existence. Example input:

\\server\folder1\file1
\\server\folder1\file2
\\server\folder2\file3
etc.

We currently call File.Exists (which basically uses the FindFirstFile WinAPI) once per file, but it's pretty slow. Is there a more efficient way of doing this? Is there a way to parallelize it somehow?

John Saunders
sternr
  • Add all files to an array or list, then iterate over it and check each file. That way every file is checked only once. – deathismyfriend May 27 '15 at 00:11
  • Not an answer, but depending on the number of files and how they are distributed, a tree-based approach might help. For example, a single check tells you whether the directory `\\server\folder1` exists, and if it doesn't, you have knocked two files off the list right there. This could backfire if there are very few files in each directory, or if the directory structure is _usually_ guaranteed to exist. You would need to test this theory against your particular case. – bob-the-destroyer May 27 '15 at 03:10
  • Is there no faster WinAPI call to query for file existence? – sternr May 27 '15 at 05:31
  • Have you checked how long it takes to read all the file names into memory and check for existence in RAM? Maybe one big read is faster than a bunch of little ones? (Although I would think caching would help here.) – Enigmativity May 27 '15 at 05:44
  • The files are spread across a large number of folders, so reading the contents of all the folders is a lot more expensive. – sternr May 27 '15 at 05:49

1 Answer

I/O operations are usually slow and expensive, especially against a remote share.
I suggest a different approach altogether:

Get the list of files once, when the program is initialized, and store it in a database.

Use a FileSystemWatcher to monitor file changes within the path (if you have different parent directories or different remote computers, you might need one FileSystemWatcher for each), and update the database on every Created, Deleted and Renamed event for the relevant files.

Then all you have to do to find out which files exist is run a simple SQL query, which will be much faster than iterating over a large list and calling File.Exists on each entry.
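As a rough illustration of the idea, here is a minimal sketch that keeps the index in memory (a `ConcurrentDictionary`) instead of a database, purely to keep the example short. The `FileIndex` class and its members are hypothetical names, not an existing API. It assumes lookups use the same full-path form the watcher reports, and it ignores things a real implementation would have to handle, such as the watcher's `Error` event (buffer overflows), whole directories being renamed or deleted, and events missed on network shares.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical helper: an in-memory index of the files under one root folder.
public sealed class FileIndex : IDisposable
{
    // Used as a concurrent set; the byte value is ignored.
    private readonly ConcurrentDictionary<string, byte> _paths =
        new ConcurrentDictionary<string, byte>(StringComparer.OrdinalIgnoreCase);
    private readonly FileSystemWatcher _watcher;

    public FileIndex(string root)
    {
        // Start watching before the initial scan so that files created
        // while the scan is running still end up in the index.
        _watcher = new FileSystemWatcher(root)
        {
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.FileName // file create/delete/rename only
        };
        _watcher.Created += (s, e) => _paths[e.FullPath] = 0;
        _watcher.Deleted += (s, e) => _paths.TryRemove(e.FullPath, out _);
        _watcher.Renamed += (s, e) =>
        {
            _paths.TryRemove(e.OldFullPath, out _);
            _paths[e.FullPath] = 0;
        };
        _watcher.EnableRaisingEvents = true;

        // One-time scan at start-up; this is the only expensive step.
        foreach (var file in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
            _paths[file] = 0;
    }

    // Answers from memory, without touching the share.
    public bool Exists(string path) => _paths.ContainsKey(path);

    public void Dispose() => _watcher.Dispose();
}
```

Usage would be along the lines of `var index = new FileIndex(root);` once at start-up, followed by `index.Exists(path)` for each lookup. Swapping the dictionary for a database table only changes where the three event handlers and the initial scan write to.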

Zohar Peled
  • Thanks Zohar, but migrating to SQL or setting up a FileSystemWatcher is just not possible in our environment, due to its large scale as well as historical reasons. So, given the problem at hand, is there no faster way? – sternr May 28 '15 at 12:52
  • I'm guessing you could use different threads for different folders, but other than that I can't think of anything else at the moment. – Zohar Peled May 28 '15 at 13:18
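For what it's worth, the "different threads for different folders" idea could be sketched roughly as below, combined with the earlier comment's suggestion of checking the parent directory once before checking its files. `ParallelExistence`, `Check` and `maxParallelism` are hypothetical names, and the default of 8 is an arbitrary assumption. Whether this is actually faster depends on the share and the network, so it would need measuring.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelExistence
{
    // maxParallelism is an assumed tuning knob; too many concurrent
    // requests can overload the file server instead of helping.
    public static IDictionary<string, bool> Check(IEnumerable<string> paths, int maxParallelism = 8)
    {
        var results = new ConcurrentDictionary<string, bool>(StringComparer.OrdinalIgnoreCase);

        // Group the paths by parent folder so each folder is handled by one worker.
        var byFolder = paths.GroupBy(
            p => Path.GetDirectoryName(p) ?? string.Empty,
            StringComparer.OrdinalIgnoreCase);

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };
        Parallel.ForEach(byFolder, options, folder =>
        {
            // One check on the folder can rule out every file inside it.
            // An empty key means the path had no folder part, so skip the shortcut.
            bool folderExists = folder.Key.Length == 0 || Directory.Exists(folder.Key);

            foreach (var path in folder)
                results[path] = folderExists && File.Exists(path);
        });

        return results;
    }
}
```

The result maps each input path to `true` or `false`. The folder check only pays off when missing folders are reasonably common. If nearly every folder exists, it adds one extra round trip per folder.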