
I am searching a moderate number (~500) of folders for a large number (~200,000) of files from a .NET application.

I had hoped to use DirectoryInfo.GetFiles, passing in SearchOption.AllDirectories. However, this approach seems to be a lot slower than writing my own code to iterate through the directories and call GetFiles, passing in just a search pattern.
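The hand-rolled version is roughly along these lines (a simplified sketch; the method name is just illustrative, and it assumes `using System.IO;` and `using System.Collections.Generic;`):

```
static IEnumerable<FileInfo> FindFiles(DirectoryInfo dir, string searchPattern)
{
    // Search only the current directory with the pattern...
    foreach (FileInfo file in dir.GetFiles(searchPattern))
        yield return file;

    // ...then recurse into each subdirectory manually.
    foreach (DirectoryInfo subDir in dir.GetDirectories())
        foreach (FileInfo file in FindFiles(subDir, searchPattern))
            yield return file;
}
```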

Related MSDN info:

  • GetFiles(String)
    Returns a file list from the current directory matching the given searchPattern.
  • GetFiles(String, SearchOption)
    Returns a file list from the current directory matching the given searchPattern and using a value to determine whether to search subdirectories.

Has anyone had a similar experience to this?

– Richard Ev
  • The time it takes for `GetFiles()` to return, can be estimated with the File Explorer context menu for Properties. On a folder with lots of subfolders it may take "forever" to show the total number of files/folders/bytes. – Roland Nov 15 '21 at 13:12

2 Answers


These two functions are actually infamous for their performance. The reason is that GetFiles walks the entire directory tree and constructs an array of FileInfo objects, and only then returns the result to the caller. Constructing that array involves a lot of memory allocations (I'm sure they use a List internally, but still) since the number of entries cannot be known ahead of time.

If you're really into performance, you can P/Invoke into FindFirstFile/FindNextFile/FindClose, abstract them into an IEnumerable<FileInfo> and yield FileInfos one at a time.
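A minimal sketch of what that can look like (error handling, long-path support and reparse points are deliberately ignored; `FastFileEnumerator` and its members are just illustrative names):

```
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

static class FastFileEnumerator
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
    private struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr hFindFile);

    private static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    // Lazily yields files matching 'searchPattern' in 'root' and all subdirectories.
    // FindFirstFile applies the wildcard pattern itself, so no big array is built up front.
    public static IEnumerable<FileInfo> EnumerateFiles(string root, string searchPattern)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();

            // Pass 1: files in this directory that match the pattern.
            foreach (string name in Enumerate(dir, searchPattern, wantDirectories: false))
                yield return new FileInfo(Path.Combine(dir, name));

            // Pass 2: subdirectories (pattern "*") to descend into later.
            foreach (string name in Enumerate(dir, "*", wantDirectories: true))
                pending.Push(Path.Combine(dir, name));
        }
    }

    private static IEnumerable<string> Enumerate(string dir, string pattern, bool wantDirectories)
    {
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(Path.Combine(dir, pattern), out data);
        if (handle == INVALID_HANDLE_VALUE)
            yield break; // empty or inaccessible directory

        try
        {
            do
            {
                if (data.cFileName == "." || data.cFileName == "..")
                    continue;

                bool isDirectory = (data.dwFileAttributes & FileAttributes.Directory) != 0;
                if (isDirectory == wantDirectories)
                    yield return data.cFileName;
            }
            while (FindNextFile(handle, out data));
        }
        finally
        {
            FindClose(handle);
        }
    }
}
```

Consumed with a plain foreach, this streams results as the tree is walked instead of waiting for one huge 200,000-entry array to be built.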

– Anton Gogolev

The approach Anton mentioned, using FindFirstFile() and related native methods, has been implemented as of .NET 4 via DirectoryInfo.EnumerateFiles(), so there is no longer any need for P/Invoke for this!
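For example (the path and pattern here are just placeholders, and it assumes `using System.IO;`):

```
var root = new DirectoryInfo(@"D:\Data"); // hypothetical root folder
foreach (FileInfo file in root.EnumerateFiles("*.xml", SearchOption.AllDirectories))
{
    // Results stream in lazily; nothing forces the full list into memory at once.
    Console.WriteLine(file.FullName);
}
```

Unlike GetFiles, the enumerable starts yielding results before the whole tree has been walked.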

– Mike Marynowski