
I want to enumerate files matching a specific search pattern (e.g., *.txt) recursively from a directory, with a couple of constraints:

  1. The mechanism should be very efficient. The goal is to enumerate files one by one (using IEnumerable), so that even with a huge list of files it doesn't take forever to get the first file for processing.
  2. The enumeration should return files in random order, so that if two instances of my program enumerate the same directory, they do not see the files in the same sequence.

Given the requirements, DirectoryInfo.EnumerateFiles looks promising, except that it does not fulfill the second requirement. If I remove the performance consideration, the solution is straightforward (just get the entire collection and randomize the sequence before accessing).

Can someone suggest possible approaches for a C# implementation in .NET 3.5/4.0?

Community
    What is the purpose of requirement #2? If you are trying to avoid read/write contention, this seems like the worst possible way to accomplish that... – Domenic Jun 22 '11 at 05:00

1 Answer


What you are asking for is impossible.

A truly "random" enumeration (in the sense that the order likely changes each time) requires a "pick without replacement" strategy. Such a strategy necessarily requires two pools: one of "chosen" files, and one of "unchosen." The "unchosen" list has to be populated before anything from it can be "chosen" randomly. This breaks your #1 requirement.
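To make the cost concrete, here is a minimal sketch of the eager approach, assuming a plain Fisher-Yates shuffle. The names EagerShuffle and Shuffled are hypothetical, chosen for illustration only. The first line of the method has to consume the whole source before a single item can be handed out, which is exactly what conflicts with requirement #1.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class EagerShuffle
{
    // Returns every item of 'source' in a uniformly random order.
    public static List<T> Shuffled<T>(IEnumerable<T> source, Random rng)
    {
        // Copying into a list forces the complete enumeration up front:
        // this is the "unchosen" pool.
        var pool = new List<T>(source);

        // Fisher-Yates: walk backwards, swapping each slot with a
        // randomly chosen slot at or before it.
        for (int i = pool.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            T tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        return pool;
    }
}
```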

Two thoughts on how to solve your problem:

  1. What is the problem with two instances seeing the files in the same order? If it's a file-locking issue, open the files with shared read access (for example, FileShare.Read) so that both instances can read them at the same time.

  2. You might be able to get away with a "holding pile" approach. Here, you would create your own enumerator class that starts by reading a small number of FileInfo records into a "Hold" collection. Then, each time your calling code requests a file, the enumerator reads the next one from EnumerateFiles and either returns it directly or swaps it with a random entry in the "Hold" pile and returns that entry instead. The choice between the two is made at random. Once EnumerateFiles returns nothing more, you empty out the Hold pile. That won't provide a truly random selection order, but it may add enough fuzziness to the order to meet your needs. The maximum size of the "Hold" collection can be tuned to balance your need for "randomness" against the need to get the first file quickly.
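The holding-pile idea can be sketched as a lazy iterator. This is only one possible reading of the description, not a definitive implementation: the names HoldingPile and Fuzzy, the holdSize parameter, the 50/50 coin flip, and the random draining of the pile at the end are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class HoldingPile
{
    // Lazily yields the items of 'source' in a locally shuffled order.
    // At most 'holdSize' items are read ahead before the first one is returned.
    public static IEnumerable<T> Fuzzy<T>(IEnumerable<T> source, int holdSize, Random rng)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (rng == null) throw new ArgumentNullException("rng");
        if (holdSize < 1) throw new ArgumentOutOfRangeException("holdSize");
        return FuzzyIterator(source, holdSize, rng);
    }

    private static IEnumerable<T> FuzzyIterator<T>(IEnumerable<T> source, int holdSize, Random rng)
    {
        var hold = new List<T>(holdSize);
        foreach (T item in source)
        {
            if (hold.Count < holdSize)
            {
                hold.Add(item); // still filling the pile, nothing returned yet
                continue;
            }
            if (rng.Next(2) == 0)
            {
                yield return item; // pass the fresh item straight through
            }
            else
            {
                // Swap the fresh item into the pile and return a held one.
                int i = rng.Next(hold.Count);
                T held = hold[i];
                hold[i] = item;
                yield return held;
            }
        }
        // Source exhausted: empty the pile in random order.
        while (hold.Count > 0)
        {
            int i = rng.Next(hold.Count);
            T held = hold[i];
            hold[i] = hold[hold.Count - 1];
            hold.RemoveAt(hold.Count - 1);
            yield return held;
        }
    }
}
```

On .NET 4.0 this could be fed with new DirectoryInfo(path).EnumerateFiles("*.txt", SearchOption.AllDirectories), where path is a placeholder for your root directory. If two instances may start at nearly the same moment, consider seeding each Random differently (for example from Guid.NewGuid().GetHashCode()), since time-based default seeds can coincide.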

richardtallent