91

I have the following code:

List<string> result = new List<string>();

foreach (string file in Directory.EnumerateFiles(path, "*.*", SearchOption.AllDirectories)
    .Where(s => s.EndsWith(".mp3") || s.EndsWith(".wma")))
{
    result.Add(file);
}

It works fine and does what I need, except for one small thing: I would like a better way to filter on multiple extensions. Ideally I would use a string array of filters such as this:

string[] extensions = { "*.mp3", "*.wma", "*.mp4", "*.wav" };

What is the most efficient way to do this using .NET Framework 4.0/LINQ? Any suggestions?

As an occasional programmer, I'd appreciate any help :-)

H H
Yooakim
  • You should consider running each extension search in parallel. I created some useful helper methods in my answer. One which takes a regexp, and one which takes a string list. – Mikael Svenson Sep 20 '10 at 18:51
  • This is a *very* old question (already suitably answered by @MikaelSvenson), but another option is to use the Enumerable extension .Union(), like so: foreach (var file in Directory.EnumerateFiles(path, "*.mp3", SearchOption.AllDirectories).Union(Directory.EnumerateFiles(path, "*.wma", SearchOption.AllDirectories))) { ... } – Kirkaiya Mar 24 '15 at 20:44

8 Answers

93

I created some helper methods to solve this which I blogged about earlier this year.

One version takes a regex pattern such as \.mp3|\.mp4; the other takes an array of wildcard search patterns and runs one search per pattern in parallel.

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class MyDirectory
{
   // Regex version: enumerates every file once and filters on the extension.
   // Example pattern: @"\.mp3|\.mp4". The default empty pattern matches every file.
   public static IEnumerable<string> GetFiles(string path,
                       string searchPatternExpression = "",
                       SearchOption searchOption = SearchOption.TopDirectoryOnly)
   {
      Regex reSearchPattern = new Regex(searchPatternExpression, RegexOptions.IgnoreCase);
      return Directory.EnumerateFiles(path, "*", searchOption)
                      .Where(file =>
                               reSearchPattern.IsMatch(Path.GetExtension(file)));
   }

   // Wildcard version: takes patterns such as "*.mp3" and runs one search per pattern in parallel
   public static IEnumerable<string> GetFiles(string path,
                       string[] searchPatterns,
                       SearchOption searchOption = SearchOption.TopDirectoryOnly)
   {
      return searchPatterns.AsParallel()
             .SelectMany(searchPattern =>
                    Directory.EnumerateFiles(path, searchPattern, searchOption));
   }
}
Mikael Svenson
  • Thanks for a good implementation. What would be a good (efficient) way to finally show the results on a WPF screen? I plan to use your parallel method to get the files. What if I use foreach to iterate the results, store them in a List, and then load them on screen? – Saurabh Kumar Nov 04 '12 at 08:06
  • You can just bind to the output of either method, as the binding will enumerate all results for you. No need to store it in a separate list first. The most efficient way is to start displaying items as they are enumerated. I'm no WPF expert, but I guess you should be able to render per item with some signalling. – Mikael Svenson Nov 04 '12 at 12:56
  • Great examples! Just to note a few characteristics of each of the two methods... With the `PARALLEL` method, searches are NOT case sensitive, and the results come back out of order. With the `REGEX` method, searches ARE case sensitive (unless you use something like `"(?i)\.mp3$|\.mp4$"`), and the results come back in the order you would expect. I have run tests and noticed that the parallel version might run a SLIGHT bit faster, but all in all it is a VERY small difference. – Arvo Bowen Oct 19 '15 at 15:56
  • @ArvoBowen good catch on the case sensitive comparison, and added a regexoption in the code – Mikael Svenson Oct 20 '15 at 06:37
  • This is a great solution; thanks! Just an FYI: I ran into performance issues that I traced back to IEnumerable (mostly centered around the Count( ) method which I used in a few places, but that was not the only performance hit). My list had about 4700 file names. I wound up doing a .ToArray( ) on the list and dealing with everything as an array; you pay a one-time price turning the list into an array but it is more than mitigated by the noticeably faster performance thereafter. – Toolsmythe Jan 14 '16 at 22:25
  • @Toolsmythe If you call Count() it will iterate everything first.. if you only need to iterate and handle items as they are encountered, then there is no need to do ToArray first..or shouldn't anyways. Haven't looked at this in years :) – Mikael Svenson Jan 15 '16 at 07:23
  • I assume this does not handle the case where a pattern is a subset of another pattern? E.g., consider a split ZIP file. One might want to match `*.z*` (= a part of a split zip file) and `*.zip`. Then, instead of having as a result `[ 'foo.zip' ]`, your code [afaik] returns `[ 'foo.zip', 'foo.zip' ]`. – Sebastian Mach Jan 03 '18 at 09:16
  • @SebastianMach you could use the regexp overload with `\.z\d+|\.zip` which would grab zip + z00 etc. – Mikael Svenson Jan 03 '18 at 13:16
  • @MikaelSvenson: Yes; it's too bad one loses the ability to let the OS decide how to best handle pattern matching, but enumerate-all (whether regular expressions or wildcards) seems to be the only way to go without requiring additional storage for duplicate detection. If one still goes with a hash-set to store duplicates, she/he wants to make sure to detect duplicates _correctly_, which can become quite a quest (think junctions or sym- and hard links, UNC syntax, casing differences (in turn possibly opening up I18N issues), ...). I was in exactly this situation today (splitting wildcards at "|" though), and conclude it's best to just go KISS and iterate the list for `*` once and then filter. – Sebastian Mach Jan 03 '18 at 13:50
  • Is it possible to include multiple paths as well? or would it be better to put a for loop around GetFiles to process multiple paths? – Rod May 15 '19 at 20:40
  • Multiple paths would be to join two result sets. – Mikael Svenson May 16 '19 at 05:54
  • Can this be extended one step farther, and search the first instance of a search text in the specified file extensions? – Rod May 24 '19 at 19:11
  • @Rod can you explain a bit more? And most likely a developer can make anything work – Mikael Svenson May 25 '19 at 07:03
  • Your code above returns files with multiple extensions. Let’s say those were text files instead of music. (.xml, .json, .config) In addition to only returning those extensions I only want to return the files that contain a certain string. Do I need to create a 2nd loop and search those files? – Rod May 25 '19 at 14:21
  • Correct, you would need to read the file content as well. An awesome Windows utility for this is Agent Ransack at https://www.mythicsoft.com/agentransack/ which has been my trusted find in file tool for as long as it has existed almost (first version in 2000) – Mikael Svenson May 25 '19 at 21:14
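
The second pass discussed in the last few comments could look roughly like the sketch below. It assumes the matched files are text files that can be read line by line; the class name ContentSearch and the method FilesContaining are hypothetical and not part of the answer's helpers.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ContentSearch
{
    // Hypothetical helper: keeps only the files in which at least one line
    // contains searchText (case-insensitive). File.ReadLines streams the file,
    // so reading stops at the first matching line.
    public static IEnumerable<string> FilesContaining(IEnumerable<string> files, string searchText)
    {
        return files.Where(file =>
            File.ReadLines(file)
                .Any(line => line.IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0));
    }
}
```

It can be chained onto either helper, for example ContentSearch.FilesContaining(MyDirectory.GetFiles(path, new[] { "*.xml", "*.json", "*.config" }), "some text").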
32

The most elegant approach is probably:

var directory = new DirectoryInfo(path);
var masks = new[] { "*.mp3", "*.wav" };
var files = masks.SelectMany(directory.EnumerateFiles);

But it might not be the most efficient.
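
Note that the method-group form above calls the EnumerateFiles(string) overload, which only searches the top directory. To recurse into subdirectories as the question does, a lambda can pass SearchOption.AllDirectories. A minimal sketch, where the class name MaskSearch and its method are hypothetical:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class MaskSearch
{
    // Hypothetical helper: runs one recursive search per wildcard mask
    // and concatenates the results lazily.
    public static IEnumerable<FileInfo> EnumerateFiles(string path, params string[] masks)
    {
        var directory = new DirectoryInfo(path);
        return masks.SelectMany(mask => directory.EnumerateFiles(mask, SearchOption.AllDirectories));
    }
}
```

Each mask triggers its own walk of the directory tree, which is why this may not be the most efficient option for many masks.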

Tom Pažourek
  • 9,582
  • 8
  • 66
  • 107
22
string path = "C:\\";
var result = new List<string>();
string[] extensions = { ".mp3", ".wma", ".mp4", ".wav" };

foreach (string file in Directory.EnumerateFiles(path, "*.*", SearchOption.AllDirectories)
    .Where(s => extensions.Any(ext => ext == Path.GetExtension(s))))
{
    result.Add(file);
    Console.WriteLine(file);
}
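
One caveat: the == comparison above is case-sensitive, so a file named SONG.MP3 would be skipped. A possible variant, sketched under the assumption that extensions should match regardless of case, keeps the extensions in a case-insensitive HashSet. The names ExtensionFilter and EnumerateByExtension are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ExtensionFilter
{
    // Hypothetical helper: enumerates all files once and keeps those whose
    // extension (including the leading dot) is in the set, ignoring case.
    public static IEnumerable<string> EnumerateByExtension(
        string path, IEnumerable<string> extensions, SearchOption searchOption)
    {
        var wanted = new HashSet<string>(extensions, StringComparer.OrdinalIgnoreCase);
        return Directory.EnumerateFiles(path, "*", searchOption)
                        .Where(file => wanted.Contains(Path.GetExtension(file)));
    }
}
```
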
davmos
Islam Yahiatene
16

As I noted in a comment, Mikael Svenson's helper methods are great little solutions, but if you ever need something quick for a one-off project, consider the LINQ extension .Union(). It joins two enumerable sequences and drops duplicates. In your case, the code would look like this:

List<string> result = Directory.EnumerateFiles(path, "*.mp3", SearchOption.AllDirectories)
    .Union(Directory.EnumerateFiles(path, "*.wma", SearchOption.AllDirectories)).ToList();

This creates and fills your result list all in one line.
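
For more than two patterns, the same idea could be generalized as in the sketch below: SelectMany concatenates one search per pattern and Distinct removes files matched by overlapping patterns, which is what chained Union calls would do. The names UnionSearch and EnumerateUnion are hypothetical, and the case-insensitive comparer is an assumption that suits Windows-style file systems.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class UnionSearch
{
    // Hypothetical helper: one recursive search per wildcard pattern,
    // with duplicate paths removed (paths compared ignoring case).
    public static IEnumerable<string> EnumerateUnion(string path, params string[] patterns)
    {
        return patterns
            .SelectMany(pattern => Directory.EnumerateFiles(path, pattern, SearchOption.AllDirectories))
            .Distinct(StringComparer.OrdinalIgnoreCase);
    }
}
```

Called as UnionSearch.EnumerateUnion(path, "*.mp3", "*.wma").ToList(), it fills the result list in the same way as the two-pattern version.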

Kirkaiya
  • Elegant, and avoids enumeration of all files by C#, allowing the file system to optimize however it can. – Craig Brunetti Nov 13 '17 at 16:25
  • It's probably slower to run two (or more) searches, one per extension pattern, than to run a single generic search that returns all files and then filter by extension in C#. It would be interesting to benchmark, though. – Alex P. Jan 06 '23 at 22:53
4

I solved this problem this way:

string[] formats = {".mp3", ".wma", ".mp4"};

foreach (var file in Directory.EnumerateFiles(folder, "*.*", SearchOption.AllDirectories).Where(x => formats.Any(x.EndsWith)))
{
    // TODO...
}
2

I know this is an old post but I came up with a solution people might like to use.

private IEnumerable<FileInfo> FindFiles()
{
    DirectoryInfo sourceDirectory = new DirectoryInfo(@"C:\temp\mydirectory");
    string foldersFilter = "*bin*,*obj*";
    string fileTypesFilter = "*.mp3,*.wma,*.mp4,*.wav";

    // Filter by folder name and extension
    IEnumerable<DirectoryInfo> directories = foldersFilter.Split(',').SelectMany(pattern => sourceDirectory.EnumerateDirectories(pattern, SearchOption.AllDirectories));
    List<FileInfo> files = new List<FileInfo>();
    files.AddRange(directories.SelectMany(dir => fileTypesFilter.Split(',').SelectMany(pattern => dir.EnumerateFiles(pattern, SearchOption.AllDirectories))));

    // Pick up root files (one search per pattern, not the whole comma-separated filter string)
    files.AddRange(fileTypesFilter.Split(',').SelectMany(pattern => sourceDirectory.EnumerateFiles(pattern, SearchOption.TopDirectoryOnly)));

    // Alternative: filter just by extension, ignoring the folder filter
    IEnumerable<FileInfo> files2 = fileTypesFilter.Split(',').SelectMany(pattern => sourceDirectory.EnumerateFiles(pattern, SearchOption.AllDirectories));

    // Return whichever result set is needed; files applies the folder filter, files2 does not
    return files;
}
davmos
kmcbrearty
1

To filter using the same kind of extension-list string that GUI Open dialogs use, e.g.:

".exe,.pdb".Split(',', ';', '|').SelectMany(ext => Directory.EnumerateFiles(".", "*" + ext, searchOptions))

Packaged up:

public static IEnumerable<string> EnumerateFilesFilter(string path, string filesFilter, SearchOption searchOption = SearchOption.TopDirectoryOnly)
{
    // Accepts extensions separated by ',', ';' or '|', e.g. ".exe,.pdb"
    return filesFilter.Split(',', ';', '|').SelectMany(ext => Directory.EnumerateFiles(path, "*" + ext, searchOption));
}
0

Starting with .NET Core 2.1 and .NET Standard 2.1, there is a built-in class, FileSystemName (in the System.IO.Enumeration namespace), which provides methods for matching file system names.

Example:

public static IEnumerable<string> EnumerateFiles(string path, string[] searchPatterns, SearchOption searchOption = SearchOption.TopDirectoryOnly)
{
    // Match against the file name only, not the full path that Directory.EnumerateFiles returns
    return Directory.EnumerateFiles(path, "*", searchOption)
                    .Where(filePath => searchPatterns.Any(pattern => FileSystemName.MatchesSimpleExpression(pattern, Path.GetFileName(filePath))));
}

I've adapted the existing source code of FileSystemName for use with .NET Framework 4 (Gist: FileSystemName for .NET Framework 4).

Yevhen Cherkes