35

I've got a wildcard pattern, perhaps "*.txt" or "POS??.dat".

I also have a list of filenames in memory that I need to compare to that pattern.

How would I do that, keeping in mind that I need exactly the same semantics that IO.DirectoryInfo.GetFiles(pattern) uses?

EDIT: Blindly translating this into a regex will NOT work.

Chad Birch
Jonathan Allen
  • For anyone who comes across this question now that it is years later, I found over at the MSDN social boards that the GetFiles() method will accept * and ? wildcard characters in the searchPattern parameter. (At least in .Net 3.5, 4.0, and 4.5) Directory.GetFiles(string path, string searchPattern) http://msdn.microsoft.com/en-us/library/wz42302f.aspx – jgerman Apr 29 '16 at 16:42

9 Answers

51

Here is a complete answer in code that behaves about 95% like GetFiles(string).

The 5% that isn't there is the short names/long names behavior in the second note on the MSDN documentation for this function.

If you still want that behavior, you'll have to compute the short name of each string in the input array, and then add the long name to the collection of matches if either the long or the short name matches the pattern.

Here is the code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace FindFilesRegEx
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] names = { "hello.t", "HelLo.tx", "HeLLo.txt", "HeLLo.txtsjfhs", "HeLLo.tx.sdj", "hAlLo20984.txt" };
            string[] matches;
            matches = FindFilesEmulator("hello.tx", names);
            matches = FindFilesEmulator("H*o*.???", names);
            matches = FindFilesEmulator("hello.txt", names);
            matches = FindFilesEmulator("lskfjd30", names);
        }

        public static string[] FindFilesEmulator(string pattern, string[] names)
        {
            List<string> matches = new List<string>();
            Regex regex = FindFilesPatternToRegex.Convert(pattern);
            foreach (string s in names)
            {
                if (regex.IsMatch(s))
                {
                    matches.Add(s);
                }
            }
            return matches.ToArray();
        }

        internal static class FindFilesPatternToRegex
        {
            private static Regex HasQuestionMarkRegEx   = new Regex(@"\?", RegexOptions.Compiled);
            // The backslash must be escaped inside the character class, otherwise '\' is never rejected.
            private static Regex IllegalCharactersRegex  = new Regex(@"[\\/:<>|""]", RegexOptions.Compiled);
            private static Regex CatchExtentionRegex    = new Regex(@"^\s*.+\.([^\.]+)\s*$", RegexOptions.Compiled);
            private static string NonDotCharacters      = @"[^.]*";
            public static Regex Convert(string pattern)
            {
                if (pattern == null)
                {
                    throw new ArgumentNullException();
                }
                pattern = pattern.Trim();
                if (pattern.Length == 0)
                {
                    throw new ArgumentException("Pattern is empty.");
                }
                if(IllegalCharactersRegex.IsMatch(pattern))
                {
                    throw new ArgumentException("Pattern contains illegal characters.");
                }
                bool hasExtension = CatchExtentionRegex.IsMatch(pattern);
                bool matchExact = false;
                if (HasQuestionMarkRegEx.IsMatch(pattern))
                {
                    matchExact = true;
                }
                else if(hasExtension)
                {
                    matchExact = CatchExtentionRegex.Match(pattern).Groups[1].Length != 3;
                }
                string regexString = Regex.Escape(pattern);
                regexString = "^" + Regex.Replace(regexString, @"\\\*", ".*");
                regexString = Regex.Replace(regexString, @"\\\?", ".");
                if(!matchExact && hasExtension)
                {
                    regexString += NonDotCharacters;
                }
                regexString += "$";
                Regex regex = new Regex(regexString, RegexOptions.Compiled | RegexOptions.IgnoreCase);
                return regex;
            }
        }
    }
}
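
The short-name step described before the code is not implemented there. A minimal sketch of how it could be layered on top, assuming you have some way to obtain the 8.3 name for each entry: `ShortNameAwareMatcher` and the `getShortName` delegate are hypothetical names introduced here for illustration, not part of the answer or of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ShortNameAwareMatcher
{
    // Returns every long name whose long form or short (8.3) form matches the regex.
    // getShortName is a hypothetical, caller-supplied lookup; it may return null
    // when no short name is known for a file.
    public static List<string> Match(Regex regex, IEnumerable<string> longNames,
                                     Func<string, string> getShortName)
    {
        var matches = new List<string>();
        foreach (string longName in longNames)
        {
            string shortName = getShortName(longName);
            bool longMatches = regex.IsMatch(longName);
            bool shortMatches = shortName != null && regex.IsMatch(shortName);
            if (longMatches || shortMatches)
            {
                // Only the long name is reported, even when the short name was the hit.
                matches.Add(longName);
            }
        }
        return matches;
    }
}
```

The regex passed in would typically be the one produced by `FindFilesPatternToRegex.Convert(pattern)`. How the short names are obtained is left open, since an in-memory list of names does not carry them.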
Nate Diamond
sprite
  • 1
    Great piece of code. The HasAsteriskRegex variable is never used. – Dor Rotman Nov 03 '11 at 17:06
  • @Dor Rotman, Thanks... I edited the code accordingly. I probably initially thought I'd have to check the asterisk for pattern correctness and forgot to remove the unused RegEx later on. – sprite Nov 13 '11 at 06:40
  • *.txt matches HeLLo.txtsjfhs which is wrong in my opinion, kewljibin's answer works properly for *.txt but doesn't deal with upper/lower case – Graham Sep 13 '12 at 14:39
  • @Graham Open a command window and try it, I just did. dir *.txt returns both "HeLLo.txt" and "HeLLo.txtsjfhs". Also, if you look at the link in my post, you will see the behavior defined in the DirectoryInfo.GetFiles Method in MSDN is this exact behavior. There's a lengthy comment there and some samples. Here they are: "*.abc" returns files having an extension of .abc, .abcd, .abcde, .abcdef, and so on. "*.abcd" returns only files having an extension of .abcd. "*.abcde" returns only files having an extension of .abcde. "*.abcdef" returns only files having an extension of .abcdef. – sprite Sep 19 '12 at 14:28
  • @Graham kewljibin's answer handles the input differently from what the question requested. The desired behavior is the same as the DirectoryInfo.GetFiles method. – sprite Sep 19 '12 at 14:32
  • This does not match the `DirectoryInfo.GetFiles` method. Given `file1.txt`, it will match `file1.txtother` when it should not. – Nick Whaley Jan 18 '13 at 19:47
  • Also, undocumented by Microsoft but `DirectoryInfo.GetFiles` will match `*.*` against all files even with no extensions (and no dot in the name at all). This function will not match those. – Nick Whaley Jan 18 '13 at 19:52
  • Your function does not consider the \ character invalid, although it looks like that is your intent. The \ is not escaped in the regex. Use the pattern `@"[\\/:<>|""]"` for invalid characters. – Nick Whaley Jan 18 '13 at 20:44
  • I ran a test of this code by creating files with the names given in a test directory. I then ran `dir ` where the file names were the same names in the `names` array and the patterns were the same strings passed to the `FindFilesEmulator` function. The results I got back from the `dir` commands were not the same results I got back from the `FindFilesEmulator` function. – Tony Vitabile Jan 18 '13 at 20:46
  • Hi @TonyVitabile, I actually tried that and saw the same results. Were you using short names? Because that's something I didn't cover in my solution (only commented on what's missing for that). What did you get different and in what command shell (the standard cmd or something else)? – sprite Jan 20 '13 at 10:13
  • @sprite: My drive is formatted with NTFS, no short names. I used cmd.exe and I got 6 files back. That is, every name in the test program matched, while the test program only returned 3 names. It's because DOS matches extensions shorter & longer than 3 characters long when the extension .??? is used. – Tony Vitabile Jan 22 '13 at 17:24
  • @sprite: Correction, the problem wasn't that the extension was .???; it was that an asterisk was in the file name. However, if you do `dir hello.???`, you get three results back: hello.t, HelLo.tx, and HeLLo.txt. – Tony Vitabile Jan 23 '13 at 13:36
  • `public string[] FindFilesEmulator` should be `static` in order for this code to work. – sɐunıɔןɐqɐp May 14 '19 at 10:12
20

You can simply do this. You do not need regular expressions.

using System;
using Microsoft.VisualBasic;                  // CompareMethod
using Microsoft.VisualBasic.CompilerServices; // Operators.LikeString

if (Operators.LikeString("pos123.txt", "pos?23.*", CompareMethod.Text))
{
  Console.WriteLine("Filename matches pattern");
}

Or, in VB.Net,

If "pos123.txt" Like "pos?23.*" Then
  Console.WriteLine("Filename matches pattern")
End If

In C# you could simulate this with an extension method. It wouldn't be exactly like VB Like, but it would be like...very cool.
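
A minimal sketch of such an extension method, assuming the project references Microsoft.VisualBasic. The class name `StringLikeExtensions` and the method name `Like` are made up for this example. Note that the VB Like operator also treats `#` and `[...]` as special, so this is not the same as GetFiles semantics.

```csharp
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;

public static class StringLikeExtensions
{
    // Thin wrapper over the VB runtime helper behind the Like operator.
    // CompareMethod.Text makes the comparison case-insensitive.
    public static bool Like(this string source, string pattern)
    {
        return Operators.LikeString(source, pattern, CompareMethod.Text);
    }
}
```

With that in place, the check from the snippet above could be written as `"pos123.txt".Like("pos?23.*")`.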

toddmo
  • 1
    Or...if you're WRITING in VB.NET, you can use the `like` operator directly! I've been writing VB for over 15 years now, and I've never used the `like` operator. **IF** I ever knew the operator existed, I've never had the need to use it. I plan to use it now. – CrazyIvan1974 Apr 05 '16 at 02:44
  • 1
    'Operators' does not contain a definition for 'LikeString' – Alex Blokha Dec 18 '20 at 12:54
  • @AlexBlokha Did you add a reference to Microsoft.VisualBasic? Then it works. – Andrew Morton Dec 18 '20 at 17:11
6

Just call the Windows API function PathMatchSpecExW().

using System;
using System.Runtime.InteropServices;

[Flags]
public enum MatchPatternFlags : uint
{
    Normal          = 0x00000000,   // PMSF_NORMAL
    Multiple        = 0x00000001,   // PMSF_MULTIPLE
    DontStripSpaces = 0x00010000    // PMSF_DONT_STRIP_SPACES
}

class FileName
{
    [DllImport("Shlwapi.dll", SetLastError = false)]
    static extern int PathMatchSpecExW([MarshalAs(UnmanagedType.LPWStr)] string file,
                                       [MarshalAs(UnmanagedType.LPWStr)] string spec,
                                       MatchPatternFlags flags);

    /*******************************************************************************
    * Function:     MatchPattern
    *
    * Description:  Matches a file name against one or more file name patterns.
    *
    * Arguments:    file - File name to check
    *               spec - Name pattern(s) to search for
    *               flags - Flags to modify search condition (MatchPatternFlags)
    *
    * Return value: Returns true if name matches the pattern.
    *******************************************************************************/

    public static bool MatchPattern(string file, string spec, MatchPatternFlags flags)
    {
        if (String.IsNullOrEmpty(file))
            return false;

        if (String.IsNullOrEmpty(spec))
            return true;

        int result = PathMatchSpecExW(file, spec, flags);

        return (result == 0);
    }
}
Bruce Engle
5

You could translate the wildcards into a regular expression:

*.txt -> ^.+\.txt$

POS??.dat -> ^POS..\.dat$

Use the Regex.Escape method to escape the characters that are not wildcards into literal strings for the pattern (e.g. converting ".txt" to "\.txt").

The wildcard * translates into .+, and ? translates into .

Put ^ at the beginning of the pattern to match the beginning of the string, and $ at the end to match the end of the string.

Now you can use the Regex.IsMatch method to check if a file name matches the pattern.
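
The steps above can be sketched as follows. `NaiveWildcard` and `ToRegex` are names invented for this example, and `RegexOptions.IgnoreCase` is an added assumption (file names on Windows are case-insensitive). This is only the plain translation described here; it does not reproduce the extension and short-name rules of GetFiles.

```csharp
using System.Text.RegularExpressions;

public static class NaiveWildcard
{
    public static Regex ToRegex(string wildcard)
    {
        // Escape everything first; '*' becomes "\*" and '?' becomes "\?".
        string escaped = Regex.Escape(wildcard);

        // Swap the escaped wildcards for their regex equivalents.
        string body = escaped.Replace(@"\*", ".+").Replace(@"\?", ".");

        // Anchor at both ends so the whole name must match.
        return new Regex("^" + body + "$", RegexOptions.IgnoreCase);
    }
}
```

For example, `NaiveWildcard.ToRegex("POS??.dat").IsMatch("POS12.dat")` is true, while `"POS123.dat"` is rejected because each `?` consumes exactly one character.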

Guffa
  • -1 because this answer is just flat wrong. It almost works for the two examples posted, except you need to make sure that the regex is made case-insensitive. But the behavior of GetFiles is rather complex. See http://msdn.microsoft.com/en-us/library/8he88b63.aspx for details. – Jim Mischel Mar 16 '09 at 21:01
  • Thanks for the attempt, but like I said it needs to match GetFiles exactly and this won't. – Jonathan Allen Mar 17 '09 at 17:36
  • Well, it's not possible to get the same behaviour as GetFiles with a list of file names, as it's impossible to know what the short file names were. – Guffa Mar 17 '09 at 21:39
  • 1
    -1 for incorrectness. Sorry but Jim is right. The case insensitivity is the small problem, it doesn't behave like GetFiles(string) regarding the presence or not of '*' or '?' characters in the pattern. Read the MSDN article that Jim posted in his comment. – sprite Dec 07 '10 at 10:03
  • you can get pretty close to it with everything except the short names. You could even get close to the short name behavior if you make some assumptions. For example, assuming that the names of the files should be converted to short names by their order in the input you can emulate the behavior. e.g. if you have longfilename.txt and longfileothername.txt then they would be longfi~1.txt and longfi~2.txt in respect to the order in the array. If you look at my code sample below I gave a solution with all but short names (and a comment on what's missing to emulate it). – sprite Dec 07 '10 at 10:08
  • Why the downvote? If you don't explain what it is that you think is wrong, it can't improve the answer. – Guffa Nov 19 '14 at 16:21
2

For anyone who comes across this question now that it is years later, I found over at the MSDN social boards that the GetFiles() method will accept * and ? wildcard characters in the searchPattern parameter. (At least in .Net 3.5, 4.0, and 4.5)

Directory.GetFiles(string path, string searchPattern)

http://msdn.microsoft.com/en-us/library/wz42302f.aspx

jgerman
  • 4
    Sorry but -1 as you did not answer the question at all. – sprite Feb 25 '16 at 06:21
  • 3
    I answered a limitation that was stated in the question, saying it is no longer a limitation. Why spend your own reputation points to penalize someone who was updating an old question with new information but who did not yet have enough reputation to edit the original question when posting the update? – jgerman Feb 26 '16 at 15:38
  • 2
    Because it isn't an answer, what you wrote should be a comment on the original question. The person who posted or a moderator can then put in an edit. It's not personal. As for "wasting" my reputation, I don't care for the number much, I'm not in a competition with anyone. – sprite Apr 24 '16 at 05:16
2

Some kind of regex/glob is the way to go, but there are some subtleties; your question indicates you want identical semantics to IO.DirectoryInfo.GetFiles. That could be a challenge, because of the special cases involving 8.3 vs. long file names and the like. The whole story is on MSDN.

If you don't need an exact behavioral match, there are a couple of good SO questions:

glob pattern matching in .NET
How to implement glob in C#

Community
zweiterlinde
0

Please try the code below.

static void Main(string[] args)
    {
        string _wildCardPattern = "*.txt";

        List<string> _fileNames = new List<string>();
        _fileNames.Add("text_file.txt");
        _fileNames.Add("csv_file.csv");

        Console.WriteLine("\nFilenames that match [{0}] pattern are : ", _wildCardPattern);
        foreach (string _fileName in _fileNames)
        {
            CustomWildCardPattern _pattern = new CustomWildCardPattern(_wildCardPattern);
            if (_pattern.IsMatch(_fileName))
            {
                Console.WriteLine("{0}", _fileName);
            }
        }

    }

public class CustomWildCardPattern : Regex
{
    public CustomWildCardPattern(string wildCardPattern)
        : base(WildcardPatternToRegex(wildCardPattern))
    {
    }

    public CustomWildCardPattern(string wildcardPattern, RegexOptions regexOptions)
        : base(WildcardPatternToRegex(wildcardPattern), regexOptions)
    {
    }

    private static string WildcardPatternToRegex(string wildcardPattern)
    {
        string patternWithWildcards = "^" + Regex.Escape(wildcardPattern).Replace("\\*", ".*");
        patternWithWildcards = patternWithWildcards.Replace("\\?", ".") + "$";
        return patternWithWildcards;
    }
}
kewljibin
  • This one works better than sprite's answer on *.txt, but this one doesn't take into account upper and lower case and didn't like *.txt? – Graham Sep 13 '12 at 14:32
  • 1
    This behavior doesn't match the one in the question. It was requested to match DirectoryInfo.GetFiles Method. – sprite Sep 19 '12 at 14:49
0

For searching against a specific pattern, it might be worth using File Globbing which allows you to use search patterns like you would in a .gitignore file.

See here: https://learn.microsoft.com/en-us/dotnet/core/extensions/file-globbing

This allows you to add both inclusions & exclusions to your search.

Please see below the example code snippet from the Microsoft Source above:

using System.Collections.Generic;
using Microsoft.Extensions.FileSystemGlobbing;

Matcher matcher = new Matcher();
matcher.AddIncludePatterns(new[] { "*.txt" });

IEnumerable<string> matchingFiles = matcher.GetResultsInFullPath(filepath);
Ryan Gaudion
-2

The use of RegexOptions.IgnoreCase will fix it.

public class WildcardPattern : Regex {
    public WildcardPattern(string wildCardPattern)
        : base(ConvertPatternToRegex(wildCardPattern), RegexOptions.IgnoreCase) {
    }

    public WildcardPattern(string wildcardPattern, RegexOptions regexOptions)
        : base(ConvertPatternToRegex(wildcardPattern), regexOptions) {
    }

    private static string ConvertPatternToRegex(string wildcardPattern) {
        string patternWithWildcards = Regex.Escape(wildcardPattern).Replace("\\*", ".*");
        patternWithWildcards = string.Concat("^", patternWithWildcards.Replace("\\?", "."), "$");
        return patternWithWildcards;
    }
}