
I posted something similar to this before, but that question was about the Command Prompt. As in that case, I'm trying to automate some file cleanup prior to a backup in an ERP system I maintain, in order to smooth out the process (I perform maintenance on a half dozen of these systems at least twice each month). Here are some examples of what's happening...

Here are three file names that could show up in the directory:

  • AP_AnalysisWrk.M4T
  • AP_AnalysisWrkMPM201408211313.M4T
  • AP_AnalysisWrkNG201408211313.M4T

Of these three, the second two would be candidates for deletion, and the first would need to remain. So, initially I used the following to retrieve only the second two:

String[] wrkFileList = Directory.GetFiles(directoryPath, "??_*Wrk??*????????????.M4T");

However, for some reason, it always returns all three even though the first doesn't match the pattern. When using this pattern in Windows Explorer, it only returns the second two files, as is desired. I have developed a workaround using regular expressions, which works:

Regex wrkFileMatch = new Regex("([A-Za-z]{2}_[A-Za-z0-9]+Wrk[A-Z0-9]{2,3}\\d{12}\\.(m4t|M4T))$");

I'm not crazy about this approach, though, because it adds a loop that shouldn't be necessary: I have to iterate over every result to pick out the correct ones. Performance-wise, it doesn't seem to matter much, but I'd like to understand why the initial pattern fails to return only the correct matches. Is there a better way to filter file names with GetFiles, or am I better off iterating through the directory results and using Regex matches to find the correct files (as I am currently doing)?

dbc
Michael McCauley

2 Answers


From the documentation for Directory.GetFiles Method (String, String)

searchPattern can be a combination of literal and wildcard characters, but doesn't support regular expressions. The following wildcard specifiers are permitted in searchPattern.

  • * (asterisk): Zero or more characters in that position.
  • ? (question mark): Zero or one character in that position.

Given that, AP_AnalysisWrk.M4T does match ??_*Wrk??*????????????.M4T, because every ? and the * in the trailing ??*???????????? run can match zero characters, so the whole run can match the empty string between "Wrk" and ".M4T".

So, you can use Directory.GetFiles() to do a crude initial match, then filter the returns more precisely with a Regex.
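As a minimal sketch of that two-stage approach (the class and method names here are hypothetical, and the regex is my reading of the naming scheme in the question, not a confirmed specification), the explicit loop can be folded into a LINQ filter:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class WorkFileCleanup
{
    // Assumed naming scheme: two letters, '_', a name ending in "Wrk",
    // a 2-3 character tag, a 12-digit timestamp, then ".M4T" in any case.
    private static readonly Regex WorkFilePattern = new Regex(
        @"^[A-Za-z]{2}_[A-Za-z0-9]+Wrk[A-Z0-9]{2,3}\d{12}\.m4t$",
        RegexOptions.IgnoreCase);

    public static string[] FindTimestampedWorkFiles(string directoryPath)
    {
        // Stage 1: crude wildcard filter done by the file system API.
        // Stage 2: precise filter on the file name only (not the full path).
        return Directory.EnumerateFiles(directoryPath, "??_*Wrk*.M4T")
            .Where(path => WorkFilePattern.IsMatch(Path.GetFileName(path)))
            .ToArray();
    }
}
```

Calling WorkFileCleanup.FindTimestampedWorkFiles(directoryPath) would then return only the timestamped work files. The regex is applied to Path.GetFileName(path) because GetFiles and EnumerateFiles return full paths, which is why the pattern can safely be anchored at both ends.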

dbc
  • Maybe the same goes for the command prompt. Serves me right for thinking Windows Explorer and C# would treat the "?" wildcard in the same fashion. I'll just revise my code to make more efficient use of RegEx. Thank you. – Michael McCauley Aug 21 '14 at 20:17
  • It's not a bad idea to do crude initial filtering with `Directory.GetFiles()`. My recollection is that the filtering is done on the server when accessing a mapped network drive, which reduces network traffic somewhat when scanning huge directories. – dbc Aug 21 '14 at 20:20
  • This actually gets performed directly on the server hosting the files, so there's no issue with network traffic. But, even if it were being done over a mapped drive, network traffic isn't an issue as the work is being done late at night when the network is mostly idle. I think maybe network speed would be more of an issue in this situation. In any case, the files being cleared each time number about 700-800, totaling 3GB to 10GB overall. – Michael McCauley Aug 21 '14 at 20:26

Your initial attempt using Directory.GetFiles(...) fails because the '?' wildcard allows for either 0 or 1 characters in the indicated position. To do what you want, you will essentially have to use a regex.

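Side note, you can simplify your regex down to "\w{2}_\w+Wrk\w{2,3}\d{12}\.([mM]4[tT])" (with the dot escaped so it only matches a literal period). Keep in mind that \w also matches digits and underscores, so this is slightly more permissive than your original character classes.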
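To sanity-check a simplified pattern against the three sample names from the question, something like the following sketch could be used. The class and method names are hypothetical, and this variant escapes the dot and adds a trailing $ anchor, which are my assumptions rather than part of the pattern as originally posted:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimplifiedPatternCheck
{
    // Simplified pattern, with the dot escaped and the end anchored.
    // \w is broader than [A-Za-z]: it also accepts digits and '_'.
    private static readonly Regex Simplified =
        new Regex(@"\w{2}_\w+Wrk\w{2,3}\d{12}\.[mM]4[tT]$");

    // Returns true when the file name looks like a timestamped work file.
    public static bool IsTimestampedWorkFile(string fileName)
    {
        return Simplified.IsMatch(fileName);
    }
}
```

With the sample names, IsTimestampedWorkFile returns false for AP_AnalysisWrk.M4T and true for the MPM and NG variants, since only those have a 2-3 character tag followed by twelve digits after "Wrk".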

Matthew Brubaker