I've recently started working with regex in PowerShell and have come across an unexpected result from the Select-String cmdlet.

If you enter something like the following:

$thing = "135" | Select-String -Pattern "(.*?)5"
$thing.Matches

You receive the expected result in the Matches property of the MatchInfo object:

Groups   : {135, 13}
Success  : True
Captures : {135}
Index    : 0
Length   : 3
Value    : 135

But if you place the capturing group at the end of the -Pattern:

$thing = "135" | Select-String -Pattern "(.*?)"
$thing.Matches

The MatchInfo object doesn't seem to find anything, although a match is created:

Groups   : {, }
Success  : True
Captures : {}
Index    : 0
Length   : 0
Value    : 

As I said, I'm quite new to PowerShell, so I expect this behavior is operator error.

But what is the workaround? This behavior hasn't caused me problems yet, but considering the files I'm working with (electronic manuals contained in XML files), I expect it will eventually.

...

With regards,

Schwert

...

Clarification:

I made my example very simple to illustrate the behavior, but my original issue was with this pattern:

$linkname = $line | Select-String -Pattern "`"na`"><!--(?<linkname>.*?)"

The file is one of our indices for the links between manuals, and the name of the link is contained within a comment block located on each line of the file.

The pattern is actually a typo, as the name and the comment don't go all the way to the end of the line. I found it when the program began giving errors because it couldn't find "linkname" in the MatchInfo object.

Once I gave it the characters which occur after the link name (::), then it worked correctly. Putting it into the example:

$linkname = $line | Select-String -Pattern "`"na`"><!--(?<linkname>.*?)::"
  • You can't expect an error from `(.*?)`, as it matches the empty string at the beginning of the string. What are you trying to achieve with `.*?`? – Wiktor Stribiżew Sep 30 '15 at 19:44
  • To further @stribizhev's comment, you can see what he means with `$thing = "135" | Select-String -Pattern "(.*?)" -AllMatches`. Why would you want to match everything if you already know it? Perhaps show us a specific example. Also, could you not use PowerShell to parse the data as an `[xml]` object? – Matt Sep 30 '15 at 19:46
  • To expand again, the matching string `.*?` **can** match zero characters because the asterisk matches zero or more characters, and the ? makes it non-greedy so it only captures just as many as needed to actually make a match, so that's what it's done. If you want it to capture more you will need to define what is around it (where to start or stop, such as the 5 in your example). – TheMadTechnician Sep 30 '15 at 21:32
  • @Matt; I added my original code where I found the problem. – Schwert im Stein Oct 01 '15 at 13:51
  • @TheMadTechnician: I should have realized about the ? making it non-greedy and thus finding only zero characters. As I said, operator error. But a very nice explanation as to what the code was doing. – Schwert im Stein Oct 01 '15 at 14:01
  • @stribizhev: I added my original code, which is not as simple as my examples. Your explanation is very concise, although I needed the other comments to completely understand the problem. – Schwert im Stein Oct 01 '15 at 14:31
  • I agree with the accepted answer: to parse XML, you need an XML parser, not regex. – Wiktor Stribiżew Oct 01 '15 at 14:47

1 Answer

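I'm no regex expert, but I believe your pattern "(.*?)" is the problem: the ? makes .* lazy, so with nothing after the group it matches as few characters as possible, which is none. If you remove the ?, for example, the group captures the whole string as expected.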
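
As a rough illustration of the difference, here is a minimal sketch that runs the question's "135" input through four variations of the pattern. The variable names are just for this example, and the `$` anchor variant is an extra case I've added for comparison, not something from the question.

```powershell
# Lazy quantifier with nothing after it: the shortest possible match is empty.
$lazy = ('135' | Select-String -Pattern '(.*?)').Matches[0]
$lazy.Groups[1].Value       # empty string
$lazy.Length                # 0

# Greedy quantifier: .* takes as many characters as it can.
$greedy = ('135' | Select-String -Pattern '(.*)').Matches[0]
$greedy.Groups[1].Value     # 135

# Lazy quantifier followed by a literal: the group grows only until "5" can match.
$bounded = ('135' | Select-String -Pattern '(.*?)5').Matches[0]
$bounded.Groups[1].Value    # 13

# Lazy quantifier followed by an end-of-string anchor: the group must reach the end.
$anchored = ('135' | Select-String -Pattern '(.*?)$').Matches[0]
$anchored.Groups[1].Value   # 135
```

So a lazy group is fine as long as something after it (a literal, an anchor) forces it to keep going.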

Also, PLEASE don't use regex to parse XML. :) There are much easier ways to do that, such as:

[xml]$Manual = Get-Content -Path C:\manual.xml

or

$xdoc = New-Object System.Xml.XmlDocument
$file = Resolve-Path C:\manual.xml
$xdoc.Load($file)

Once you've got it in a structured format you can then use dot notation or XPath to navigate the nodes and attributes.
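
For instance, something along these lines might work for pulling link names out of comment nodes. This is only a sketch: the `index` and `link` elements, the `target` attribute, and the sample names are made up for illustration, since the real file's structure isn't shown in the question. It uses `LoadXml` on an inline string so it runs without a file.

```powershell
# Hypothetical sample data; the real manual index will have different element names.
$sample = @'
<index>
  <link target="na"><!--Engine Overview::chapter 1--></link>
  <link target="na"><!--Fuel System::chapter 2--></link>
</index>
'@

$xdoc = New-Object System.Xml.XmlDocument
$xdoc.LoadXml($sample)

# Dot notation: walk elements and read attributes as if they were properties.
$firstTarget = $xdoc.index.link[0].target    # na

# XPath: select every comment node under a <link> element,
# then keep the text in front of the "::" separator.
$names = $xdoc.SelectNodes('//link/comment()') |
    ForEach-Object { ($_.InnerText -split '::')[0] }

$names    # Engine Overview, Fuel System
```

The comment text still needs a small string split, but the parser handles finding the comments, so the pattern no longer depends on what surrounds them on the line.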

Mathias R. Jessen
Adam Bertram
  • Like I said, very new to PowerShell, although I should have realized it could handle XML files. I come from the writing side of the manuals, not a programming background. I have been working with regex to make changes in the text for quite some time, but in Notepad++ and our company's editing software. I believe this falls under the saying: "If you only have a hammer, everything looks like a nail." – Schwert im Stein Oct 01 '15 at 14:03