Say I have a regular expression like the following. It is loaded from a file into a variable $regex, so I have no idea at design time what it contains, but at runtime I can discover that it includes the named groups "version1", "version2", "version3" and "version4":

"Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)"

...and I have these variables:

$version1 = "3"
$version2 = "2"
$version3 = "1"
$version4 = "0"

...and I come across the following string in a file:

Version 7,7,0,0

...which is stored in a variable $input, so that ($input -match $regex) evaluates to $true.

How can I replace the named groups from $regex in the string $input with the values of $version1, $version2, $version3, $version4 if I do not know the order in which they appear in $regex (I only know that $regex includes these named groups)?

I can't find any references describing the syntax for replacing a named group with the value of a variable by using the group name as an index to the match - is this even supported?

EDIT: To clarify - the goal is to replace templated version strings in any kind of text file where the version string in a given file requires replacement of a variable number of version fields (could be 2, 3, or all 4 fields). For example, the text in a file could look like any of these (but is not restricted to these):

#define SOME_MACRO(4, 1, 0, 0)

Version "1.2.3.4"

SomeStruct vs = { 99,99,99,99 }

Users can specify a file set and a regular expression to match the line containing the fields, with the original idea being that the individual fields would be captured by named groups. The utility has the individual version field values that should be substituted in the file, but has to preserve the original format of the line that will contain the substitutions, and substitute only the requested fields.

EDIT-2: I think I can get the result I need with substring calculations based on the position and extent of each of the matches, but was hoping PowerShell's replace operation was going to save me some work.

EDIT-3: So, as Ansgar correctly and succinctly describes below, there isn't a way (using only the original input string, a regular expression about which you only know the named groups, and the resulting matches) to use the "-replace" operation (or other regex operations) to perform substitutions of the captures of the named groups, while leaving the rest of the original string intact. For this problem, if anybody's curious, I ended up using the solution below. YMMV, other solutions possible. Many thanks to Ansgar for his feedback and options provided.

In the following code block:

  • $input is a line of text on which substitution is to be performed
  • $regex is a regular expression (of type [string]) read from a file that has been verified to contain at least one of the supported named groups
  • $regexToGroupName is a hash table that maps a regex string to an array of group names, in the order returned by the GetGroupNames() method of the corresponding [regex] object (for named groups this matches the left-to-right order in which they appear in the expression)
  • $groupNameToVersionNumber is a hash table that maps a group name to a version number.

Constraints on the named groups within $regex are only (I think) that the expression within the named groups cannot be nested, and should match at most once within the input string.
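
For readers who want to run the code block that follows, here is a minimal sketch of how the two hash tables might be populated. The $supported list and the sample values are assumptions made for illustration; they are not part of the original utility.

```powershell
# Sample pattern and values taken from the question text.
$regex = 'Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)'

# Assumed list of group names the utility supports (illustrative only).
$supported = 'version1', 'version2', 'version3', 'version4'

# GetGroupNames() is an instance method. Its result also contains "0"
# (the whole match), so keep only the supported names.
$names = ([regex]$regex).GetGroupNames() | Where-Object { $supported -contains $_ }

# Regex string -> ordered array of the group names it contains.
$regexToGroupName = @{ $regex = @($names) }

# Group name -> replacement version field.
$groupNameToVersionNumber = @{ version1 = '3'; version2 = '2'; version3 = '1'; version4 = '0' }
```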

# This will give us the index and extent of each substring
# that we will be replacing (the parts that we will not keep)
$matchResults = ([regex]$regex).match($input)

# This will hold substrings from $input that were not captured
# by any of the supported named groups, as well as the replacement
# version strings, properly ordered, but will omit substrings captured
# by the named groups
$lineParts = @()
$startingIndex = 0
foreach ($groupName in $regexToGroupName.$regex)
{
    # Excise the substring leading up to the match for this group...
    $lineParts = $lineParts + $input.Substring($startingIndex, $matchResults.groups[$groupName].Index - $startingIndex)

    # Instead of the matched substring, we'll use the substitution
    $lineParts = $lineParts + $groupNameToVersionNumber.$groupName

    # Set the starting index of the next substring that we will keep...
    $startingIndex = $matchResults.groups[$groupName].Index + $matchResults.groups[$groupName].Length
}

# Keep the end of the original string (if there's anything left)
$lineParts = $lineParts + $input.Substring($startingIndex, $input.Length - $startingIndex)

$newLine = ""
foreach ($part in $lineParts)
{
    $newLine = $newLine + $part
}
$input = $newLine
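
The same substring technique can be wrapped in a function, sketched below. Set-NamedGroupValue and its parameter names are hypothetical, not part of the original utility. Unlike the code above, this sketch sorts the groups by their position in the input, so it does not rely on a precomputed name order, and it skips groups that did not take part in the match.

```powershell
# Hypothetical helper: replaces the text captured by each named group
# with the value supplied for that group and leaves the rest of the
# line untouched. Assumes the groups are not nested and do not overlap.
function Set-NamedGroupValue {
    param(
        [string]    $Line,
        [string]    $Pattern,
        [hashtable] $Values    # group name -> replacement text
    )

    $match = ([regex]$Pattern).Match($Line)
    if (-not $match.Success) { return $Line }

    # Keep the named groups that matched and have a replacement value,
    # ordered by where their captures start in the line.
    $groups = ([regex]$Pattern).GetGroupNames() |
        Where-Object { $Values.ContainsKey($_) -and $match.Groups[$_].Success } |
        ForEach-Object {
            [pscustomobject]@{
                Name   = $_
                Index  = $match.Groups[$_].Index
                Length = $match.Groups[$_].Length
            }
        } |
        Sort-Object Index

    $parts = @()
    $start = 0
    foreach ($group in $groups) {
        # Text between the previous capture and this one is kept as-is.
        $parts += $Line.Substring($start, $group.Index - $start)
        $parts += [string]$Values[$group.Name]
        $start  = $group.Index + $group.Length
    }
    # Keep whatever follows the last capture.
    $parts += $Line.Substring($start)
    -join $parts
}

$values = @{ version1 = '3'; version2 = '2'; version3 = '1'; version4 = '0' }
$all = Set-NamedGroupValue -Line 'Version 7,7,0,0' -Values $values `
    -Pattern 'Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)'
$two = Set-NamedGroupValue -Line '#define SOME_MACRO(4, 1, 0, 0)' -Values $values `
    -Pattern 'SOME_MACRO\((?<version1>\d+), (?<version2>\d+)'
```

With the sample values, $all should come out as "Version 3,2,1,0" and $two as "#define SOME_MACRO(3, 2, 0, 0)", which covers the case where a pattern names only some of the fields.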
ekad
Hoobajoob

2 Answers

Simple Solution

If you just want to replace a version number found somewhere in your $input text, you could do this:

$input -replace '(Version\s+)\d+,\d+,\d+,\d+',"`$1$Version1,$Version2,$Version3,$Version4"

Using Named Captures in PowerShell

Regarding your question about named captures: a named capture can be referenced in the replacement string by wrapping its name in curly brackets, i.e. ${name}. For example:

'dogcatcher' -replace '(?<pet>dog|cat)','I have a pet ${pet}.  '

Gives:

I have a pet dog.  I have a pet cat.  cher
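
One thing to watch with ${name} is quoting, because PowerShell uses the same syntax for its own variables. The following sketch (the variable names are illustrative only) contrasts the three cases:

```powershell
$pattern = '(?<pet>dog|cat)'

# Single quotes: PowerShell leaves ${pet} alone, so the regex engine
# sees it and substitutes the captured text.
$fromRegex = 'dogcatcher' -replace $pattern, '[${pet}]'

# Double quotes: PowerShell expands ${pet} as one of its own variables
# before the regex engine runs, so the capture is never consulted.
$pet = 'fish'
$fromVariable = 'dogcatcher' -replace $pattern, "[${pet}]"

# Double quotes with a backtick: the $ is escaped, so the regex engine
# sees ${pet} again, while $suffix is still expanded by PowerShell.
$suffix = '!'
$mixed = 'dogcatcher' -replace $pattern, "[`${pet}$suffix]"
```

The last form is the one to reach for when a replacement string needs both a capture and a PowerShell variable.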

Issue with multiple captures & solution

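You can't give each capture its own replacement text in a single replace statement, since the same replacement string is applied to every match, and any group that did not take part in a given match expands to an empty string. i.e. if you did this: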

 'dogcatcher' -replace '(?<pet>dog|cat)|(?<singer>cher)','I have a pet ${pet}.  I like ${singer}''s songs.  '

You'd get:

I have a pet dog.  I like 's songs.  I have a pet cat.  I like 's songs.  I have a pet .  I like cher's songs.  

...which is probably not what you're hoping for.

Rather, you'd have to do one replace per item:

'dogcatcher' -replace '(?<pet>dog|cat)','I have a pet ${pet}.  ' -replace '(?<singer>cher)', 'I like ${singer}''s songs.  ' 

...to get:

I have a pet dog.  I have a pet cat.  I like cher's songs.  
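
An alternative to chaining, sketched here as an option rather than as part of the original answer, is to pass a script block as a MatchEvaluator to [regex]::Replace. The script block runs once per match and can check which group actually succeeded:

```powershell
$pattern = '(?<pet>dog|cat)|(?<singer>cher)'

# Called once for every match; the returned string replaces that match.
$evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
    param($match)
    if ($match.Groups['pet'].Success) {
        "I have a pet $($match.Groups['pet'].Value).  "
    }
    else {
        "I like $($match.Groups['singer'].Value)'s songs.  "
    }
}

$result = [regex]::Replace('dogcatcher', $pattern, $evaluator)
```

This keeps the combined pattern in one place while still producing different text per group.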

More Complex Solution

Bringing this back to your scenario, you're not actually using the captured values; rather you're hoping to replace the spaces they were in with new values. For that, you'd simply want this:

$input = 'I''m running Programmer''s Notepad version 2.4.2.1440, and am a big fan.  I also have Chrome v    56.0.2924.87 (64-bit).' 

$version1 = 1
$version2 = 3
$version3 = 5
$version4 = 7

$v1Pattern = '(?<=\bv(?:ersion)?\s+)\d+(?=\.\d+\.\d+\.\d+)'
$v2Pattern = '(?<=\bv(?:ersion)?\s+\d+\.)\d+(?=\.\d+\.\d+)'
$v3Pattern = '(?<=\bv(?:ersion)?\s+\d+\.\d+\.)\d+(?=\.\d+)'
$v4Pattern = '(?<=\bv(?:ersion)?\s+\d+\.\d+\.\d+\.)\d+'

$input -replace $v1Pattern, $version1 -replace $v2Pattern, $version2 -replace $v3Pattern,$version3 -replace $v4Pattern,$version4

Which would give:

I'm running Programmer's Notepad version 1.3.5.7, and am a big fan.  I also have Chrome v    1.3.5.7 (64-bit).

NB: The above could be written as a 1 liner, but I've broken it down to make it simpler to read.

This takes advantage of regex lookarounds: a way of checking the content before and after the text you're matching without including that content in the match. So when we select what to replace, we can say "match the number that appears after the word version" without also saying "replace the word version".

More info on those here: http://www.regular-expressions.info/lookaround.html
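
As a minimal illustration of the difference (the sample text is made up), compare a plain pattern with its lookaround equivalent:

```powershell
$text = 'version 2.4'

# Without lookarounds the word "version " is part of the match,
# so it is replaced along with the number.
$consumed = $text -replace 'version\s+\d+', '9'

# With a lookbehind and a lookahead only the digits are matched,
# so the surrounding text survives.
$preserved = $text -replace '(?<=version\s+)\d+(?=\.\d+)', '9'
```

Note that the lookbehind here contains \s+, which has no fixed length; the .NET regex engine used by PowerShell accepts that, but many other engines do not.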

Your Example

Adapting the above to work for your example (i.e. where versions may be separated by commas or dots, and there's no consistency to their format beyond being 4 sets of numbers):

$input = @'
#define SOME_MACRO(4, 1, 0, 0)

Version "1.2.3.4"

SomeStruct vs = { 99,99,99,99 }
'@

$version1 = 1
$version2 = 3
$version3 = 5
$version4 = 7

$v1Pattern = '(?<=\b)\d+(?=\s*[\.,]\s*\d+\s*[\.,]\s*\d+\s*[\.,]\s*\d+\b)'
$v2Pattern = '(?<=\b\d+\s*[\.,]\s*)\d+(?=\s*[\.,]\s*\d+\s*[\.,]\s*\d+\b)'
$v3Pattern = '(?<=\b\d+\s*[\.,]\s*\d+\s*[\.,]\s*)\d+(?=\s*[\.,]\s*\d+\b)'
$v4Pattern = '(?<=\b\d+\s*[\.,]\s*\d+\s*[\.,]\s*\d+\s*[\.,]\s*)\d+\b'

$input -replace $v1Pattern, $version1 -replace $v2Pattern, $version2 -replace $v3Pattern,$version3 -replace $v4Pattern,$version4

Gives:

#define SOME_MACRO(1, 3, 5, 7)

Version "1.3.5.7"

SomeStruct vs = { 1,3,5,7 }
JohnLBevan

Regular expressions don't work that way, so you can't. Not directly, that is. What you can do (short of using a more appropriate regular expression that groups the parts you want to keep) is to extract the version string and then in a second step replace that substring with the new version string:

$oldver = $input -replace $regexp, '$1,$2,$3,$4'
$newver = $input -replace $oldver, "$Version1,$Version2,$Version3,$Version4"
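
One caveat with a two-step approach like this: the left operand of the second -replace is itself treated as a regular expression, so an old version string such as 1.2.3.4 would match more than intended. A small sketch of the difference, using made-up sample data:

```powershell
$line   = 'Version "1.2.3.4" (was 1a2b3c4)'
$oldver = '1.2.3.4'
$newver = '3.2.1.0'

# Unescaped, each "." matches any character, so "1a2b3c4" is replaced too.
$unescaped = $line -replace $oldver, $newver

# Escaped, only the literal text "1.2.3.4" is replaced.
$escaped = $line -replace [regex]::Escape($oldver), $newver

# String.Replace treats both arguments as literal text.
$literal = $line.Replace($oldver, $newver)
```

Either of the last two forms avoids surprises when the extracted text contains regex metacharacters.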

Edit:

If you don't even know the structure, you must extract that from the regular expression as well.

$version = @($version1, $version2, $version3, $version4)
$input -match $regexp
$oldver = $regexp
$newver = $regexp
for ($i = 1; $i -le 4; $i++) {
  $oldver = $oldver -replace "\(\?<version$i>\\d\)", $matches["version$i"]
  $newver = $newver -replace "\(\?<version$i>\\d\)", $version[$i-1]
}
$input -replace $oldver, $newver
Ansgar Wiechers
  • Agreed that this would be nice, but this is for a utility where users specify a regex and a file set. I don't know the regex, and I don't know what the file contents look like, so I couldn't use the first line in your answer without reformatting the original file contents, which would be undesirable. I have to leave the file contents looking the same afterwards, replacing only the substrings on the matching lines with the individual version fields. – Hoobajoob Sep 01 '12 at 15:18
  • Perhaps you can replace the named groups in the regular expression with the actual old/new numbers and then do a string replace. That won't work correctly if the regular expression contains expressions other than the named groups, though. – Ansgar Wiechers Sep 01 '12 at 18:17
  • This nearly works, though I don't know in advance how the named groups in the regex are actually defined (e.g., they could be looking for \d, \d{2}, \d+, a literal, etc.). I can introduce some constraints on the named group definition and change the regex used in the for loop you have above to admit one or more characters from the regex syntax as well as alphanumerics (e.g., replace the "\\d" in the regex within the for loop with "[a-zA-Z0-9\\+\.\*\?\^\$\{\}\|\[\]]+"). At any rate, this approach is preferable to substring operations. – Hoobajoob Sep 05 '12 at 18:03
  • An additional problem arises if the string to be matched includes one or more regex characters outside of the group definitions which are required to match the string. For example, for the input Version\0,0,0,0 the regex would be "Version\\(?<version1>\d),(?<version2>\d),0,0", but using the above algorithm the final replaced string would be "Version\\1,2,0,0" rather than "Version\1,2,0,0". – Hoobajoob Sep 05 '12 at 19:49
  • Why do you think I told you in advance that it would not work if the regular expression contained other expressions as well? It's not feasible (if not downright impossible) to handle every possible regular expression your users could come up with. – Ansgar Wiechers Sep 05 '12 at 22:35
  • Doh! Sorry - missed that in your comment above. – Hoobajoob Sep 06 '12 at 14:24