I am trying to build a simple script that uses regex to match multiple patterns on a single line, for every line of an input file, and writes the results to an output file. But I'm hitting a wall:

Sample text:

BMC12345 COMBINED PHASE STATISTICS:  31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS:  10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC

Here's what I've got so far:

$table = [regex] "'.*'"
$discard = [regex] "\d* PHYSICAL"

Select-String -Pattern ($table, $discard) -AllMatches .\test.txt | foreach {
    $_.Matches.Value
} > output.txt

Output:

'KDDT111D.DIH0345S'

Desired output:

'KDDT111D.DIH0345S' 10 Physical

For some reason I am unable to get both patterns to write to output.txt. Ideally once I get this working I would like to use Export-Csv to get something a bit cleaner like:

|KDDT111D|DIH0345S|10 Physical|
Jonmonjovi
  • Please [format your code and sample input/output properly](http://meta.stackexchange.com/a/22189/248777). – mklement0 Feb 01 '19 at 23:07

3 Answers


i think you will find the -match operator a bit more suited to this. [grin] using named matches against your sample stored in $InStuff, this ...

$InStuff -match ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) \(.+"

... gives the following set of matches ...

Name                           Value
----                           -----
Space                          KDDT111D
SubSpace                       DIH0345S
Discarded                      10 PHYSICAL
0                              BMC12345 COMBINED PHASE STATISTICS: 31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345...

the named matches can be addressed by $Matches.<the capture group name>.
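
As a minimal sketch of that access pattern: the snippet below assumes the sample line from the question is stored in $InStuff with single spaces, matching the output shown above. $Pattern and $Result are names introduced here only for illustration.

```powershell
# $InStuff holds the sample line from the question (single spaces assumed)
$InStuff = "BMC12345 COMBINED PHASE STATISTICS: 31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS: 10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC"

$Pattern = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) \(.+"

if ($InStuff -match $Pattern) {
    # each named group is a key in the automatic $Matches hashtable
    $Result = $Matches.Space, $Matches.SubSpace, $Matches.Discarded -join '|'
    $Result   # -> KDDT111D|DIH0345S|10 PHYSICAL
}
```

Note that $Matches is only refreshed when -match succeeds, so it is safest to read it inside the if block.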

Lee_Dailey
  • I don't believe this is the direction I want to go, but I appreciate the idea. The input file will have hundreds of records similar to the sample text I provided above. I would like to have the script pick up the input file from a specified directory. The script should then recursively extract the 'Space', 'SubSpace' and 'Discarded' match values, and write that output for each record to a txt/csv file. – Jonmonjovi Feb 01 '19 at 21:53
  • @Jonmonjovi - ok ... but regex is _usually_ the way to go for large files. [*grin*] i cannot get the Select-String cmdlet to work with an array of patterns such as you listed. i suspect it will not work at all that way. [*sigh ...*] – Lee_Dailey Feb 01 '19 at 21:58
  • Unfortunately I have been unable to replicate your results using the match operator and my input file. PS version info below: `PSVersion 5.1.17134.407 PSEdition Desktop PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...} BuildVersion 10.0.17134.407 CLRVersion 4.0.30319.42000 WSManStackVersion 3.0 PSRemotingProtocolVersion 2.3 SerializationVersion 1.1.0.1` – Jonmonjovi Feb 01 '19 at 22:08
  • there are so many ways a regex can go wrong [*grin*] ... it's nearly meaningless to discuss it without seeing both the actual code you used AND a few lines of the actual [sanitized] data file. – Lee_Dailey Feb 01 '19 at 22:19

You have run into a Select-String limitation: The .Matches property of the [Microsoft.PowerShell.Commands.MatchInfo] objects that Select-String emits for each input object (line) only ever contains the (potentially multiple) matches for the first regex passed to the -Pattern parameter.[1]

You can work around the problem by passing a single regex instead, by combining the input regexes via alternation (|):

Select-String -Pattern ($table, $discard -join '|') -AllMatches .\test.txt | 
  ForEach-Object { $_.Matches.Value } > output.txt

A simplified example:

# ('f.', '.z' -join '|') -> 'f.|.z'
'foo bar baz' | Select-String -AllMatches ('f.', '.z' -join '|') |
  ForEach-Object { $_.Matches.Value }

The above yields:

fo
az

proving that the matches for both regexes were reported.

Caveat re output ordering: Using alternation (|) causes the matches for a given input string to be reported in the order in which they're found in the input, not in the order in which the regexes were specified.
That is, both -Pattern 'f.|.z' and -Pattern '.z|f.' above would have resulted in the same output order.
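
If the order of the regexes matters more than the position in the input, one possible alternative (a sketch, not part of the workaround above) is to call Select-String once per regex. The variable names $line, $patterns, $byPosition and $byPattern are hypothetical and only serve the illustration:

```powershell
$line = 'foo bar baz'
$patterns = '.z', 'f.'   # hypothetical order: '.z' listed first

# Alternation: matches come back in input order, regardless of pattern order
$byPosition = $line | Select-String -AllMatches ($patterns -join '|') |
  ForEach-Object { $_.Matches.Value }

# One Select-String call per regex: matches come back grouped in pattern order
$byPattern = foreach ($p in $patterns) {
  $line | Select-String -AllMatches $p | ForEach-Object { $_.Matches.Value }
}

$byPosition -join ' '   # -> fo az
$byPattern -join ' '    # -> az fo
```

The trade-off is that each input line is scanned once per regex, which may matter for large files.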


[1] The problem exists as of Windows PowerShell v5.1 / PowerShell Core 6.2.0-preview.4 and is discussed in this GitHub issue

mklement0

Thanks to the contributors for the ideas and learning experience. I was able to get the desired output using a combination of both answers received.

I found that the -match operator only returns the first match for a given input string, so I needed to process the file line by line in a loop in order to return matches throughout the log file.

I also modified the regex to include only discard values greater than 0.

Sample Text:

BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDAICH0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  3499604 ROWS SELECTED FOR SPACE 'KDDT000D.KDAIND0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  1 ROWS SELECTED FOR SPACE 'KDDT000D.KDCISR0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  9185775 ROWS SELECTED FOR SPACE 'KDDT000D.KDIADR0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  11 PHYSICAL (11 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDICHT0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  2387375 ROWS SELECTED FOR SPACE 'KDDT000D.KDICMS0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  1632821 ROWS SELECTED FOR SPACE 'KDDT000D.KDIPRV0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDLADD0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  24845 PHYSICAL (24845 LOGICAL) RECORDS DISCARDED TO SYSDISC

Example:

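    # Capture Space, SubSpace and the discard count; the trailing
    # " .[1-9][0-9]*\s\b" only matches a non-zero LOGICAL count.
    $regex = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) .[1-9][0-9]*\s\b"

    # Date stamp for the output file name, e.g. 02_01_19
    $timestamp = Get-Date -Format "MM_dd_yy"
    $dir = "C:\Users\JonMonJovi"

    # Test every line of every log file; -match fills $Matches for the current line
    Get-Content $dir\*.log.txt | Where-Object {
        $_ -match $regex
    } | ForEach-Object {
        $Matches.Space, $Matches.SubSpace, $Matches.Discarded -join "|"
    } > C:\Users\JonMonJovi\Discarded\Discard_Log_$timestamp.txt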

Output:

KDDT000D|KDIADR0S| 11 PHYSICAL
KDDT000D|KDLADD0S| 24845 PHYSICAL

From here I am able to use the pipe delimited .txt output file to import into Excel, fulfilling my requirements.
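
For anyone who prefers a real CSV over a hand-joined text file, the same matches could probably be fed to Export-Csv. This is only a sketch under a few assumptions: $lines stands in for the Get-Content output and holds two lines from the sample text, Discard_Log.csv is a hypothetical file name, and Trim() is added to drop the leading space seen in the output above.

```powershell
$regex = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) .[1-9][0-9]*\s\b"

# Two lines from the sample text, standing in for Get-Content output
$lines = @(
    "BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDAICH0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC"
    "BMC51472I COMBINED PHASE STATISTICS:  9185775 ROWS SELECTED FOR SPACE 'KDDT000D.KDIADR0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  11 PHYSICAL (11 LOGICAL) RECORDS DISCARDED TO SYSDISC"
)

# Build one object per matching line; property names become CSV column headers
$rows = foreach ($line in $lines) {
    if ($line -match $regex) {
        [pscustomobject]@{
            Space     = $Matches.Space
            SubSpace  = $Matches.SubSpace
            Discarded = $Matches.Discarded.Trim()   # drop the leading space
        }
    }
}

# Discard_Log.csv is a hypothetical file name in the current directory
$rows | Export-Csv -Path .\Discard_Log.csv -NoTypeInformation
```

Only the second sample line has a non-zero discard count, so the resulting file should contain a header row plus one data row. Excel opens comma-separated files directly, so no import step with a custom delimiter is needed.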

Jonmonjovi