Is there a way to use Select-String to find all lines between X and Y?

For example, if I have a file with this content:

[line 157: Time 2015-08-04 11:34:00] 
<staff>
    <employee>
        <Name>Bob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Sam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Mark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>

and I want to find all the content for <function>management</function>, so that I end up with:

<employee>
    <Name>Bob Smith</Name>
    <function>management</function>
    <age>39</age>
    <birthday>3rd June</birthday>
    <car>yes</car>
</employee>
<employee>
    <Name>Mark Perkins</Name>
    <function>management</function>
    <age>32</age>
</employee>    

If all groupings were the same size, I could use something like:

Select-String -Pattern '<function>management</function>' -CaseSensitive -Context 2,2

However, in reality they are not going to be the same size, so I can't use a fixed number each time.

What I really need is a way of saying: for every matching instance, return everything from 2 lines above my search term up to and including the following '</employee>' line.

Is this possible?

I can't use the standard XML tools in PowerShell, because the file I am reading isn't valid XML; that's why I included [line 157: Time 2015-08-04 11:34:00] in the example. The best way to think of it is as lots of XML files merged into one file, with the [line ...] headers separating them.
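Since the text is not one valid XML document, a purely line-based approach is one option. The following is a minimal sketch, not taken from the thread: `Get-EnclosingBlock` is a hypothetical function name, and it assumes every `<employee>` and `</employee>` tag sits on its own line and that employee blocks are never nested. It uses Select-String only to locate the matching lines (`-SimpleMatch` treats the pattern as literal text), then widens each hit upwards to the nearest `<employee>` line and downwards to the next `</employee>` line, so no fixed `-Context` size is needed.

```powershell
# Hypothetical sketch: find matches with Select-String, then expand each hit
# to the enclosing <employee> ... </employee> block.
# Assumes each tag is on its own line and blocks are not nested.
function Get-EnclosingBlock {
    param(
        [string[]] $Lines,
        [string]   $Pattern = '<function>management</function>'
    )
    # @() keeps $hits an array even when there are no matches
    $hits = @($Lines | Select-String -Pattern $Pattern -SimpleMatch -CaseSensitive)
    foreach ($hit in $hits) {
        $i = $hit.LineNumber - 1                  # LineNumber is 1-based
        # Walk upwards to the opening <employee> line
        $start = $i
        while ($start -gt 0 -and $Lines[$start] -notmatch '<employee>') { $start-- }
        # Walk downwards to the closing </employee> line
        $end = $i
        while ($end -lt $Lines.Count - 1 -and $Lines[$end] -notmatch '</employee>') { $end++ }
        # Emit the whole block as a single string
        ($Lines[$start..$end]) -join "`n"
    }
}
```

For example, `Get-EnclosingBlock -Lines (Get-Content 'C:\Path\To\XmlDocs.txt')` should return one string per matching employee block; the `[line ...]` headers are never part of a block, so they do not get in the way.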


Additional info: I fear my example was a little oversimplified. The actual file is more like:

[line 157: Time 2015-08-04 11:34:00]
<?xml version="1.0" encoding="utf-8"?>
<other>
    <stuff>
    . . .
    </stuff>
</other>

<?xml version="1.0" encoding="utf-8"?>
<staff>
    <employee>
    ...
    </employee>
</staff> 

<staff>
    <employee>
    ...
    </employee>
</staff>
[line End: Time 2015-08-04 11:34:00]

Additional info: I added code to ignore the <?xml version...?> lines. I also tried adding my own root element with:

$first = "<open>"
$last = "</open>"
$a = 0

. . .

if($a -eq 0)
    {
        $XmlFiles[$Index] += $first
        $a++
    } 

. . .

$XmlFiles[$Index] += $last

But this gives an `Array assignment failed because index '-1' was out of range.` error.
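The error is consistent with the `<open>` tag being appended before the first `[line ...]` header has moved `$Index` from -1 to 0 and created an element to append to. The sketch below is a minimal reproduction under that assumption (same `$XmlFiles`/`$Index` initial values as the script); it also shows the ordering that avoids the error: advance the index and add the empty string first, then append the tag. The exact error text differs between PowerShell versions, so the sketch only records whether the append failed.

```powershell
# Minimal reproduction, assuming the same initial values as the script
$XmlFiles = @()
$Index = -1

try {
    # $XmlFiles is still empty, so there is no element at index -1 to append to
    $XmlFiles[$Index] += '<open>'
    $failed = $false
}
catch {
    # Error wording varies by PowerShell version; just note that it failed
    $failed = $true
}

# Working order: move to the next index, create the element, then append
$Index++
$XmlFiles += ''
$XmlFiles[$Index] += '<open>'
```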


Additional info: the final result goes something like this:

$FilePath = "C:\Path\To\XmlDocs.txt"
$XmlFiles = @()
$Index = -1

$first = "<open>"
$last = "</open>"

# Go through the file and store the individual xml documents in a string array
$a=0
Get-Content $FilePath | `
%{
    if($_ -match "^\[line\ \d+")
        {
            if($a -eq 0)
                {
                    #if this is the top line, ignore it
                }
            else
                {
                    #if this is a boundary, add a closing < /open > tag
                    $XmlFiles[$Index] += $last
                }
            # We've got a boundary, move to next index in array
            $Index++
            # Add a new string to hold the next xml document
            $XmlFiles += ""
            # Add an < open > tag
            $XmlFiles[$Index] += $first
            $a++
        } 
    elseif ($_ -match '^\<\?xml') #ignore xml headers
        {
            # End of Section, or XML Header. Do Nothing and move on
        }
    elseif([string]::IsNullOrEmpty($_))
        {
            # Blank Line, Do Nothing and move on
        }
    else 
        {
            # Add each line to the string (xml doesn't care about line breaks)
            $XmlFiles[$Index] += $_
        }
}

# add the final < /open > tag
$XmlFiles[$Index] += $last

$Results = foreach($File in $XmlFiles)
{
    # Parse the string as an XML document (Trim() strips stray surrounding whitespace)
    $Xml = [xml]($File.Trim())
    # Use XPath to find the managers
    $Xml.SelectNodes("//employee[function = 'management']") |% {$_}
}

$Results

It ignores the [line ...] headings, the <?xml ...?> declarations, and any blank lines, and it wraps each section in <open>...</open> tags so that each one is valid XML.
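The same idea can also be expressed without the line-by-line loop. This is a hedged alternative sketch, not the method used above: `Get-Manager` is a hypothetical name, and it assumes the whole log fits in memory. It splits the text on any line starting with `[line `, which, unlike `^\[line\ \d+`, also matches the `[line End: ...]` lines, strips every `<?xml ...?>` declaration with a regex, and wraps each remaining chunk in a synthetic `<open>` root before parsing.

```powershell
# Hypothetical alternative: regex-split the whole log on the [line ...] headers.
# Assumes the file fits in memory; regexes are illustrative, not from the thread.
function Get-Manager {
    param([string] $Text)
    # (?m) lets ^ match at the start of every line, so both "[line 157: ...]"
    # and "[line End: ...]" headers act as boundaries
    $chunks = [regex]::Split($Text, '(?m)^\[line [^\]]*\][^\r\n]*')
    foreach ($chunk in $chunks) {
        # Remove every <?xml ...?> declaration, then surrounding whitespace
        $body = [regex]::Replace($chunk, '<\?xml[^>]*\?>', '').Trim()
        if ($body.Length -eq 0) { continue }   # e.g. text before the first header
        # A synthetic root allows several top-level elements per chunk
        $xml = [xml]("<open>$body</open>")
        $xml.SelectNodes("//employee[function = 'management']") | ForEach-Object { $_ }
    }
}
```

On PowerShell 2.0, where `Get-Content -Raw` is not available, the text could be read with `[System.IO.File]::ReadAllText('C:\Path\To\XmlDocs.txt')` and passed as `-Text`.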

IGGt
  • Is there only one `[line ...] [line End ...]` pair in each file? If so, just discard the `[]` lines and bump the index whenever you encounter a blank line – Mathias R. Jessen Aug 04 '15 at 15:14
  • Unfortunately not; there are many `[line]` pairs. This is basically a log file that gets written to every few minutes. – IGGt Aug 04 '15 at 15:18
  • Fix your broken input file format. Seriously. – Ansgar Wiechers Aug 04 '15 at 15:20
  • I wish I could, unfortunately I didn't build it. – IGGt Aug 04 '15 at 15:30
  • @IGGt have you actually tried updating the if statement in my example to do `if(($_ -match "^\[line\ \d+") -or [string]::IsNullOrWhiteSpace($_))`? – Mathias R. Jessen Aug 04 '15 at 15:58
  • I just tried, but it results in `Method invocation failed because [System.String] doesn't contain a method named 'IsNullOrWhiteSpace'.` I also tried ignoring the XML declarations and adding `<open>...</open>` tags at the start and end (see above). – IGGt Aug 05 '15 at 09:20
  • @IGGt ahh, sorry, my bad: `IsNullOrWhiteSpace()` was only added in .NET 4.0. Use `IsNullOrEmpty()` instead – Mathias R. Jessen Aug 05 '15 at 12:44
  • easily done. It didn't help though, as I am getting `Cannot convert value "" to type "System.Xml.XmlDocument". Error: "Root element is missing."` – IGGt Aug 05 '15 at 15:39
  • cheers @Mathias R. Jessen, I think I finally cracked it. – IGGt Aug 06 '15 at 13:20
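`IsNullOrEmpty()` does not catch lines that contain only spaces or tabs. A version-independent check is possible with a regex instead of `[string]::IsNullOrWhiteSpace`. The helper below is a hypothetical sketch (the name `Test-BlankLine` is an assumption); it could stand in for the `[string]::IsNullOrEmpty($_)` test in the final script, e.g. `elseif (Test-BlankLine $_)`.

```powershell
# Hypothetical helper: blank-line test that does not need
# [string]::IsNullOrWhiteSpace (missing before .NET 4.0).
function Test-BlankLine {
    param([string] $Line)
    # A $null argument arrives as "" because of the [string] parameter type;
    # ^\s*$ then matches empty and whitespace-only lines alike.
    return ($Line -match '^\s*$')
}
```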

1 Answer


I think you are overestimating the challenge of parsing the individual XML documents as actual XML. You could just read through the file line by line and use the "[line ...]" string as a boundary between individual documents:

$FilePath = "C:\Path\To\XmlDocs.txt"
$XmlFiles = @()
$Index = -1

# Go through the file and store the individual xml documents in a string array
Get-Content $FilePath |%{
    if($_ -match "^\[line\ \d+"){
        # We've got a boundary, move to next index in array
        $Index++
        # Add a new string to hold the next xml document
        $XmlFiles += ""
    } else {
        # Add each line to the string (xml doesn't care about line breaks)
        $XmlFiles[$Index] += $_
    }
}

$Managers = foreach($File in $XmlFiles){
    # Parse string as an Xml document
    $Xml = [xml]$File
    # Use Xpath to find the manager
    $Xml.SelectNodes("//employee[function = 'management']") |% {$_}
}

With a sample file like this (modified/extended version of your example):

[line 157: Time 2015-08-04 11:34:00] 
<staff>
    <employee>
        <Name>Bob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Sam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Mark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>
[line 158: Time 2015-08-06 12:36:30] 
<staff>
    <employee>
        <Name>Rob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Cam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Stark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>

The resulting $Managers would then be:

PS C:\> $Managers|Select Name,function,age

Name                               function                          age
----                               --------                          ---
Bob Smith                          management                        39
Mark Perkins                       management                        32
Rob Smith                          management                        39
Stark Perkins                      management                        32
Mathias R. Jessen
  • Thanks for that. I can see the principle of it, but I can't make it work; I fear my example above was a little oversimplified. I am getting `Unexpected XML declaration` errors because each section has its own XML declaration. Removing them from my test data gives me `There are multiple root elements` errors. – IGGt Aug 04 '15 at 15:02
  • @IGGt That shouldn't be a problem, but it could be caused by whitespace in front of the XML declaration. Try `$Xml = [xml]($File.Trim())` in the `foreach()` loop – Mathias R. Jessen Aug 04 '15 at 15:08
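Why `Trim()` can matter: an XML declaration must be the very first thing in a document, so even a few leading spaces make the `[xml]` cast fail. The sketch below uses a hypothetical chunk string to show the difference. Note that `Trim()` only helps with leading whitespace; it does not help when one chunk contains several declarations, which is why the final script in the question skips the `<?xml` lines instead.

```powershell
# Hypothetical chunk with stray whitespace in front of the XML declaration
$chunk = "   <?xml version=`"1.0`" encoding=`"utf-8`"?><staff><employee><function>management</function></employee></staff>"

try {
    # Expected to fail: no whitespace is allowed before the declaration
    $null = [xml]$chunk
    $rawParsed = $true
}
catch {
    $rawParsed = $false
}

# After Trim() the declaration starts at position 0, so the cast succeeds
$xml = [xml]($chunk.Trim())
$managers = @($xml.SelectNodes("//employee[function = 'management']"))
```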