Is there a way to use Select-String to find all lines between X and Y?
For example, if I have a file with this content:
[line 157: Time 2015-08-04 11:34:00]
<staff>
<employee>
<Name>Bob Smith</Name>
<function>management</function>
<age>39</age>
<birthday>3rd June</birthday>
<car>yes</car>
</employee>
<employee>
<Name>Sam Jones</Name>
<function>security</function>
<age>24</age>
</employee>
<employee>
<Name>Mark Perkins</Name>
<function>management</function>
<age>32</age>
</employee>
</staff>
and I want to find all the content for <function>management</function>, so I would end up with:
<employee>
<Name>Bob Smith</Name>
<function>management</function>
<age>39</age>
<birthday>3rd June</birthday>
<car>yes</car>
</employee>
<employee>
<Name>Mark Perkins</Name>
<function>management</function>
<age>32</age>
</employee>
If all groupings were the same size I could use something like:
Select-String -Pattern '<function>management</function>' -CaseSensitive -Context 2,2
However, in reality they are not going to be the same size, so I can't use a fixed number each time.
Really I need a way of saying: return everything from 2 rows above my search term until the following '</employee>' tag, for all matching instances.
Is this possible?
I can't use the standard XML tools in PowerShell, as the file I am reading isn't standard XML, which is why I included [line 157: Time 2015-08-04 11:34:00] as an example. The best way to think of it is lots of XML files, all merged into one file, with the [line . . .] headers to break them up.
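Since the XML tools are out, one line-based direction would be to treat each <employee> . . . </employee> run as a block, buffer its lines, and only emit the buffer if one of the lines matched the search term. This is only a rough sketch: the function name Get-MatchingBlock and its parameters are made up for illustration, and it assumes the opening and closing tags each sit on their own line, as in the sample above.

```powershell
# Hypothetical helper: emits every block of lines that runs from a line matching
# $StartPattern to the next line matching $EndPattern and that contains $Pattern.
function Get-MatchingBlock {
    param(
        [string[]]$Lines,
        [string]$Pattern,
        [string]$StartPattern = '^\s*<employee>\s*$',
        [string]$EndPattern   = '^\s*</employee>\s*$'
    )

    $inBlock = $false   # are we currently between a start and an end line?
    $matched = $false   # has the current block matched $Pattern?
    $buffer  = @()      # lines of the current block

    foreach ($line in $Lines) {
        if ($line -match $StartPattern) {
            # Start a fresh block
            $inBlock = $true
            $matched = $false
            $buffer  = @()
        }
        if ($inBlock) {
            $buffer += $line
            # Case-sensitive match, like Select-String -CaseSensitive
            if ($line -cmatch $Pattern) { $matched = $true }
            if ($line -match $EndPattern) {
                # Block is complete: emit its lines only if it matched
                if ($matched) { $buffer }
                $inBlock = $false
            }
        }
    }
}

# Sample data copied from the example file above
$sample = @(
    '[line 157: Time 2015-08-04 11:34:00]'
    '<staff>'
    '<employee>'
    '<Name>Bob Smith</Name>'
    '<function>management</function>'
    '<age>39</age>'
    '<birthday>3rd June</birthday>'
    '<car>yes</car>'
    '</employee>'
    '<employee>'
    '<Name>Sam Jones</Name>'
    '<function>security</function>'
    '<age>24</age>'
    '</employee>'
    '<employee>'
    '<Name>Mark Perkins</Name>'
    '<function>management</function>'
    '<age>32</age>'
    '</employee>'
    '</staff>'
)

$managers = Get-MatchingBlock -Lines $sample -Pattern '<function>management</function>'
$managers
```

For a real file, the lines could come from Get-Content instead of the inline array. Because the whole block is buffered, the size of each <employee> group does not matter.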
Additional Info: I fear my example was a little oversimplified. The actual file is more like:
[line 157: Time 2015-08-04 11:34:00]
<?xml version="1.0" encoding="utf-8"?>
<other>
<stuff>
. . .
</stuff>
</other>
<?xml version="1.0" encoding="utf-8"?>
<staff>
<employee>
...
</employee>
</staff>
<staff>
<employee>
...
</employee>
</staff>
[line End: Time 2015-08-04 11:34:00]
Additional Info: I added code to ignore the <?xml version . . . lines.
I also tried adding my own root element with:
$first = "<open>"
$last = "</open>"
$a = 0
. . .
if($a -eq 0)
{
$XmlFiles[$Index] += $first
$a++
}
. . .
$XmlFiles[$Index] += $last
But this gives an "Array assignment failed because index '-1' was out of range." error.
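One way this error can arise, and this is an assumption because the elided parts of the snippet hide the order of operations, is that $XmlFiles is still an empty array and $Index is still -1 when the first += runs, so there is no element for the index to refer to. A minimal sketch of that situation, and of adding an element before appending to it (the $failed flag is only there for illustration):

```powershell
$XmlFiles = @()
$Index = -1
$first = "<open>"

$failed = $false
try {
    # The array is empty, so there is nothing at index -1 to append to
    $XmlFiles[$Index] += $first
}
catch {
    $failed = $true
}

# Moving the index on and adding an empty string first gives += a target
$Index++
$XmlFiles += ""
$XmlFiles[$Index] += $first
```

In other words, the opening tag has to be appended after the new empty string is added for that section, not before.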
Additional Info: The final result goes something like this:
$FilePath = "C:\Path\To\XmlDocs.txt"
$XmlFiles = @()
$Index = -1
$first = "<open>"
$last = "</open>"
# Go through the file and store the individual xml documents in a string array
$a = 0
Get-Content $FilePath |
    ForEach-Object {
        if ($_ -match "^\[line\ \d+")
        {
            if ($a -ne 0)
            {
                # Not the top line, so this is a boundary: close the previous section with </open>
                $XmlFiles[$Index] += $last
            }
            # We've got a boundary, move to next index in array
            $Index++
            # Add a new string to hold the next xml document
            $XmlFiles += ""
            # Add an <open> tag
            $XmlFiles[$Index] += $first
            $a++
        }
        elseif ($_ -match '^\<\?xml' -or $_ -match '^\[line\ End')
        {
            # End of Section, or XML Header. Do Nothing and move on
        }
        elseif ([string]::IsNullOrEmpty($_))
        {
            # Blank Line, Do Nothing and move on
        }
        else
        {
            # Add each line to the string (xml doesn't care about line breaks)
            $XmlFiles[$Index] += $_
        }
    }
# Add the final </open> tag
$XmlFiles[$Index] += $last
$Results = foreach ($File in $XmlFiles)
{
    # Parse string as an Xml document
    $Xml = [xml]($File.Trim())
    # Use XPath to find the managers
    $Xml.SelectNodes("//employee[function = 'management']") | ForEach-Object { $_ }
}
$Results
It basically ignores the [line . . . headings, the <?xml . . . declarations, and any blank lines, and it wraps each section in <open> . . . </open> tags to make it valid.
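The matches come back as XML nodes rather than text. To get back to something like the output I originally wanted, the nodes' OuterXml property seems to do the job. A small sketch, using a made-up single section built inline instead of read from the file (the variable names $Managers, $Names and $Markup are just for illustration):

```powershell
# One wrapped section, similar to what the loop above builds (sample data from the question)
$File = '<open><staff>' +
        '<employee><Name>Bob Smith</Name><function>management</function><age>39</age></employee>' +
        '<employee><Name>Sam Jones</Name><function>security</function><age>24</age></employee>' +
        '<employee><Name>Mark Perkins</Name><function>management</function><age>32</age></employee>' +
        '</staff></open>'

# Parse the string and select the employees whose function is management
$Xml = [xml]$File
$Managers = $Xml.SelectNodes("//employee[function = 'management']")

# Text of the <Name> child of each matching employee
$Names = $Managers | ForEach-Object { $_.SelectSingleNode('Name').InnerText }

# Each matching <employee> element as markup
$Markup = $Managers | ForEach-Object { $_.OuterXml }

$Names
$Markup
```

Note that the markup comes back on one line per employee, because the original line breaks were dropped when the lines were joined into one string.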