
I am trying to extract information from HTML text that I retrieved from a web page using PowerShell. Here is a sample of the text:

<tr class="mergedrow"> <th scope="row" style="text-align:left;"><a href="/wiki/Provinces_of_Finland" title="Provinces of Finland">Province</a> </th> <td><a href="/wiki/Western_Finland" title="Western Finland" class="mw-redirect">Western Finland</a></td> </tr> <tr class="mergedrow"> <th scope="row" style="text-align:left;"><a href="/wiki/Regions_of_Finland" title="Regions of Finland">Region</a></th> <td><a href="/wiki/Finland_Proper" title="Finland Proper" class="mw-redirect">Finland Proper</a></td> </tr>

Within this text, I can match the line containing the Region information with the regex below:

PS C:\Users\n12017> $pattern='<th scope="row" style="text-align:left;">.*(Region).*</th>'
PS C:\Users\n12017> $try -imatch $pattern

However, I want to retrieve the lines before and after the matching line. I read about the -Context parameter of Select-String, but I failed to apply it. When I try the command below, it returns the whole text.

PS C:\Users\n12017> $try | select-string -Context 0,3 $pattern

To sum up, I want to find the lines before and after the matching line within the $try object, which contains all the HTML text.

Thanks in advance...

mlee_jordan
  • First, run `$try | select-string $pattern | fl *`. This will tell you what the query found. From the text you pasted, it appears there are no newlines. So getting the full values from select-string will be helpful. – Eris Jan 27 '14 at 18:15
  • Select-String is designed for finding text strings in files. For extracting data from single strings, try using the -match operator with a capturing regex, and extract the captures from $matches. You can also use the -replace operator to trim out all the data except for the capture. – mjolinor Jan 27 '14 at 18:22
  • @Eris and mjolinor, thanks for your informative replies. What I actually want is to first find the unique line matching 'Region' and then extract all text up to the end of the related table row. In essence I am trying to get the text: Province Western Finland – mlee_jordan Jan 28 '14 at 10:49
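
The two suggestions in the comments can be combined into a rough sketch. It assumes $try holds the sample HTML from the question as one string with no newlines, which would explain why -Context returned everything: there was only one "line" to match. The variable names $rows, $hit and $region, and the second regex, are made up for illustration and are not from the original post.

```powershell
# Sample HTML from the question, held in $try as a single string without newlines.
$try = '<tr class="mergedrow"> <th scope="row" style="text-align:left;"><a href="/wiki/Provinces_of_Finland" title="Provinces of Finland">Province</a> </th> <td><a href="/wiki/Western_Finland" title="Western Finland" class="mw-redirect">Western Finland</a></td> </tr> <tr class="mergedrow"> <th scope="row" style="text-align:left;"><a href="/wiki/Regions_of_Finland" title="Regions of Finland">Region</a></th> <td><a href="/wiki/Finland_Proper" title="Finland Proper" class="mw-redirect">Finland Proper</a></td> </tr>'

# Pattern from the question.
$pattern = '<th scope="row" style="text-align:left;">.*(Region).*</th>'

# Split after every </tr> so each table row becomes its own string;
# Where-Object drops the empty string left after the final </tr>.
$rows = $try -split '(?<=</tr>)\s*' | Where-Object { $_ }

# With one row per pipeline object, -Context can count rows before and after the match.
$hit = $rows | Select-String -Pattern $pattern -Context 1,0
$hit.Context.PreContext   # the row preceding the match (the Province row)
$hit.Line                 # the matching Region row

# Capturing regex (hypothetical pattern): pull the cell text out of the Region row.
if ($hit.Line -match '>Region</a>\s*</th>\s*<td><a[^>]*>([^<]+)</a>') {
    $region = $Matches[1]
}
$region
```

Changing -Context 1,0 to 1,1 would also return the row after the match, if there is one. For anything beyond a quick extraction like this, an HTML parser is likely to be more robust than regexes.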
