1

Trying to fetch the table of content using h3 tag.

link- https://learn.microsoft.com/en-us/troubleshoot/sql/releases/download-and-install-latest-updates

Trying to fetch the whole table of 'SQL Server 2019'. and convrting into json object in powershell.

| Build number or version | Service pack | Update | ..
|        15.0.4298.1      |     None     |  CU19  | ..
|        15.0.4280.7      |     None     |  CU18 + GDR | ..
...

...........................Tried this

` Send a request to the URL and get the HTML content using basic parsing

$response = Invoke-WebRequest -Uri "https://learn.microsoft.com/en-us/troubleshoot/sql/releases/download-and-install-latest-updates" -UseBasicParsing
$response = $response.content

"Find the table which has an h3 tag containing "sql-server-2017""

$table = ($response.ParsedHtml).getElementsByTagName("h3") | Where-Object {$_.innerText -eq "SQL Server 2017"} | Select-Object -ExpandProperty parentNode | Select-Object -ExpandProperty nextSibling

Print the table content

Write-Output $table.outerHTML

but not working

scode123
  • 49
  • 4

1 Answers1

0

The -UseBasicParsing parameter means you won't have any ParsedHtml data. This is the default on powershell 6+. You can confirm by just inspecting the property

$response = Invoke-WebRequest -Uri "https://learn.microsoft.com/en-us/troubleshoot/sql/releases/download-and-install-latest-updates" -UseBasicParsing
$response

enter image description here

You can use a HTML parser with the html data you retrieve. There are even modules made just for this.

You can also use regular expressions for a quick and dirty solution. This is not recommended as it's cumbersome and fragile as the slightest change in the website can completely break your solution. Here is an example of a quick, and very dirty regex extraction.

$tablepattern = '(?s)(?<=/table>\r?\n)<h3.+?sql-server-2019.+?(?=<h3)'
$datapattern = '(?s)<tr.+?td>(?<Version>.+?)<.+?td>(?<ServicePack>[^<]+?)</.+?>(?<Update>[^<]+?)</.+?href="(?<KBArticle>.+?)".+?>(?<KBNumber>[^<]+?)<.+?<td>(?<ReleaseDate>.+?)</'

$response = Invoke-WebRequest -Uri "https://learn.microsoft.com/en-us/troubleshoot/sql/releases/download-and-install-latest-updates" -UseBasicParsing
$null = $response.RawContent -match $tablepattern

[string]$table = $matches.Values

$data = $table -split '(?=<tr>)' | ForEach-Object{
    if($_ -match $datapattern){
        $matches.Remove(0)
        [PSCustomObject]$matches
    }
}

$data | Format-Table

KBNumber Version      ReleaseDate        ServicePack KBArticle                                  Update    
-------- -------      -----------        ----------- ---------                                  ------    
5023049  15.0.4298.1  February 16, 2023  None        https://support.microsoft.com/help/5023049 CU19      
5021124  15.0.4280.7  February 14, 2023  None        https://support.microsoft.com/help/5021124 CU18 + GDR
5021125  15.0.2101.7  February 14, 2023  None        https://support.microsoft.com/help/5021125 GDR       
5017593  15.0.4261.1  September 28, 2022 None        https://support.microsoft.com/help/5017593 CU18      
5016394  15.0.4249.2  August 11, 2022    None        https://support.microsoft.com/help/5016394 CU17      
5014353  15.0.4236.7  June 14, 2022      None        https://support.microsoft.com/help/5014353 CU16 + GDR
...

All that's left is to convert it to json

$data | ConvertTo-Json
Doug Maurer
  • 8,090
  • 3
  • 12
  • 13