We are migrating an ASP.NET intranet to SharePoint and automating the conversion with PowerShell.
We only want to scrape the links inside the DIV tag with the class name 'topnav', not all the links on the page.
$url = "http://intranet.company.com"
$page = Invoke-WebRequest -Uri $url
$div_topnav = $page.ParsedHtml.getElementsByTagName('div') | ? {$_.className -match 'topnav'}
This gets us the HTML of the topnav, but how can we best extract just the application links under the Applications node? We do not want the links under the Home or Documents nodes.
<div class="topnav" >
<ul class="lev1 clearfix" >
<li class="lev1 pos1 first lev1_first">
<a href="index.html">Home</a>
</li>
<li class="lev1 pos2 haschildren lev1_haschildren">
<a href="index.html">Applications</a>
<ul>
<li class="lev2 pos1 first lev2_first">
<a href="http://someurl.com">App 1</a>
</li>
<li class="lev2 pos2 haschildren lev2_haschildren">
<a href="index.html">Training</a>
<ul class="lev3">
<li class="lev3 pos1 lev3_pos1 first lev3_first">
<a href="http://someurl.com">App 3</a>
</li>
<li class="lev3 pos2 lev3_pos2 last lev3_last">
<a href="http://someurl.com">App 4</a>
</li>
</ul>
</li>
</ul>
</li>
<li class="lev1 pos3 haschildren lev1_haschildren">
<a href="index.html">Documents</a>
<ul>
<li class="lev2 pos1 first lev2_first">
<a href="http://someurl.com">Doc 1</a>
</li>
<li class="lev2 pos2 haschildren lev2_haschildren">
<a href="index.html">Training</a>
<ul class="lev3">
<li class="lev3 pos1 lev3_pos1 first lev3_first">
<a href="http://someurl.com">Doc 3</a>
</li>
<li class="lev3 pos2 lev3_pos2 last lev3_last">
<a href="http://someurl.com">Doc 4</a>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
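For illustration, here is a minimal sketch of one way the Applications links might be pulled out of that markup. It is not a definitive solution. It assumes Windows PowerShell 5.1, since `ParsedHtml` is not available in PowerShell 6 and later. It also assumes that the first anchor inside each first-level `li` is the menu heading, and that real application links are absolute http/https URLs while menu headings such as "Training" point at index.html. The variable names `$appsNode` and `$appLinks` are hypothetical.

```powershell
# Windows PowerShell 5.1 only: ParsedHtml relies on the Internet Explorer engine
$url  = "http://intranet.company.com"
$page = Invoke-WebRequest -Uri $url

# The topnav DIV, as in the snippet above (first match only)
$div_topnav = $page.ParsedHtml.getElementsByTagName('div') |
    Where-Object { $_.className -match 'topnav' } |
    Select-Object -First 1

# The first-level LI whose own (first) anchor reads "Applications".
# '\blev1\b' matches the standalone class token "lev1" but not "lev1_first".
$appsNode = $div_topnav.getElementsByTagName('li') |
    Where-Object {
        $_.className -match '\blev1\b' -and
        @($_.getElementsByTagName('a'))[0].innerText -match '^\s*Applications\s*$'
    } |
    Select-Object -First 1

# Every absolute link below that node, at any depth.
# Assumption: headings link to index.html, so they fail the http/https filter.
$appLinks = @($appsNode.getElementsByTagName('a')) |
    Where-Object { $_.href -match '^https?://' } |
    ForEach-Object {
        [pscustomobject]@{ Title = $_.innerText.Trim(); Url = $_.href }
    }

$appLinks | Format-Table -AutoSize
```

Because the filter starts from the Applications `li` rather than from the whole DIV, the Home and Documents branches are never visited. If some application links turn out to be relative, the `href` filter would need to be replaced, for example by skipping only the anchors of `li` elements whose class contains `haschildren`.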