We are migrating an ASP.NET intranet to SharePoint and automating the conversion with PowerShell.

We only want to scrape the links inside the DIV tag with the class name 'topnav', not all the links on the page.

$url = "http://intranet.company.com"
$page = Invoke-WebRequest -Uri $url
$div_topnav = $page.ParsedHtml.getElementsByTagName('div') | ? {$_.className -match 'topnav'}

This gets us the HTML of the topnav, but what is the best way to extract just the application links under the Applications node? We do not want the links under the Home or Documents nodes.

<div class="topnav" >
<ul class="lev1 clearfix" >
    <li class="lev1 pos1 first lev1_first">
        <a href="index.html">Home</a>
    </li>
    <li class="lev1 pos2 haschildren lev1_haschildren">
        <a href="index.html">Applications</a>
        <ul>
            <li class="lev2 pos1 first lev2_first">
                <a href="http://someurl.com">App 1</a>
            </li>
            <li class="lev2 pos2 haschildren lev2_haschildren">
                <a href="index.html">Training</a>
                <ul class="lev3">
                    <li class="lev3 pos1 lev3_pos1 first lev3_first">
                        <a href="http://someurl.com">App 3</a>
                    </li>
                    <li class="lev3 pos2 lev3_pos2 last lev3_last">
                        <a href="http://someurl.com">App 4</a>
                    </li>
                </ul>
            </li>
        </ul>
    <li class="lev1 pos3 haschildren lev1_haschildren">
        <a href="index.html">Documents</a>
        <ul>
            <li class="lev2 pos1 first lev2_first">
                <a href="http://someurl.com">Doc 1</a>
            </li>
            <li class="lev2 pos2 haschildren lev2_haschildren">
                <a href="index.html">Training</a>
                <ul class="lev3">
                    <li class="lev3 pos1 lev3_pos1 first lev3_first">
                        <a href="http://someurl.com">Doc 3</a>
                    </li>
                    <li class="lev3 pos2 lev3_pos2 last lev3_last">
                        <a href="http://someurl.com">Doc 4</a>
                    </li>
                </ul>
            </li>
        </ul>
    </li>
</ul>
</div>
user2019423

1 Answer

I think this is what you want:

[xml]$div_topnav = @"
<div class="topnav" >
    <ul class="lev1 clearfix" >
    <li class="lev1 pos1 first lev1_first">
        <a href="index.html">Home</a>
    </li>
    <li class="lev1 pos2 haschildren lev1_haschildren">
        <a href="index.html">Applications</a>
        <ul>
            <li class="lev2 pos1 first lev2_first">
                <a href="http://someurl.com">App 1</a>
            </li>
            <li class="lev2 pos2 haschildren lev2_haschildren">
                <a href="index.html">Training</a>
                <ul class="lev3">
                    <li class="lev3 pos1 lev3_pos1 first lev3_first">
                        <a href="http://someurl.com">App 3</a>
                    </li>
                    <li class="lev3 pos2 lev3_pos2 last lev3_last">
                        <a href="http://someurl.com">App 4</a>
                    </li>
                </ul>
            </li>
        </ul>
    </li>
        <li class="lev1 pos3 haschildren lev1_haschildren">
            <a href="index.html">Documents</a>
            <ul>
                <li class="lev2 pos1 first lev2_first">
                    <a href="http://someurl.com">Doc 1</a>
                </li>
                <li class="lev2 pos2 haschildren lev2_haschildren">
                    <a href="index.html">Training</a>
                    <ul class="lev3">
                        <li class="lev3 pos1 lev3_pos1 first lev3_first">
                            <a href="http://someurl.com">Doc 3</a>
                        </li>
                        <li class="lev3 pos2 lev3_pos2 last lev3_last">
                            <a href="http://someurl.com">Doc 4</a>
                        </li>
                    </ul>
                </li>
            </ul>
        </li>
    </ul>
</div>
"@
($div_topnav.GetElementsByTagName("a") | ? "#Text" -Like "App *").href

The output is the href of every link whose text starts with "App ".

PowerShell could not parse the $div_topnav content you posted as XML, because the li tag opened on line 6 of your markup has no closing li tag (I fixed that in my code snippet).
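
If the real link texts do not all start with "App ", selecting by position in the menu may be more robust than matching on the text. The following is only a sketch, assuming the markup has already been loaded as well-formed XML as in the snippet above. The trimmed-down menu, the /app1, /app3 and /doc1 URLs, and the variable names $nav, $xpath and $appLinks are made up for illustration.

```powershell
# Hypothetical, shortened copy of the menu; the URLs are placeholders.
[xml]$nav = @'
<div class="topnav">
  <ul>
    <li><a href="index.html">Home</a></li>
    <li><a href="index.html">Applications</a>
      <ul>
        <li><a href="http://someurl.com/app1">App 1</a></li>
        <li><a href="index.html">Training</a>
          <ul>
            <li><a href="http://someurl.com/app3">App 3</a></li>
          </ul>
        </li>
      </ul>
    </li>
    <li><a href="index.html">Documents</a>
      <ul>
        <li><a href="http://someurl.com/doc1">Doc 1</a></li>
      </ul>
    </li>
  </ul>
</div>
'@

# Pick the <li> whose own <a> child reads "Applications",
# then take every <a> nested anywhere inside its child <ul>.
$xpath = "//li[a[normalize-space(.)='Applications']]/ul//a"

# Read each href and keep only absolute links, which drops
# sub-menu headers such as Training that point at index.html.
$appLinks = $nav.SelectNodes($xpath) |
    ForEach-Object { $_.GetAttribute('href') } |
    Where-Object { $_ -like 'http*' }

$appLinks
```

With this sample, $appLinks holds the App 1 and App 3 URLs and nothing from Home or Documents. The 'http*' filter is an assumption about how sub-menu headers differ from real application links, so adjust it to whatever distinguishes them on your intranet.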

PatM0