
The main problem stems from the fact that HtmlAgilityPack does not give a <form> element any child nodes by default. See "How to get all input elements in a form with HtmlAgilityPack without getting a null reference error" for more information.

The problem is that the linked answer shows how to fix the issue in C#, but I need to fix it in PowerShell. Any ideas?


Here is a simplified version of my HTML:

<form method="POST" action="post.aspx" id="form">
    <div>
        <input type="hidden" name="test1" id="test1" value="1" />
    </div>
    <input type="text" name="test2" id="test2" value="12345" />
</form>

Now I can see that when I select the <form> element, it has no child nodes, which is why I couldn't select the <input> elements.

Add-Type -Path "C:\Program Files (x86)\HtmlAgilityPack\HtmlAgilityPack.dll"
$HTMLDocument = New-Object HtmlAgilityPack.HtmlDocument
$HTMLDocument.Load("C:\users\smithj\Desktop\test2.html")
$inputNodes=$HTMLDocument.DocumentNode.SelectNodes("//form")
$inputNodes

# Output shortened to show important bits ...
ChildNodes           : {}
HasChildNodes        : False

As you can see, HasChildNodes is False.

According to the C# answer I linked, I need to run HtmlNode.ElementsFlags.Remove("form");, but I can't figure out the equivalent PowerShell syntax.

Thanks again!


EDIT

Thanks to har07 for pointing me in the right direction. [HtmlAgilityPack.HtmlNode]::ElementsFlags.Remove("form") was what I needed to run.
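The C# call HtmlNode.ElementsFlags.Remove("form") reads a static member of the HtmlNode class, and PowerShell spells static member access as [Namespace.Type]::Member. The short sketch below shows how you might discover that member yourself by piping the type to Get-Member -Static. The DLL path is an assumption copied from the transcript below; point it at your own copy.

```powershell
# Assumption: adjust this path to wherever HtmlAgilityPack.dll lives on your machine.
Add-Type -Path ".\Net40\HtmlAgilityPack.dll"

# Piping a type (not an instance) to Get-Member -Static lists that type's static members.
# ElementsFlags should show up here, which is what the C# code calls as HtmlNode.ElementsFlags.
[HtmlAgilityPack.HtmlNode] | Get-Member -Static -Name ElementsFlags

# ElementsFlags is a Dictionary, so its instance methods (ContainsKey, Remove, ...) are
# called with a normal dot after the static access. This should report True until the
# "form" entry has been removed.
[HtmlAgilityPack.HtmlNode]::ElementsFlags.ContainsKey("form")
```

The same [Type]::Member pattern applies to any other static .NET member you find in a C# example.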

Note that this command must run before the HTML is loaded, because the flags are applied while the document is being parsed.

> Add-Type -Path ".\Net40\HtmlAgilityPack.dll"
> [HtmlAgilityPack.HtmlNode]::ElementsFlags.Remove("form")
True
>
> $HTMLDocument = New-Object HtmlAgilityPack.HtmlDocument
> $HTMLDocument.Load(".\file.html")
> $HTMLDocument.DocumentNode.SelectNodes("//form")

# Output shortened to show important bits ...
ChildNodes           : {#text, div, #text, input...}
HasChildNodes        : True
OuterHtml            : <form method="POST" action="post.aspx" id="form">
                           <div>
                               <input type="hidden" name="test1" id="test1" value="1">
                           </div>
                           <input type="text" name="test2" id="test2" value="12345">
                       </form>
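Putting it together, here is a minimal end-to-end sketch of the original goal, getting every <input> inside the <form>. It assumes the same DLL path as above, parses the simplified HTML from a here-string with LoadHtml instead of Load so it is self-contained, and uses the XPath //form//input; those choices are mine, not part of the original question.

```powershell
# Assumption: adjust this path to wherever HtmlAgilityPack.dll lives on your machine.
Add-Type -Path ".\Net40\HtmlAgilityPack.dll"

# Remove the "form" flag BEFORE parsing. Remove() returns True/False,
# so cast to [void] to keep that value out of the script's output.
[void][HtmlAgilityPack.HtmlNode]::ElementsFlags.Remove("form")

# The simplified HTML from the question, inlined so the example needs no file.
$html = @'
<form method="POST" action="post.aspx" id="form">
    <div>
        <input type="hidden" name="test1" id="test1" value="1" />
    </div>
    <input type="text" name="test2" id="test2" value="12345" />
</form>
'@

$HTMLDocument = New-Object HtmlAgilityPack.HtmlDocument
$HTMLDocument.LoadHtml($html)

# SelectNodes returns $null (not an empty collection) when nothing matches,
# so check before looping.
$inputNodes = $HTMLDocument.DocumentNode.SelectNodes("//form//input")
if ($inputNodes) {
    foreach ($node in $inputNodes) {
        # Print each input's name and value; "" is the default if the attribute is missing.
        "{0} = {1}" -f $node.GetAttributeValue("name", ""), $node.GetAttributeValue("value", "")
    }
}
```

Keep in mind that ElementsFlags is static, so removing "form" affects every document parsed afterwards in the same PowerShell session, not just this one.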

1 Answer

I'm not a PowerShell user, but according to this blog post, you may want to try something like this:

[HtmlAgilityPack.HtmlNode.ElementsFlags]::Remove("form")
  • Thanks, this is very helpful. Looks like `[HtmlAgilityPack.HtmlNode]` doesn't have an `ElementsFlags` member though, so calling that doesn't work. – romellem Jul 25 '14 at 16:24
  • Ah, I wasn't looking at the static members of `[HtmlAgilityPack.HtmlNode]`. Once I *did* look at the static members, I was able to see how to properly format the command. This was the eventual winner. Thanks again! `[HtmlAgilityPack.HtmlNode]::ElementsFlags.Remove("form")` – romellem Jul 25 '14 at 16:54