
I am writing a PowerShell script to work in Windows 10. I am using the 'HTML Agility Pack' library version 1.11.43.

In this library, the GetAttributeValue method of HTML element nodes has four overloads:

  1. public string GetAttributeValue(string name, string def)
  2. public int GetAttributeValue(string name, int def)
  3. public bool GetAttributeValue(string name, bool def)
  4. public T GetAttributeValue<T>(string name, T def)

I have written a test script for these methods in PowerShell:

$libPath = "HtmlAgilityPack.1.11.43\lib\netstandard2.0\HtmlAgilityPack.dll"
Add-Type -Path $libPath
$dom = New-Object -TypeName "HtmlAgilityPack.HtmlDocument"
$dom.Load("test.html", [System.Text.Encoding]::UTF8)

foreach ($node in $dom.DocumentNode.DescendantNodes()) {
    if ("#text" -ne $node.Name) {
        $node.OuterHTML
        "    " + $node.GetAttributeValue("class", "")
        "    " + $node.GetAttributeValue("class", 0)
        "    " + $node.GetAttributeValue("class", $true)
        "    " + $node.GetAttributeValue("class", $false)
        "    " + $node.GetAttributeValue("class", $null)
    }
}

File 'test.html':

<p class="true"></p>
<p class="false"></p>
<p></p>
<p class="any other text"></p>

Test script execution result:

<p class="true"></p>
    true
    0
    True
    True
    True
<p class="false"></p>
    false
    0
    False
    False
    False
<p></p>

    0
    True
    False
    False
<p class="any other text"></p>
    any other text
    0
    True
    False
    False

I know that the attribute value of an HTML element can also be retrieved with the expression $node.Attributes["class"]. I also understand polymorphism, method overloading, and generic methods, so there is no need to explain those.

I have three questions:

  1. When $node.GetAttributeValue("class", $null) is called from a PowerShell script, which of the four overloads of GetAttributeValue is invoked?

  2. I think the fourth option works (generic method). Then why does a call with the second parameter $null work exactly the same as a call with the second parameter $false?

  3. In the C# source code, the fourth option requires the following condition to work

#if !(METRO || NETSTANDARD1_3 || NETSTANDARD1_6)

I tried the library builds for NETSTANDARD1_6 and for NETSTANDARD2_0. The test script behaves the same way with both. But with NETSTANDARD1_6 the fourth option should be unavailable, right? In that case, which overload of GetAttributeValue is invoked when the second argument is $null?

Ilya Chalov

1 Answer


tl;dr

To achieve what you unsuccessfully attempted with $node.GetAttributeValue("class", $null), i.e., to return the attribute value as a [string] and default to $null if there is none, use:

$node.GetAttributeValue("class", [string] [NullString]::Value)

[string] $null works too, but makes "" (the empty string) rather than $null the default value.


While the overload resolution that you're seeing is surprising, you can resolve ambiguity during PowerShell's method overload resolution with casts:

$dom = [HtmlAgilityPack.HtmlDocument]::new()
$dom.LoadHtml(@'
<p class="true"></p>
<p class=42></p>
<p></p>
<p class="any other text"></p>
'@)

$nodes = $dom.DocumentNode.SelectNodes('p')

# Note the use of explicit casts (e.g., [string]) to guide overload resolution.
$nodes[0].GetAttributeValue('class', [bool] $false)
$nodes[1].GetAttributeValue('class', [int] 0)
$nodes[2].GetAttributeValue('class', [string] 'default')
$nodes[3].GetAttributeValue('class', [string] [NullString]::Value)

Output:

True
42
default
any other text

Alternatively, in PowerShell (Core) 7.3+[1], you can now call generic methods with explicit type arguments:

# PS 7.3+
# Note the generic type argument directly after the method name.
# Calls the one and only generic overload, with various types substituted for T:
#   public T GetAttributeValue<T>(string name, T def)
# Note how the 2nd argument doesn't need a cast anymore.
$nodes[0].GetAttributeValue[bool]('class', $false)
$nodes[1].GetAttributeValue[int]('class', 0)
$nodes[2].GetAttributeValue[string]('class', 'default')
$nodes[3].GetAttributeValue[string]('class', [NullString]::Value)

Note:

  • When you pass $null to a [string]-typed parameter (both in cmdlets and .NET methods), PowerShell quietly converts it to "" (the empty string). [NullString]::Value tells PowerShell to pass a true null instead, and is mostly needed when calling .NET methods whose behavior differs between null and "".

  • Therefore, if you were to call $nodes[3].GetAttributeValue('class', [string] $null) or, in PS 7.3+, $nodes[3].GetAttributeValue[string]('class', $null), you'd get "" (empty string) as the default value if attribute class doesn't exist.

  • By contrast, [NullString]::Value, as used in the commands above, causes a true $null value to be returned if the attribute doesn't exist; you can test for that with $null -eq ....
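
The difference between $null and [NullString]::Value can be observed without HtmlAgilityPack. The following is a minimal sketch; NullProbe is a hypothetical helper type compiled on the fly purely for illustration, and it only reports what a [string]-typed .NET parameter actually received:

```powershell
# NullProbe is a hypothetical helper, not part of any library.
Add-Type -TypeDefinition @'
public static class NullProbe
{
    // Reports whether the method received a true null reference.
    public static string Describe(string s)
    {
        return s == null ? "null" : "string of length " + s.Length;
    }
}
'@

# $null is quietly converted to "" before the call.
$fromNull = [NullProbe]::Describe($null)

# [NullString]::Value makes PowerShell pass a true null.
$fromNullString = [NullProbe]::Describe([NullString]::Value)

$fromNull        # -> string of length 0
$fromNullString  # -> null
```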


As for your questions:

On a general note, PowerShell's overload resolution is complex, and for the ultimate source of truth you'll have to consult the source code. The following is based on the de-facto behavior as of PowerShell 7.2.6 and musings about logic that could be applied.

When calling $node.GetAttributeValue("class", $null) from a PowerShell script, which of the four variants of the GetAttributeValue method works?

In practice, the public bool GetAttributeValue(string name, bool def) overload is chosen. Why that one specifically is chosen among the available overloads is ultimately immaterial: the fundamental problem is that, to PowerShell, $null carries no information about the type it stands in for, so PowerShell cannot be expected to select a specific overload (for that you need a cast, as shown at the top):

  • In C# passing null to the second parameter in a non-generic call unambiguously implies the overload with the string-typed def parameter, because among the non-generic overloads, string as the type of the def parameter is the only .NET reference type, and therefore the only type that can directly accept a null argument.

  • This is not true in PowerShell, which has much more flexible, implicit type-conversion rules: from PowerShell's perspective, $null can bind to any of the types among the def parameters, because it allows $null to be converted to those types; specifically, [bool] $null yields $false, [int] $null yields 0, and - perhaps surprisingly, as discussed above - [string] $null yields "" (the empty string).

    • Thus, PowerShell is justified in selecting any one of the non-generic overloads in this case, and which one it chooses should be considered an implementation detail.

However, curiously, even passing [NullString]::Value (without a cast) doesn't make a difference, even though PowerShell should know that this special value represents a $null value for a string parameter - see GitHub issue #18072.
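
The implicit conversions of $null mentioned above can be checked directly at the prompt. This is only a small sketch; the variable names are chosen for illustration:

```powershell
$asBool   = [bool] $null     # converts to $false
$asInt    = [int] $null      # converts to 0
$asString = [string] $null   # converts to "" (empty string), not $null

$asBool               # -> False
$asInt                # -> 0
$asString.Length      # -> 0
$null -eq $asString   # -> False

# Not every type accepts $null: the [datetime] cast fails with an error.
$dateCastFailed = $false
try { $null = [datetime] $null } catch { $dateCastFailed = $true }
$dateCastFailed       # -> True
```

Because the first three casts all succeed, $null alone fits every non-generic def parameter equally well, which is why a cast is needed to pick one.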


I think the fourth option works (generic method). Then why does a call with the second parameter $null work exactly the same as a call with the second parameter $false?

With the generic invocation syntax available in v7.3+, the generic overload definitely works - and a $null as the default-value argument is converted to the type specified as the type argument (assuming PowerShell allows such a conversion; it wouldn't work with [datetime], for instance, because [datetime] $null causes an error).

Even with the non-generic syntax, PowerShell does select the generic overload by inference, as the following example shows, but only when you pass an actual object rather than $null:

# Try to retrieve a non-existent attribute and provide a [double]
# default value. Note that [double] $null evaluates to 0, i.e. an
# actual [double] instance rather than $null.
# The fact that a [double] instance is returned implies that the
# generic overload was chosen.
#  -> 'System.Double'
$nodes[0].GetAttributeValue('nosuch', [double] $null).GetType().FullName
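
To see which overload a given argument selects on your own PowerShell version, you can probe a type that has the same four overload shapes. The sketch below assumes nothing about HtmlAgilityPack internals: OverloadProbe is a hypothetical stand-in that merely records which of its overloads ran. No expected output is shown, because the choice made for a bare $null is an implementation detail:

```powershell
# OverloadProbe is a hypothetical stand-in, not part of HtmlAgilityPack.
Add-Type -TypeDefinition @'
public static class OverloadProbe
{
    // Name of the overload that ran most recently.
    public static string Last = "";

    public static string Get(string name, string def) { Last = "string"; return def; }
    public static int    Get(string name, int def)    { Last = "int";    return def; }
    public static bool   Get(string name, bool def)   { Last = "bool";   return def; }
    public static T      Get<T>(string name, T def)
    {
        Last = "generic<" + typeof(T).Name + ">";
        return def;
    }
}
'@

# Call the method directly each time, so that casts at the call site
# can guide overload resolution; then read back which overload ran.
$null = [OverloadProbe]::Get('class', $null)
$forNull = [OverloadProbe]::Last

$null = [OverloadProbe]::Get('class', [string] $null)
$forString = [OverloadProbe]::Last

$null = [OverloadProbe]::Get('class', [double] 1.5)
$forDouble = [OverloadProbe]::Last

"`$null          -> $forNull"
"[string] `$null -> $forString"
"[double] 1.5   -> $forDouble"
```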

In the C# source code, the fourth option requires the following condition to work [...]

When you pass $null, the generic overload is not considered - and cannot be, in the absence of type information - so whether or not it is compiled into the library build doesn't make a difference.


[1] As of this writing, v7.3 hasn't been released yet, but preview versions are available - see the repo.

mklement0
  • I marked your answer as an answer to my question because it is very useful and really answers my questions. But what is happening is not completely clear to me. For `$null`, the generic method doesn't work at all? I realized that a variant of the method with the `bool` type is being selected. But you don't know the reason for this? – Ilya Chalov Sep 12 '22 at 07:47
  • I'm glad to hear the answer is useful, @IlyaChalov. Please see my update, which explains why `$null` doesn't work in more detail. In short: in PowerShell - unlike in C# - it is _ambiguous_, due to the implicit type conversions PowerShell generally performs. Using an explicit cast (`[string] $null`) is the solution, though, as noted, for separate reasons that actually makes `""`, not `$null`, the default value. Thus, `[string] [NullString]::Value` is the equivalent of passing `null` in C#. – mklement0 Sep 12 '22 at 13:22