I'm parsing this HTML file

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="utf-8">
</head>

<body>
    <figure>
        <img src="content/test.svg" alt="">
        <figcaption>Test caption.</figcaption>
    </figure>
</body>
</html>

with PowerShell 5. The approach below works well for all other relevant tags (div, p, table, td, tr, ...), but I can't figure out where the "Test caption." text is located in the object.

# $htmlContent holds the HTML shown above as a single string
$html = New-Object -Com "HTMLFile"
$html.IHTMLDocument2_write($htmlContent)
$allTags = $html.all
$allTags[8].tagName # is FIGURE
$allTags[9].tagName # is /FIGURE

But $allTags[8].outerHTML contains only <FIGCAPTION>. $allTags[9].outerHTML contains only </FIGCAPTION>. innerHTML is empty.

How can $html.documentElement.outerHTML still contain that figcaption text?

Also this w3schools example indicates that it should work like that. What am I missing? Thanks.

braggPeaks

1 Answer

It's a compatibility issue. <figcaption> requires IE9+. Even if you have the latest IE version installed, the IE COM object may still parse the HTML in compatibility mode, which is what happens here.

Insert the X-UA-Compatible meta tag to force the IE COM object to use the latest IE version:

$htmlContent = @'
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
</head>
<body>
    <figure>
        <img src="content/test.svg" alt="">
        <figcaption>Test caption.</figcaption>
    </figure>
</body>
</html>
'@

$html = New-Object -Com HTMLFile
$html.IHTMLDocument2_write($htmlContent)

$allTags = $html.all
$allTags[8].OuterHtml   # <figcaption>Test caption.</figcaption>
$allTags[8].InnerHtml   # Test caption.
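
Indexing into $html.all is fragile, because the position of an element shifts whenever the markup changes. As a minimal sketch, the same lookup can be done by tag name, with a check of the mode the parser ended up in. This assumes Windows PowerShell 5 with the IE engine available, and that documentMode and getElementsByTagName are exposed on the HTMLFile COM object on your system (on some machines the prefixed form IHTMLDocument3_getElementsByTagName is needed instead):

```powershell
# Sketch under the assumptions stated above; not verified on every Windows/IE combination.
$htmlContent = @'
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
</head>
<body>
    <figure>
        <img src="content/test.svg" alt="">
        <figcaption>Test caption.</figcaption>
    </figure>
</body>
</html>
'@

$html = New-Object -ComObject HTMLFile
$html.IHTMLDocument2_write($htmlContent)

# Document mode chosen by the parser; a value below 9 would mean
# <figcaption> is still being treated as an unknown tag.
$html.documentMode

# Look the element up by tag name instead of by its position in $html.all.
foreach ($caption in $html.getElementsByTagName('figcaption')) {
    $caption.innerText
}
```

If the document mode comes back lower than 9, the X-UA-Compatible meta tag was not honored and the figcaption lookup will not behave as described.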

More info: Towards Internet Explorer 11 Compatibility

zett42
  • Thanks a lot for the fast, precise and resolving answer. While I understand compatibility mode affects the parsing, I still don't understand where the output gets the figcaption text from. Anyways, kind regards from Austria! – braggPeaks Jun 29 '23 at 07:45