
How can I set the doctype of a DOMDocument60 object?

For example, I tried:

IXMLDOMDocument60 doc = new DOMDocument60();
doc.doctype.name = "html";

except that IXMLDOMDocumentType.name is read-only:

IXMLDOMDocumentType = interface(IXMLDOMNode)
{
   ['{2933BF8B-7B36-11D2-B20E-00C04F983E60}']
   string Get_name();
   ...
   property String name read Get_name;
}

and IXMLDOMDocument60.doctype is read-only:

IXMLDOMDocument = interface(IXMLDOMNode)
{
   ['{2933BF81-7B36-11D2-B20E-00C04F983E60}']
   IXMLDOMDocumentType Get_doctype();
   ...
   property IXMLDOMDocumentType doctype read Get_doctype;
}

So how can I set the doctype of an XML document?


Bonus Question: How can I create a DOMDocument60 object with a specified doctype?


Note: You see no mention of XSLT, because there is none. I'm building an HTML DOM tree in MSXML.

Ian Boyd

1 Answer


For performance and security reasons, Microsoft does not usually allow a <!DOCTYPE> (the document type declaration, which references or contains the Document Type Definition). Because of this, you must use the loadXML method to set the <!DOCTYPE>, so it cannot be set after a document has been created or imported.

On top of that, because of default security settings in MSXML6, you normally cannot import XML that has a <!DOCTYPE>. So, you must disable the ProhibitDTD setting on the object.

Edit: You should know that HTML5 is not XML. Also, the <!DOCTYPE> is considered optional for XHTML5.

First, let's start with the desired output.

<!DOCTYPE html>
<html />

Based on the syntax, I'm assuming you are using C# and have added a reference to msxml6.dll. The following code will allow you to create that document type declaration and root element.

MSXML2.DOMDocument60 doc  = new MSXML2.DOMDocument60();

// Disable validation when importing the XML
doc.validateOnParse = false;
// Enable the ability to import XML that contains <!DOCTYPE>
doc.setProperty("ProhibitDTD", false);
// Perform the import
doc.loadXML("<!DOCTYPE html><html />");
// Display the imported XML
Console.WriteLine(doc.xml);

Here's a copy of the code written in VBScript, as well.

Set doc = CreateObject("MSXML2.DOMDocument.6.0")

' Disable validation when importing the XML
doc.validateOnParse = False
' Enable the ability to import XML that contains <!DOCTYPE>
doc.setProperty "ProhibitDTD", false
' Perform the import
doc.loadXML "<!DOCTYPE html><html />"
' Display the imported XML
WScript.Echo doc.xml

Finally, here's a copy of the code written in C++.

#include <comutil.h>
#pragma comment(lib, "comsuppw.lib")
#include <msxml6.h>
#pragma comment(lib, "msxml6.lib")

int main(int argc, char* argv[])
{

    HRESULT hr = S_OK;
    VARIANT_BOOL success = VARIANT_TRUE;
    // IXMLDOMDocument2 is needed for setProperty
    IXMLDOMDocument2 *doc;

    // Initialize COM
    hr = CoInitialize(NULL);
    if (SUCCEEDED(hr))
    {
        // Create the object
        hr = CoCreateInstance(CLSID_DOMDocument60, NULL, CLSCTX_INPROC_SERVER, IID_IXMLDOMDocument2, (void**)&doc);
        if (SUCCEEDED(hr))
        {
            // Disable validation when importing the XML
            hr = doc->put_validateOnParse(VARIANT_FALSE);
            // Enable the ability to import XML that contains <!DOCTYPE>
            hr = doc->setProperty(_bstr_t(L"ProhibitDTD"), _variant_t(VARIANT_FALSE));
            // Perform the import
            hr = doc->loadXML(_bstr_t(L"<!DOCTYPE html><html />"), &success);
            // Retrieve the XML
            _bstr_t output{};
            hr = doc->get_xml(output.GetAddress());
            // Display the imported XML
            MessageBoxW(NULL, output, NULL, 0);
            // Release the document before COM is uninitialized
            doc->Release();
        }
        // Cleanup COM
        CoUninitialize();
    }
    return 0;
}
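
The examples above all load a literal string. For a document that already exists in memory, one possible workaround (an assumption on my part, not something tested against MSXML here) is to serialize it, splice a <!DOCTYPE> into the text, and reload the result with loadXML. The string step can be sketched in plain C++. `withDoctype` is a hypothetical helper name, not an MSXML function, and it assumes the input does not already contain a doctype.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of MSXML): returns a copy of `xml` with a
// <!DOCTYPE rootName> declaration added to the prolog.
// Assumption: `xml` does not already contain a document type declaration.
std::string withDoctype(const std::string& xml, const std::string& rootName)
{
    const std::string doctype = "<!DOCTYPE " + rootName + ">";
    std::size_t insertAt = 0;

    // An XML declaration, when present, must be the very first thing in the
    // document, so the doctype has to go after its closing "?>".
    if (xml.compare(0, 5, "<?xml") == 0)
    {
        const std::size_t end = xml.find("?>");
        if (end != std::string::npos)
            insertAt = end + 2;
    }

    std::string result = xml;
    result.insert(insertAt, doctype);
    return result;
}
```

For instance, `withDoctype("<html />", "html")` gives `<!DOCTYPE html><html />`, which could then be passed to loadXML with ProhibitDTD disabled, as in the code above.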
jveazey
  • My issue is now that i have a `DOMDocument` in memory, i need to save it to a stream using some particular encoding. `IXMLDOMDocument.Save` writes the document to a stream using the encoding specified in the `doctype`. Which means i need to be able to change the doctype to set the encoding. – Ian Boyd Mar 15 '21 at 14:33
  • As far as I am aware, encoding isn't declared in the doctype. https://www.w3.org/International/questions/qa-html-encoding-declarations – jveazey Mar 16 '21 at 19:11
  • It absolutely is declared in the doctype. https://www.w3.org/TR/xml/#sec-prolog-dtd. Which is why msxml uses the encoding in the doctype. https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms753769(v=vs.85) – Ian Boyd Mar 16 '21 at 19:53
  • Pretty sure we're talking about two different things. My answer refers to the "Document Type Definition" `<!DOCTYPE>` not the "XML Declaration" `<?xml ?>`. – jveazey Mar 17 '21 at 20:38
  • I was referring to the *"Document Type Declaration"* (or *doctype*, if you will). Of course: HTML is not XML; and i was referring to XML - not HTML. – Ian Boyd Mar 17 '21 at 22:07
  • Then, I'm definitely confused. Reading over https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms753769(v=vs.85)#remarks it states `Character encoding is based on the encoding attribute in the XML declaration, such as <?xml version="1.0" encoding="windows-1252"?>. When no encoding attribute is specified, the default setting is UTF-8.` So, I don't see where the _doctype_ comes into play for encoding. Even the XML specification you linked https://www.w3.org/TR/xml/#charencoding references the _XML declaration_, not the _doctype_ for character encoding. – jveazey Mar 19 '21 at 03:43
  • Exactly, the doctype: `<?xml ?>`. Problem is how to set that, the doctype, **after** i have a document in memory. – Ian Boyd Mar 19 '21 at 05:08
  • Just for clarification, that's the _XML declaration_, but to answer your question, that's a whole other can of worms. Can I get you to post a new question on Stack Overflow and link it here? My answer was specifically for `<!DOCTYPE>`. It's more complicated than a comment can answer. – jveazey Mar 19 '21 at 16:32
  • It's an XML *Document Type Declaration*, which we will shorten to *"doctype"*; since it doesn't affect anything in any way, and nobody is confused by it at all, and it's the term for it. – Ian Boyd Mar 19 '21 at 17:21
  • Regardless, my answer was for `<!DOCTYPE>`, not `<?xml ?>`. If you feel my answer is incorrect, feel free to unmark it as the answer. Dealing with `<?xml ?>` is far different than `<!DOCTYPE>`. I will try to add an answer for `<?xml ?>` when I can. – jveazey Mar 19 '21 at 17:49