I am importing XML into InDesign, and I get this message:

The external entity 'blahblah.dtd' cannot be found. Continue to import anyway?

And when I then continue to import the XML, I get this error message:

Javascript Error!

Error Number: 103237 Error String: DOM transformation error: Invalid namespace.

Engine: session File: C:\blahblah\blahblah.jsx Line: 259 Source:
obj.doc.importXML(File(xmlDoc) );

The problem is that I won't have access to the DTD, and I don't need it for my purposes anyway.


  • So, is there an ExtendScript way to ignore the DTD?
  • If not, is there a way to ignore the DTD with XSLT?



Here is the relevant code:
function importXML(xmlDoc, xslt)
{
    with(obj.doc.xmlImportPreferences)
    {
        importStyle = XMLImportStyles.MERGE_IMPORT; // merges XML elements into the InDesign document, merging with whatever matching content
        createLinkToXML = true; // link elements to the XML source, instead of embedding the XML

        // defining the XSL transformation settings here
        allowTransform = true; // allows XSL transformation
        transformFilename = File(xslt); // applying the XSL here

        repeatTextElements = true; //  repeating text elements inherit the formatting applied to placeholder text, **only when import style is merge!
        ignoreWhitespace = true; // gets rid of whitespace-only  text-nodes, and NOT whitespace in Strings
        ignoreComments = true;
        ignoreUnmatchedIncoming = true; // ignores elements that do not match the existing structure, **only when import style is merge!
        importCALSTables = true; // imports CALS tables as InDesign tables
        importTextIntoTables = true; // imports text into tables if tags match placeholder tables and their cells, **only when import style is merge!
        importToSelected = false; // import the XML at the root element
        removeUnmatchedExisting = false;
    }

    obj.doc.importXML(File(xmlDoc) );
    obj.doc.mapXMLTagsToStyles(); // automatically match all tags to styles by name (after XSL transformation)

    alert("The XML file " + xmlDoc.name + " has been successfully imported!");

} // end of function importXML

...this is based on p. 407 (Chapter 18) of InDesign CS5 Automation Using XML & Javascript, by Grant Gamble

Ian Campbell
  • Have you tried modifying the XML with an XSLT to remove the reference to the DTD? – zanegray Jul 31 '12 at 19:10
  • Thanks @zanegray, that does seem to be the best way... I am trying `` with ``, but it is showing this error: `Token '!' not recognized.` – Ian Campbell Jul 31 '12 at 19:15
  • ...I have also just tried to implement the solution found at http://www.stylusstudio.com/xsllist/200104/post90620.html, but that did not work either. – Ian Campbell Jul 31 '12 at 19:25
  • You can't match the doctype with a template. The XML output isn't going to have a doctype unless you specify it. (It will be stripped by default.) Use an identity transform in your XSLT (or `xsl:copy-of` like in the link above) and you should be good. – Daniel Haley Jul 31 '12 at 19:52

3 Answers

I think zanegray gave you the main idea, although I think you are overcomplicating things. Why not just read the XML file content, remove the DTD declaration with a regexp, and then write a new XML file to use as the import input?

//Open and retrieve original xml file content
var originalXMLFile = File (Folder.desktop+"/foo.xml" );
originalXMLFile.open('r');
var content = originalXMLFile.read();
//Looks for a DOCTYPE declaration and remove it
content = content.replace ( /\n<!DOCTYPE[^\]]+\]>/g , "" );
originalXMLFile.close();
//Creates a new file without any DTD declaration
var outputFile = new File ( Folder.desktop+"/bar.xml" );
outputFile.open('w');
outputFile.write(content);
outputFile.close();

You can then use this filtered xml for your import.

Loic
  • That regex will only remove a doctype with an internal subset (`[]`) that ends on the same line. What about a doctype with no internal subset? What about a doctype with content in an internal subset that spans multiple lines? (Or contains something like `<!ENTITY foo "[bar]">`?) I don't think regex is a good idea for stripping doctypes. (I have done something similar in the past though by deleting everything up to the root element (identified in the doctype declaration).) – Daniel Haley Jul 31 '12 at 19:50
  • Thanks @Loic! Hmm, would this maintain the link to the original XML document though? There is a need for a link to the original, so that any changes made to the XML would be automatically updated in the InDesign document... – Ian Campbell Jul 31 '12 at 20:38
  • I think DevNull's solution is better, forget about regexp ;) – Loic Jul 31 '12 at 20:39
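
As the comments above point out, a single regex struggles when the DOCTYPE has no internal subset, or when the subset contains `]` or `>` inside a quoted value. Below is a minimal sketch of a character scanner that copes with those cases, assuming the XML has already been read into a string (for example with `File.read()` as in the answer). The name `stripDoctype` is hypothetical, and comments or processing instructions inside the internal subset are not handled, so treat it as a starting point rather than a full parser.

```javascript
// Hypothetical helper: remove the first <!DOCTYPE ...> declaration from an XML string.
// Tracks quoted literals and the [ ... ] internal subset so that a '>' or ']'
// inside them does not end the declaration early.
function stripDoctype(xml) {
    var start = xml.indexOf("<!DOCTYPE");
    if (start === -1) {
        return xml; // nothing to strip
    }
    var quote = null; // the open quote character while inside a literal
    var depth = 0;    // greater than 0 while inside the internal subset
    var i, ch;
    for (i = start; i < xml.length; i++) {
        ch = xml.charAt(i);
        if (quote) {
            if (ch === quote) { quote = null; }
        } else if (ch === '"' || ch === "'") {
            quote = ch;
        } else if (ch === "[") {
            depth++;
        } else if (ch === "]") {
            depth--;
        } else if (ch === ">" && depth === 0) {
            // End of the declaration: keep everything before and after it.
            return xml.substring(0, start) + xml.substring(i + 1);
        }
    }
    return xml; // unterminated declaration: leave the input untouched
}
```

The result could then be written to a second file exactly as in the answer above. Note that this still produces a copy, so it does not by itself address the question of keeping a link to the original XML file.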

Here's an XSLT that will strip the DOCTYPE declaration:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <xsl:copy-of select="."/>
    </xsl:template>
</xsl:stylesheet>
Daniel Haley
  • Thanks @DevNull, however this is not working... I am testing this at http://www.w3schools.com/XSL/tryxslt.asp?xmlfile=cdcatalog&xsltfile=cdcatalog_ex2, using this basic XML: ` `. – Ian Campbell Jul 31 '12 at 21:17
  • @IanCampbell - I think it only appears not to work because the w3schools tool is trying to display HTML output. Try a different processor. Another online tool to try is XML Playground. Try this saved session: http://www.xmlplayground.com/84o19w (Don't forget to click on the "View Source" tab to see the actual output.) – Daniel Haley Aug 01 '12 at 02:59
  • Ah, @DevNull you are correct -- it *is* working at http://xslt.online-toolz.com/tools/xslt-transformation.php, and at the link you provided as well. Unfortunately, it is still *not* working in InDesign. – Ian Campbell Aug 01 '12 at 03:28

OK, even simpler: we just have to prevent user interaction during the import and then remove any DTDs attached to the document:

function silentXMLImport(file)
{
    var doc, oldInteractionPrefs = app.scriptPreferences.userInteractionLevel;

    if ( !(file instanceof File) || !file.exists )
    {
        alert("Problem with file : "+file );
        return; // bail out: there is nothing valid to import
    }

    if ( app.documents.length == 0 )
    { 
        alert("Open a document first");
        return; 
    }

    //Prevent interaction and warnings
    app.scriptPreferences.userInteractionLevel = UserInteractionLevels.NEVER_INTERACT;
    doc = app.activeDocument;
    doc.importXML ( file );

    //Remove any dtd attached to the document
    doc.dtds.everyItem().remove();

    app.scriptPreferences.userInteractionLevel = oldInteractionPrefs;
}

//Now import xml
silentXMLImport ( File ( Folder.desktop+"/foobar.xml" ) );

It's working here.

Loic
  • Thanks @Loic, problem solved! I am curious though -- is it similarly possible to remove all namespaces with something like `doc.namespaceDeclarations().everyItem().remove()`, or `doc.removeNamespace(doc.namespaceDeclarations().everyItem() )`? – Ian Campbell Aug 01 '12 at 18:20
  • Not sure it can be fixed that way. Can't see any accessible property regarding namespaces. – Loic Aug 01 '12 at 18:50
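
One caveat with `silentXMLImport` as posted: if `importXML` throws, the last line never runs and `userInteractionLevel` is left at `NEVER_INTERACT` until something resets it. Below is a minimal sketch of a wrapper that restores the setting in a `finally` block. `withInteractionLevel` is a hypothetical helper name, and the preferences object is passed in as a parameter so the logic does not depend on InDesign itself.

```javascript
// Hypothetical helper: run `action` with prefs.userInteractionLevel set to `level`,
// then restore the previous value, even when `action` throws.
function withInteractionLevel(prefs, level, action) {
    var previous = prefs.userInteractionLevel;
    prefs.userInteractionLevel = level;
    try {
        return action();
    } finally {
        prefs.userInteractionLevel = previous;
    }
}

// Inside InDesign this might be called as follows (not runnable outside InDesign):
// withInteractionLevel(app.scriptPreferences, UserInteractionLevels.NEVER_INTERACT, function () {
//     app.activeDocument.importXML(file);
//     app.activeDocument.dtds.everyItem().remove();
// });
```

Any error from the import still propagates to the caller, so the script can report it, but the interaction level is back to its previous value by then.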