
Thanks for reading my question. I have searched for and read similar questions, but none of them quite explained what was going on.

I have an XML file:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="../wikiStyle.css"?>
<!DOCTYPE article SYSTEM "../article.dtd">

<article xmlns:xlink="http://www.w3.org/1999/xlink">
    <header>
        <title>Foreign relations of Malta</title>
        <id>19146</id>
    </header>

    <bdy>
        <link xlink:type="simple" xlink:href="../205/40205.xml">Albania</link>&nbsp;·
        <link xlink:type="simple" xlink:href="../588/67588.xml">Andorra</link>&nbsp;· 
    </bdy>
</article>

As you can see, I've referenced the .dtd file, which contains definitions like:

<!ENTITY nbsp   "&#160;"> <!-- no-break space = non-breaking space,
                                  U+00A0 ISOnum -->

My aim is to display this .xml file in a browser, readably. The CSS manages this perfectly, except for .xml files, like this one, that contain entity references like &nbsp;

In that case, I get an error like:

XML Parsing Error: undefined entity

10.1126/science.288.5472.1775</weblink>. PMID 10877698.</cite>&nbsp;</entry>

--------------------------------------------------------------^

As I understood it, this line in the .dtd should declare this entity to the browser and enable me to use &nbsp; in my XML (and have it expanded to &#160; by the browser's parser for display).

  • Am I correct in my understanding of what should be happening, or am I missing something?
  • How can I declare this entity so that it can be displayed by the browser?

Please note: I'm working with millions of these XML files, and I don't generate them. I need a solution that does not involve changing the .xml file itself.

BoltClock
Paul

1 Answer

Your entity declaration looks good, so the problem is probably that the browser does not load external DTDs. (Example: https://developer.mozilla.org/en/XML_in_Mozilla)

I think the only way to handle this is to add the entity declarations to the internal subset of each file:

<!DOCTYPE article [
<!ENTITY nbsp "&#160;"> <!-- no-break space = non-breaking space,
                                  U+00A0 ISOnum -->
]>

I know you said you're working with millions of these files and you don't generate them, but you might be able to script the updating of the DOCTYPE declaration and pre-process them.
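As a rough illustration of that kind of pre-processing, here is a minimal Python sketch. It assumes every file has a DOCTYPE of the form shown in the question (a SYSTEM identifier in double quotes and no existing internal subset) and, like the example above, it drops the system identifier. The function name `add_internal_subset` and the `ENTITY_DECLS` constant are made up for this example; only the nbsp declaration is taken from the question's DTD excerpt.

```python
import re

# Entity declarations to inject. Only nbsp comes from the DTD excerpt in the
# question; add any other entities your documents actually use.
ENTITY_DECLS = '<!ENTITY nbsp "&#160;">'

# Matches a DOCTYPE with a SYSTEM identifier and no internal subset, e.g.
# <!DOCTYPE article SYSTEM "../article.dtd">
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+(\S+)\s+SYSTEM\s+"[^"]*"\s*>')


def add_internal_subset(xml_text: str) -> str:
    """Swap the external-only DOCTYPE for one that carries an internal subset."""
    return DOCTYPE_RE.sub(
        # A function is used as the replacement so that the declaration text
        # is inserted literally, with no backslash or group-reference handling.
        lambda m: f"<!DOCTYPE {m.group(1)} [\n{ENTITY_DECLS}\n]>",
        xml_text,
        count=1,
    )
```

Applied to the file in the question, this would turn the `<!DOCTYPE article SYSTEM "../article.dtd">` line into the three-line DOCTYPE shown above and leave the rest of the document untouched. Files whose DOCTYPE does not match the assumed pattern pass through unchanged.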

Daniel Haley
  • Thanks for your help. Your diagnosis of the problem was correct: the .dtd just wasn't being read. As for the solution, it turns out an internal .dtd adds 30+ KB to each file, which is unworkable. For the time being, the best solution I've found is to lazily parse my documents manually and replace each instance of each entity with its numeric equivalent. – Paul Jul 24 '12 at 13:35
  • If your primary concern for the browser is that it display the document correctly (i.e. you're validating separately), then you don't need the full DTD in the internal subset, just the entity declarations for the entities referred to in the document. If you're willing to accept some hand work, you can insert the entity declaration to the internal subset as easily as replacing the entity reference with a numeric character reference. (And if the hand work becomes too much, a simple identity transform with XSLT is an easy way to perform such an operation automatically.) – C. M. Sperberg-McQueen Aug 21 '12 at 19:17
  • @C.M.Sperberg-McQueen - Good comment and I agree completely. I wonder if the 30+kb that Paul mentions in the comment is all entity declarations or if he added the entire DTD. I think my XML example is showing exactly what you're talking about; an internal subset with only the entity declarations. I also removed the public and system identifiers. – Daniel Haley Aug 21 '12 at 19:33
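
The other approach mentioned in the comments, replacing each named entity reference with a numeric character reference, can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `ENTITY_CODEPOINTS` and `to_numeric_refs` are hypothetical, and the mapping contains just the nbsp entry from the question's DTD excerpt, so it would need to be extended with whatever entities the real DTD declares. Being a plain text substitution, it would also rewrite matching references inside comments or CDATA sections.

```python
import re

# Hypothetical mapping from entity name to code point. Only nbsp -> 160 is
# taken from the DTD excerpt in the question.
ENTITY_CODEPOINTS = {"nbsp": 160}

# Named references such as &nbsp; (numeric ones like &#160; do not match).
ENTITY_REF_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_refs(xml_text: str) -> str:
    """Replace known named entity references with numeric character references."""
    def repl(match):
        codepoint = ENTITY_CODEPOINTS.get(match.group(1))
        # Unknown names (including the predefined &amp;, &lt;, ...) are kept.
        return match.group(0) if codepoint is None else f"&#{codepoint};"

    return ENTITY_REF_RE.sub(repl, xml_text)
```

For example, `to_numeric_refs("Albania</link>&nbsp;·")` gives `"Albania</link>&#160;·"`, which a browser can parse without reading any DTD.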