
I have a large XML file in which every node requires a CDATA section.

<root>
    <a>
        <id>my_id</id>
        <tr><![CDATA[This is the data]]></tr>
    </a>
    <b>
        ...
    </b>
</root>

How can I avoid placing CDATA in every node? Does a DTD or schema provide a method for this?

The reason for this requirement comes from an in-house framework used for localization. All tags which contain the messages are to be CDATA'd, because very often they contain special characters. The XML I wrote is just for demonstration purposes and does not represent the actual data that I handle.

skaffman
Jorjon

3 Answers


CDATA relates to the content of a node, while the schema information is about the structure of the document. They aren't especially related.

Looking at your document, there's no need for the CDATA section to be there. CDATA only eases parsing/writing of the content when the content contains angle brackets and other special characters.

The actual CDATA syntax is required to mark a CDATA section, because its purpose is to allow characters which would otherwise be interpreted as XML markup. The full syntax is there to remove any ambiguity about what is content and what is a tag.
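To illustrate the point that CDATA is only a writing convenience, here is a small sketch using Python's standard `xml.etree.ElementTree` (my choice of tool, not something from the question). The same message is written once as a CDATA section and once with the special characters escaped; the parser hands back identical text for both.

```python
import xml.etree.ElementTree as ET

# The same message written two ways: as a CDATA section and as escaped text.
with_cdata = "<tr><![CDATA[a < b & c]]></tr>"
escaped = "<tr>a &lt; b &amp; c</tr>"

def text_of(xml_text):
    """Return the character data of the root element as the parser reports it."""
    return ET.fromstring(xml_text).text

print(text_of(with_cdata))  # a < b & c
print(text_of(escaped))     # a < b & c
```

ElementTree does not even record which form was used, which is why most consumers of the document cannot tell the difference.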

bdukes

How can I avoid placing CDATA in every node? Does a DTD or schema provide a method for this?

No. Neither a DTD nor a schema can help with your problem.

The reason for this requirement comes from an in-house framework

Well... Of course the XML parser which parses the document knows whether a section was a CDATA section or not. This is also represented in the DOM, which distinguishes between the CDATASection interface and the Text interface. So it is very easy for someone who writes code on top of an XML parser to enforce the use of CDATA sections instead of plain text. In 99.9% of cases this is plain stupid and you should not check for it. On the other hand, I have seen so many stupid things in my life that I would not be at all surprised if your in-house framework does exactly that and enforces the presence of CDATA sections.
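To show how easy such a check is, here is a minimal sketch with Python's `xml.dom.minidom`, which keeps CDATASection and Text nodes apart. The function `content_kinds` is a hypothetical stand-in for whatever your framework might do; it is not its actual code.

```python
from xml.dom import minidom, Node

def content_kinds(xml_text, tag):
    """Hypothetical check: report whether each <tag> holds CDATA or plain text."""
    doc = minidom.parseString(xml_text)
    kinds = []
    for el in doc.getElementsByTagName(tag):
        for child in el.childNodes:
            # The DOM exposes the difference through nodeType.
            if child.nodeType == Node.CDATA_SECTION_NODE:
                kinds.append("cdata")
            elif child.nodeType == Node.TEXT_NODE:
                kinds.append("text")
    return kinds

sample = "<root><tr><![CDATA[This is the data]]></tr><tr>plain &amp; text</tr></root>"
print(content_kinds(sample, "tr"))  # ['cdata', 'text']
```

A framework that rejected anything reported as `"text"` would force you to keep writing CDATA, even though both forms carry the same data.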

If this is the case (just try it), then you have to write the CDATA sections and be happy with that. If you are still not happy with that, you can write a script that transforms your XML by adding these CDATA sections.
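A rough sketch of such a script, again with Python's `xml.dom.minidom`. The function name `wrap_in_cdata` and the idea of passing a list of message tags are my own assumptions. It also assumes no message contains the sequence `]]>`, since minidom raises `ValueError` when serializing a CDATA section that contains it.

```python
from xml.dom import minidom, Node

def wrap_in_cdata(xml_text, tags):
    """Hypothetical helper: turn plain-text children of the given tags into CDATA."""
    doc = minidom.parseString(xml_text)
    for tag in tags:
        for el in doc.getElementsByTagName(tag):
            # Iterate over a copy, because replaceChild mutates childNodes.
            for child in list(el.childNodes):
                if child.nodeType == Node.TEXT_NODE:
                    cdata = doc.createCDATASection(child.data)
                    el.replaceChild(cdata, child)
    # Serialize the root element (omits the <?xml ...?> declaration).
    return doc.documentElement.toxml()

print(wrap_in_cdata("<root><id>my_id</id><tr>a &amp; b</tr></root>", ["tr"]))
# <root><id>my_id</id><tr><![CDATA[a & b]]></tr></root>
```

You could then keep writing plain escaped XML by hand and run this as a build step before the file reaches the framework.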

yankee
  • Thanks for addressing the question directly, without asking why, and providing extra information after the answer. – Jorjon May 21 '12 at 20:47

All tags which contain the messages are to be CDATA'd, because very often they contain special characters

If your real goal is to represent special characters in your XML document, then the problem doesn't lie in parsing these characters, but rather in encoding them.

CDATA

<![CDATA[ your data ]]>

exists mainly so that some content of an XML document is not parsed as markup; otherwise that content could cause parsing errors. An example would be:

    <a>
        <id>my_id</id>
        <tr>& content a </tr>
        <tr> < content b < </tr>
    </a>

As the document gets parsed, its content (i.e. the text within your tags) also gets parsed. Both contents

& content a

and

< content b <

will be reported as parsing errors because of the characters "&" and "<". To avoid this, you don't want that content to be parsed as markup. That's why you declare a CDATA section in your tag: it tells the parser to refrain from parsing the content as markup.
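You can see this directly with a tiny sketch using Python's `xml.dom.minidom` (the helper name `parses` is just for illustration): the raw `&` and `<` make the documents malformed, while the same text inside a CDATA section parses fine.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def parses(xml_text):
    """Return True if the text is well-formed XML, False if the parser rejects it."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False

print(parses("<tr>& content a </tr>"))                     # False
print(parses("<tr> < content b < </tr>"))                  # False
print(parses("<tr><![CDATA[& content a ]]></tr>"))         # True
print(parses("<tr><![CDATA[ < content b < ]]></tr>"))      # True
```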

DTD and XSD are about defining a structure for your XML document and don't explicitly provide a way to encode your characters (only XSD does, and only for binary data types such as xs:base64Binary and xs:hexBinary). They help you define which elements and, in the case of XSD, which data types (string, int, double, and so on) your XML document uses, but they leave the encoding issue to you.

This is clearly an encoding issue rather than a parsing one.
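If "encoding" is read as escaping the markup characters, one way to handle it is to let the XML library do that when writing the file. Here is a minimal sketch with Python's `xml.etree.ElementTree`; the helper names `message_element` and `read_message` are hypothetical. You assign the raw message, the serializer escapes `&` and `<`, and parsing returns the original string.

```python
import xml.etree.ElementTree as ET

def message_element(tag, message):
    """Hypothetical helper: serialize a message; ElementTree escapes & < > for us."""
    el = ET.Element(tag)
    el.text = message  # raw text, no manual escaping or CDATA needed
    return ET.tostring(el, encoding="unicode")

def read_message(xml_text):
    """Parse the element back and return its text."""
    return ET.fromstring(xml_text).text

print(message_element("tr", "& content a"))    # <tr>&amp; content a</tr>
print(message_element("tr", "< content b <"))  # <tr>&lt; content b &lt;</tr>
```

The round trip through `read_message` gives back exactly the original message, so no CDATA is needed as long as the file is written by a proper XML serializer.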

arthur