Given this XML snippet:

<title><![CDATA[Resizing & Cropping GIF and PNG images issue]]></title>

What is the correct string that should be parsed by the XML parser for the <title> element content?

1. "Resizing & Cropping GIF and PNG images issue"
2. "Resizing &amp; Cropping GIF and PNG images issue"

Note: I'm using the ROME feed parsing library for Java, which parses this as #2, but from my understanding of CDATA blocks it should be #1. I've found evidence on the web that suggests #2 is both right (also here) and wrong - so I'm a bit perplexed (and curious) about this.
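
One way to check the expected value independently of ROME is to feed the snippet to a plain XML parser. The following is a minimal sketch using the JDK's built-in DOM parser; the class name `CdataTitleDemo` and the helper `parseTitle` are made up for illustration and are not part of ROME. It parses both the CDATA form and the entity-escaped form of the same title and prints the text content of each.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataTitleDemo {

    // Hypothetical helper: parses a single-element XML document and
    // returns the text content of its root element.
    static String parseTitle(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // getTextContent() concatenates text and CDATA child nodes.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The title exactly as it appears in the feed (CDATA section).
        String cdata =
                "<title><![CDATA[Resizing & Cropping GIF and PNG images issue]]></title>";
        // The same title written with an escaped entity instead of CDATA.
        String escaped =
                "<title>Resizing &amp; Cropping GIF and PNG images issue</title>";

        System.out.println(parseTitle(cdata));
        System.out.println(parseTitle(escaped));
    }
}
```

If both lines print the title with a bare `&`, that would suggest the `&amp;` seen in `getTitle()` is introduced somewhere after the XML parsing step rather than by the parser itself.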

Amos
  • Amos, what is the source of the XML data (e.g. did you try with a plain JUnit test where the source XML is just a String constant)? Also, how do you get the output (do you get the parsing results as a String, or convert them to another XML document)? – oiavorskyi Feb 23 '11 at 20:42
  • @iYasha - my source is http://www.daniweb.com/forums/external.php?type=RSS2 (I'm working on a feed-reader component). This feed appears to CDATA-encode title elements that have the "&" character. The Firefox and Safari feed readers parse this as expected, yet ROME parses these as "&amp;" in the String it returns for getTitle(). I believe this is a ROME bug, but before I hack a solution around this - I would like to make sure I'm not missing something. Perhaps ROME is behaving correctly here? – Amos Feb 24 '11 at 07:52

1 Answer

From what I can tell, there is a big difference between your case and the 2nd link, where you think it's "wrong": you are using it for titles, not links/URLs. I would use the 2nd one, as that is valid XML. I understand that CDATA will then ignore it, but I'm not sure why you would want to ignore the title.

What are you planning on using this for? It seems to me that you would not want to wrap everything in CDATA, as the beauty of valid XML is that you know it should display consistently across XML parsers.

Jamie R Rytlewski
  • I'm working on a feed-reader component, and I'm parsing feeds from the web - so I'm not the one creating the feeds. I encountered this issue with daniweb.com/forums/external.php?type=RSS2. This feed appears to CDATA-encode title elements that have the "&" character. The Firefox and Safari feed readers parse this as expected, yet ROME parses these as "&amp;" in the String it returns for getTitle(). I believe this is a ROME bug, but before I hack a solution around this - I would like to make sure I'm not missing something. – Amos Feb 24 '11 at 07:54