I have a Java String containing XML. I want to read through this String and wrap all the text nodes in CDATA, but I'm not sure how to do this. The reason is that there is a text node containing an angle bracket, which causes an exception when I try to parse the String. Can anyone help me out?

<node> this < is text <node> <node2> this is < text <node2>

I would like to know if there is an easy way of reading this text as a string with XMLReader and inserting CDATA around the text.

thanks

Stefan

aspiringCoder
  • how are you parsing the string? You tagged it as SAX, but can you provide your code? – Woot4Moo Jan 04 '13 at 13:24
  • possible duplicate of [How to parse XML for !\[CDATA\[\]](http://stackoverflow.com/questions/8489151/how-to-parse-xml-for-cdata) – Woot4Moo Jan 04 '13 at 13:24
  • I am trying to insert a CDATA wrapper around every text node within the XML string. Note that I am using XMLReader and SAXParser. I am not trying to get the character data out; rather, I'm trying to wrap CDATA around the text, and I'm looking for advice on how to do this in any way – aspiringCoder Jan 04 '13 at 13:48
  • You might want to update your example to show the broken data you are trying to fix. – Quentin Jan 04 '13 at 14:03
  • The text nodes are generated via free-text areas in another application, and the XML is built up from there. The issue is that free text means free text: users can insert anything. I am indeed trying to fix invalid XML, but I'm not sure whether it's doable since you cannot parse it; see something like the above – aspiringCoder Jan 04 '13 at 14:09
  • Tried to walk through the steps in my updated post. While not a complete solution by any stretch it should give the basic idea – Woot4Moo Jan 04 '13 at 14:21
  • Does the user enter the whole XML? If so, your design is broken (or you need to require your users to enter VALID XML). Does the user enter *something* and you *build XML from that*? Then make sure you produce valid XML in the first place! Trying to fix the not-really-XML after the fact will be a major pain in the butt! – Joachim Sauer Jan 04 '13 at 14:40
  • The only solution that still allows the user to enter special characters is to wrap everything in CDATA going into and out of the database. The various applications using the data all parse it, so CDATA wouldn't be an issue when displaying to the user. The user only enters text node data; the XML is built up from it – aspiringCoder Jan 04 '13 at 15:02
  • Then escape the necessary characters *while building the XML*. As I said: build *correct* XML and you'll be fine (i.e. you'll be able to use any XML parser to get your data back out). – Joachim Sauer Jan 04 '13 at 16:24
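
A minimal sketch of the "escape while building the XML" advice from the comments above, using the JDK's StAX `XMLStreamWriter`, which escapes markup characters in text content when it writes them. The class name `SafeXmlBuilder` and the element names `root` and `node` are made up for illustration; they are not from the question.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class SafeXmlBuilder {

    // Builds <root><node>...</node></root> around the user's free text.
    // Element names are illustrative assumptions only.
    public static String build(String userText) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("root");
        writer.writeStartElement("node");
        // writeCharacters escapes markup characters such as < and &,
        // so the free text cannot break the document structure.
        writer.writeCharacters(userText);
        writer.writeEndElement();
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build("this < is text"));
    }
}
```

Because the text is escaped on the way in, any standard parser should return the original characters on the way out, with no CDATA needed.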

2 Answers

Perhaps something like this (apologies in advance for any inefficiency):

if(currentNode instanceof XMLNodeType.Text)  
{  
     String toWrite = String.format("<![CDATA[%s]]>", currentNode.getText());   
     // or whatever retrieves text of the node
}  

It looks like you need to massage the data into valid XML, and how you do that depends heavily on your input. Essentially, you receive one big string that you need to convert into valid XML. One advantage is that you can define a schema the third party must adhere to. Agreeing on it means a meeting with them, which is outside the scope of this discussion, but it is worth mentioning. Once the schema is defined, you will know which nodes are "text" nodes and need to be wrapped in CDATA blocks.

The basic idea is this:

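import java.util.ArrayList;
import java.util.List;

// Tags whose content is free text and should be wrapped in CDATA
List<String> textTags = new ArrayList<String>();
textTags.add("NODE");
// other things to add
String bigAwfulString = inputFromThirdParty(); // placeholder for however the input arrives
StringBuilder validXML = new StringBuilder();
for (String currentNode : bigAwfulString.split("yourRegexHere"))
{
    if (textTags.contains(currentNode))
    {
        // currentNode is already a String, so there is no getText() to call.
        // Caveat: this breaks if the text itself contains "]]>".
        validXML.append(String.format("<![CDATA[%s]]>", currentNode));
        continue;
    }
    validXML.append(currentNode);
}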
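
One way the idea could be made concrete, as a rough sketch only: it assumes the free-text tag names are known in advance, that each such element has a proper closing tag (`</node>`), that the tags carry no attributes, and that they are not nested inside themselves. The class and method names (`TextTagFixer`, `wrapTextTags`) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextTagFixer {

    // Wraps the body of every <tag>...</tag> element in a CDATA section,
    // working on the raw string so the input does not need to parse first.
    public static String wrapTextTags(String xml, String tag) {
        String quoted = Pattern.quote(tag);
        Pattern pattern = Pattern.compile("<" + quoted + ">(.*?)</" + quoted + ">", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Split any "]]>" in the body so it cannot close the CDATA section early.
            String body = matcher.group(1).replace("]]>", "]]]]><![CDATA[>");
            String wrapped = "<" + tag + "><![CDATA[" + body + "]]></" + tag + ">";
            matcher.appendReplacement(out, Matcher.quoteReplacement(wrapped));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "<root><node> this < is text </node></root>";
        System.out.println(wrapTextTags(broken, "node"));
    }
}
```

This stays fragile by nature: if the free text itself contains something that looks like the closing tag, the regex will cut the body short.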
Woot4Moo
  • That won't work. The data (as described in the question, but not in the example) is invalid XML, so it won't parse. This question is about fixing up the broken XML. – Quentin Jan 04 '13 at 14:02
  • @Quentin ah yes I see now. Perhaps the issue is that OP isn't properly writing XML in the first place? – Woot4Moo Jan 04 '13 at 14:04
  • Someone isn't. We can't tell if it is being written by the OP or if it is broken third party data. – Quentin Jan 04 '13 at 14:04
  • @Quentin true, I will leave this post in the interim until OP changes his sample code. – Woot4Moo Jan 04 '13 at 14:05
  • Generating CDATA by `String.format("%s")` is fully broken, if the string contains two closing brackets. You have to encode it properly. – ceving Sep 26 '13 at 12:28
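
The point in the last comment can be illustrated with a small sketch. A CDATA section ends at the first `]]>`, so text containing that sequence has to be split across two sections. The helper name `CdataWrapper.wrap` is made up for this example.

```java
public class CdataWrapper {

    // Wraps text in CDATA, splitting every "]]>" so that "]]" ends one
    // section and ">" starts the next.
    public static String wrap(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("this < is text")); // prints <![CDATA[this < is text]]>
        System.out.println(wrap("a ]]> b"));        // prints <![CDATA[a ]]]]><![CDATA[> b]]>
    }
}
```

A parser reads the two adjacent sections back as the single string `a ]]> b`.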

Try this, it worked for me.
http://www.java2s.com/Code/Java/XML/AddingaCDATASectiontoaDOMDocument.htm

import java.io.File;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Main {
  public static void main(String[] argv) throws Exception {

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);

    factory.setExpandEntityReferences(false);

    Document doc = factory.newDocumentBuilder().parse(new File("filename"));
    // Add a CDATA section to the root element
    Element element = doc.getDocumentElement();
    CDATASection cdata = doc.createCDATASection("data");
    element.appendChild(cdata);

  }
}
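
The snippet above adds the CDATA node but never writes the document back out. Here is a hedged sketch of the full round trip, assuming the XML is built in memory rather than loaded from a file: it creates the CDATA section through the DOM and serializes it with the JDK's identity `Transformer`. The class name `CdataDomExample` and the element name `node` are illustrative assumptions.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CdataDomExample {

    // Builds <node><![CDATA[...]]></node> from free text and returns it as a String.
    public static String build(String userText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("node");
        doc.appendChild(root);
        // The DOM keeps the text as a CDATA node; no manual string formatting.
        root.appendChild(doc.createCDATASection(userText));

        // An identity transform serializes the DOM tree to text.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build("this < is text"));
    }
}
```

Note that this only helps when you control how the XML is produced. It cannot repair a string that is already malformed, since that string will not parse into a DOM in the first place.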