35

Sorry, I'm a Java/XML newbie and can't seem to figure this one out. It seems it's possible to convert a Document object to a string, but I want to convert a Node object into a string. I am using the org.ccil.cowan.tagsoup Parser for this.

I'm retrieving the Node by something like...

 org.ccil.cowan.tagsoup.Parser parser = new org.ccil.cowan.tagsoup.Parser();

 // namespaceaware stands for the name of the parser's namespace feature
 parser.setFeature(namespaceaware, false);

 Transformer transformer = TransformerFactory.newInstance().newTransformer();
 DOMResult domResult = new DOMResult();

 // "in" is the input stream (or reader) holding the HTML
 transformer.transform(new SAXSource(parser, new InputSource(in)), domResult);
 Node n = domResult.getNode();

 // I'm interested in the first child, so...
 Node myNode = n.getChildNodes().item(0);

 // convert myNode to string..
 // what to do here?

The answer may be obvious, but I can't seem to figure out from the core Java libraries how to achieve this. Any help is much appreciated!

ragebiswas

3 Answers

67

You can use a Transformer (error handling and optional factory configuration omitted for clarity):

Node node = ...;
StringWriter writer = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(node), new StreamResult(writer));
String xml = writer.toString();
// Use xml ...
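
For completeness, here is a minimal, self-contained sketch of the same Transformer approach with the imports and error handling filled in. The class name `NodeToStringSketch`, the helper `parse`, and the sample markup are hypothetical, introduced only for illustration. Setting `OutputKeys.OMIT_XML_DECLARATION` is an optional choice made here, not part of the snippet above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class NodeToStringSketch {

    /** Serializes a DOM node and its subtree to a string, without the XML declaration. */
    static String nodeToString(Node node) {
        try {
            // newTransformer() with no stylesheet yields an identity transformer.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    /** Illustration helper: parses a small, well-formed XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>Hello</p></body></html>");
        // First child of the root element, here the <body> element.
        Node firstChild = doc.getDocumentElement().getFirstChild();
        System.out.println(nodeToString(firstChild));
    }
}
```

The sketch parses with the JDK's `DocumentBuilderFactory` only so that it runs without extra dependencies; a Node obtained through TagSoup and a `DOMResult` can be passed to `nodeToString` in the same way.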
Kevin
  • Thanks Kevin. I tried this, but I get "xmlns" attributes on every HTML tag (my Node is actually an HTML fragment), so I end up with output like "....". Any idea how to avoid this? – ragebiswas Feb 08 '10 at 16:37
  • You can transform them out. See here: http://stackoverflow.com/questions/2095673/how-to-remove-the-namespaces-from-the-element – Kevin Feb 08 '10 at 16:40
  • Well, I tried this and am stuck again. I'm not sure where to call 'setNamespaceAware(false)' in the above code snippet. From this link: http://www.mail-archive.com/xalan-dev@xml.apache.org/msg05987.html - it seems that this is not a straightforward thing either. Clues? – ragebiswas Feb 09 '10 at 04:07
  • It is on the DocumentBuilderFactory. See http://java.sun.com/javase/6/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setNamespaceAware%28boolean%29 – Kevin Feb 09 '10 at 14:08
  • Kevin, I might've not been clear, but I'm using the TagSoup parser. So I'm not using DocumentBuilderFactory at all (again, I'm a java/xml newbie). This is what I'm doing: Transformer transformer = TransformerFactory.newInstance().newTransformer(); DOMResult domResult = new DOMResult(); transformer.transform(new SAXSource(parser, new InputSource(in)), domResult); Node node = domResult.getNode(); // continues in original post. Where does DBF come into this? parser = new org.ccil.cowan.tagsoup.Parser() – ragebiswas Feb 09 '10 at 14:19
  • I should also mention that I did try Parser.setFeature(Parser.isNamespaceAware, false). Once I have the org.w3c.dom.Node, I pass it to another function like the one in the first reply, yet I still get the xmlns attributes – ragebiswas Feb 09 '10 at 14:25
  • Can you edit your original question to add the code you're using to parse the XML? – Kevin Feb 09 '10 at 14:39
  • Try using: parser.setFeature(Parser.namespacesFeature, false); – Kevin Feb 09 '10 at 14:51
  • A version with error handling and optional factory configuration is located here: http://stackoverflow.com/a/33936257/363573 – Stephan Nov 26 '15 at 10:53
  • How can you get a pretty print of the XML after converting it to string? – tarekahf Feb 22 '23 at 17:23
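
On the pretty-printing question in the last comment: one option is to ask the Transformer to indent while it serializes, rather than reformatting the string afterwards. The sketch below is a hedged illustration. The class name `PrettyPrintSketch` and the helper `parse` are hypothetical, and the `{http://xml.apache.org/xslt}indent-amount` key is an implementation-specific property (recognized by Xalan and, as far as I know, by the JDK's built-in transformer), so other implementations may ignore it. The exact whitespace produced can differ between JDK versions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class PrettyPrintSketch {

    /** Serializes a node with indentation requested from the transformer. */
    static String prettyPrint(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            // Assumption: implementation-specific key; it may have no effect elsewhere.
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    /** Illustration helper: parses a small, well-formed XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>Hello</p></body></html>");
        System.out.println(prettyPrint(doc.getDocumentElement()));
    }
}
```

If the XML already exists only as a string, it would have to be parsed back into a DOM (as `parse` does here) before it can be indented this way.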
8
String getNodeString(Node node) {
    try {
        StringWriter writer = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml version="1.0" encoding="UTF-8"?> declaration
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.transform(new DOMSource(node), new StreamResult(writer));
        return writer.toString();
    } catch (TransformerException e) {
        e.printStackTrace();
    }
    // Fallback if serialization fails: the node's text content, without markup
    return node.getTextContent();
}
Mohammad Asadi
0

This is a way to convert a Node's child nodes to HTML:

public static String getInnerHTML(Node node) throws TransformerException
{
    StringWriter sw = new StringWriter();
    Result result = new StreamResult(sw);
    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer proc = factory.newTransformer();
    // Serialize with the "html" output method instead of the default "xml"
    proc.setOutputProperty(OutputKeys.METHOD, "html");
    // Write each child in turn so that the node's own tag is not included
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++)
    {
        proc.transform(new DOMSource(children.item(i)), result);
    }
    return sw.toString();
}
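
To illustrate what the `OutputKeys.METHOD` setting changes, here is a small hedged sketch that serializes the same node once with the "xml" method and once with the "html" method. The class name `OutputMethodSketch`, the helpers `serialize` and `parse`, and the sample markup are hypothetical. Per the XSLT specification, the "html" method does not write an XML declaration and should not write end tags for empty elements such as `br`; exact whitespace depends on the transformer implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class OutputMethodSketch {

    /** Serializes a node using the given output method, e.g. "xml" or "html". */
    static String serialize(Node node, String method) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, method);
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    /** Illustration helper: parses a small, well-formed XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<div><p>Hello</p><br/></div>");
        Node root = doc.getDocumentElement();
        System.out.println("xml:  " + serialize(root, "xml"));
        System.out.println("html: " + serialize(root, "html"));
    }
}
```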
Sang9xpro