
I have an XML document like this:

<root
    xmlns:gl-bus="http://www.xbrl.org/int/gl/bus/2006-10-25"
    xmlns:gl-cor="http://www.xbrl.org/int/gl/cor/2006-10-25" >
    <gl-cor:entityInformation>
        <gl-bus:accountantInformation>
            ...............
        </gl-bus:accountantInformation>
    </gl-cor:entityInformation>
</root>

All I want is to extract the element "gl-cor:entityInformation" from the root, together with its child elements. However, I do not want the namespace declarations to come with it.

The code is like this:

XPathExpression<Element> xpath = XPathFactory.instance().compile("gl-cor:entityInformation", Filters.element(), null, NAMESPACES);
Element innerElement = xpath.evaluateFirst(xmlDoc.getRootElement());

The problem is that the inner element holds the namespace declarations now. Sample output:

<gl-cor:entityInformation xmlns:gl-cor="http://www.xbrl.org/int/gl/cor/2006-10-25">
    <gl-bus:accountantInformation xmlns:gl-bus="http://www.xbrl.org/int/gl/bus/2006-10-25">
    </gl-bus:accountantInformation>
</gl-cor:entityInformation>

This is how I get xml as string:

public static String toString(Element element) {
    Format format = Format.getPrettyFormat();
    format.setTextMode(Format.TextMode.NORMALIZE);
    format.setEncoding("UTF-8");

    XMLOutputter xmlOut = new XMLOutputter(); 
    xmlOut.setFormat(format);
    return xmlOut.outputString(element);
}

As you can see, the namespace declarations are carried onto the inner elements. Is there a way to get rid of these declarations without losing the prefixes?

I want this because later on I will be merging these inner elements into another parent element, and that parent element already has those namespace declarations.

Eray Tuncer
  • Do the namespaces also occur on the child when you output the new complete XML with the new parent? – Martin Honnen Aug 08 '15 at 09:56
  • @MartinHonnen No. If the parent has the namespace declarations then no declarations on the children otherwise it puts the declaration on the first child having the prefix. Ok so actually there is no real problem if namespaces are declared on the parent. However, there is no brute force way to remove those declarations? It does not let you go beyond the laws of the xml format? – Eray Tuncer Aug 08 '15 at 10:47
  • Well, you have to understand that the tree stores the name and the namespace of each node, and when you serialize it the normal outputter is supposed to help you output namespace well-formed XML, so the result you see when you output that single element is a wanted result in my view. Whether there is a way with the JDOM classes to serialize a node so that only existing namespace declaration attributes are output, and none are added to comply with the namespace well-formedness rules, I don't know; maybe someone more familiar with that package can help. Or you need to write your own outputter. – Martin Honnen Aug 08 '15 at 10:54
  • "I want this because later on I will be merging these inner elements inside another parent element..." If you do the merge using an XML tool such as JSON (rather than using a non-XML process such as text concatenation) then this should remove any redundant namespaces. Asking an XML process to output ill-formed XML (i.e. XML containing prefixed names with no binding for the prefixes) is asking a lot. – Michael Kay Aug 08 '15 at 15:33
  • @MichaelKay, did you want to say "an XML tool like JDOM"? Or why do you say "XML tool like JSON"? – Martin Honnen Aug 08 '15 at 17:31
  • Yes, I meant JDOM not JSON. Acronym fatigue. – Michael Kay Aug 08 '15 at 22:21
  • @MichaelKay Oh I see your point now. You are right about auto-removed redundant namespaces and rolfl mentioned that ill-formed XML :) Thank you. – Eray Tuncer Aug 08 '15 at 22:35

1 Answer


JDOM by design insists that the in-memory model of the XML is well structured at all times. The behaviour you are seeing is exactly what I would expect from JDOM and I consider it to be "right". JDOM's XMLOutputter also outputs well structured and internally consistent XML and XML fragments.
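
To illustrate why a fragment serializer has to re-declare the namespaces, here is a minimal sketch that uses the JDK's built-in W3C DOM rather than JDOM (a deliberate swap, so that it runs without extra libraries). The class name NamespaceOnNode and the shortened XML are assumptions made for the example. It shows that the inner element carries its prefix and namespace URI itself, even though the xmlns attribute sits only on the root; JDOM's model follows the same principle.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK's W3C DOM, not JDOM.
public class NamespaceOnNode {

    static final String COR = "http://www.xbrl.org/int/gl/cor/2006-10-25";
    static final String BUS = "http://www.xbrl.org/int/gl/bus/2006-10-25";

    // Same shape as the XML in the question, with the elided content left out.
    static final String XML =
        "<root xmlns:gl-bus=\"" + BUS + "\" xmlns:gl-cor=\"" + COR + "\">"
        + "<gl-cor:entityInformation><gl-bus:accountantInformation/>"
        + "</gl-cor:entityInformation></root>";

    // Parses the sample and returns the gl-cor:entityInformation element.
    static Element findInner() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so nodes keep namespace info
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return (Element) doc.getElementsByTagNameNS(COR, "entityInformation").item(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element inner = findInner();
        System.out.println(inner.getPrefix());                   // prefix stored on the node
        System.out.println(inner.getNamespaceURI());             // URI stored on the node
        System.out.println(inner.hasAttribute("xmlns:gl-cor"));  // declaration lives on root only
    }
}
```

Because the node itself knows its namespace, an outputter that is handed only this element has to emit a declaration for it, otherwise the prefix would be unbound in the output.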

Changing the behaviour of the internal in-memory model is not an option with JDOM, but customizing the XMLOutputter to change its behaviour is relatively easy. The XMLOutputter is structured to have an "engine" supplied as a constructor argument: XMLOutputter(XMLOutputProcessor). In addition, JDOM supplies an easy-to-customize default XMLOutputProcessor called AbstractXMLOutputProcessor.

You can get the behaviour you want by doing the following:

private static final XMLOutputProcessor noNamespaces = new AbstractXMLOutputProcessor() {

    @Override
    protected void printNamespace(final Writer out, final FormatStack fstack, 
        final Namespace ns)  throws IOException {
        // do nothing with printing Namespaces....
    }

};

Now, when you create your XMLOutputter to print your XML element fragment, you can do the following:

public static String toString(Element element) {
    Format format = Format.getPrettyFormat();
    format.setTextMode(Format.TextMode.NORMALIZE);
    format.setEncoding("UTF-8");

    XMLOutputter xmlOut = new XMLOutputter(noNamespaces); 
    xmlOut.setFormat(format);
    return xmlOut.outputString(element);
}

Here's a full program working with your input XML:

import java.io.IOException;
import java.io.Writer;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.Namespace;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;
import org.jdom2.output.support.AbstractXMLOutputProcessor;
import org.jdom2.output.support.FormatStack;
import org.jdom2.output.support.XMLOutputProcessor;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;


public class JDOMEray {

    public static void main(String[] args) throws JDOMException, IOException {
        Document eray = new SAXBuilder().build("eray.xml");
        Namespace[] NAMESPACES = {Namespace.getNamespace("gl-cor", "http://www.xbrl.org/int/gl/cor/2006-10-25")};
        XPathExpression<Element> xpath = XPathFactory.instance().compile("gl-cor:entityInformation", Filters.element(), null, NAMESPACES);
        Element innerElement = xpath.evaluateFirst(eray.getRootElement());

        System.out.println(toString(innerElement));
    }

    private static final XMLOutputProcessor noNamespaces = new AbstractXMLOutputProcessor() {

        @Override
        protected void printNamespace(final Writer out, final FormatStack fstack, 
            final Namespace ns)  throws IOException {
            // do nothing with printing Namespaces....
        }

    };

    public static String toString(Element element) {
        Format format = Format.getPrettyFormat();
        format.setTextMode(Format.TextMode.NORMALIZE);
        format.setEncoding("UTF-8");

        XMLOutputter xmlOut = new XMLOutputter(noNamespaces); 
        xmlOut.setFormat(format);
        return xmlOut.outputString(element);
    }


}

For me the above program outputs:

<gl-cor:entityInformation>
  <gl-bus:accountantInformation>...............</gl-bus:accountantInformation>
</gl-cor:entityInformation>
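
If the goal is only the merge described in the question, the custom processor may not be needed at all. The following is a minimal sketch, assuming JDOM 2 on the classpath; the class name JDOMMergeSketch, the parent element name newParent, and the shortened XML are hypothetical. It detaches the inner element, adds it to a parent that already declares both prefixes, and serializes with the standard XMLOutputter.

```java
import java.io.StringReader;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.Namespace;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;

// Hypothetical demo class; "newParent" is an invented element name.
public class JDOMMergeSketch {

    static final Namespace COR =
        Namespace.getNamespace("gl-cor", "http://www.xbrl.org/int/gl/cor/2006-10-25");
    static final Namespace BUS =
        Namespace.getNamespace("gl-bus", "http://www.xbrl.org/int/gl/bus/2006-10-25");

    static String merge() {
        try {
            // Same shape as the XML in the question, elided content left out.
            String xml = "<root xmlns:gl-bus=\"" + BUS.getURI() + "\""
                + " xmlns:gl-cor=\"" + COR.getURI() + "\">"
                + "<gl-cor:entityInformation><gl-bus:accountantInformation/>"
                + "</gl-cor:entityInformation></root>";
            Document doc = new SAXBuilder().build(new StringReader(xml));
            Element inner = doc.getRootElement().getChild("entityInformation", COR);

            // The new parent declares both prefixes up front.
            Element newParent = new Element("newParent");
            newParent.addNamespaceDeclaration(COR);
            newParent.addNamespaceDeclaration(BUS);

            // An element can have only one parent, so detach it before re-adding.
            newParent.addContent(inner.detach());

            return new XMLOutputter(Format.getPrettyFormat()).outputString(newParent);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(merge());
    }
}
```

Since the prefixes are already in scope on the parent, the standard outputter should declare each of them only once, which matches what the asker reports in the comments.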
rolfl
  • After reading Martin Honnen's comments I also realized that this was just an issue with the outputter, not with the XML structure in memory. Your answer is the way to force the outputter to print the elements as I wish. Thank you for the help. You all helped me to understand the mechanism of JDOM, actually. – Eray Tuncer Aug 08 '15 at 18:16