46

My question is: How can I get the elements directly under a specific parent element when other elements with the same name exist as "grandchildren" of that parent?

I'm using the Java DOM library to parse XML elements and I'm running into trouble. Here's a small portion of the XML I'm using:

<notifications>
  <notification>
    <groups>
      <group name="zip-group.zip" zip="true">
        <file location="C:\valid\directory\" />
        <file location="C:\another\valid\file.doc" />
        <file location="C:\valid\file\here.txt" />
      </group>
    </groups>
    <file location="C:\valid\file.txt" />
    <file location="C:\valid\file.xml" />
    <file location="C:\valid\file.doc" />
  </notification>
</notifications>

As you can see, there are two places you can place the <file> element. Either in groups or outside groups. I really want it structured this way because it's more user-friendly.

Now, whenever I call notificationElement.getElementsByTagName("file"); it gives me all the <file> elements, including those under the <group> element. I handle each of these kinds of files differently, so this functionality is not desirable.
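
A minimal, self-contained sketch of the behaviour described above. The class name, method names, and shortened file locations are illustrative assumptions, not part of the original project. It contrasts getElementsByTagName, which searches the whole subtree, with a filtered walk over getChildNodes, which only sees direct children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DescendantSearchDemo {
    // Shortened stand-in for the XML in the question (file names are made up).
    static final String XML =
          "<notifications><notification>"
        + "<groups><group name=\"zip-group.zip\" zip=\"true\">"
        + "<file location=\"g1.txt\"/><file location=\"g2.txt\"/><file location=\"g3.txt\"/>"
        + "</group></groups>"
        + "<file location=\"n1.txt\"/><file location=\"n2.txt\"/><file location=\"n3.txt\"/>"
        + "</notification></notifications>";

    // Parses the XML and returns its first <notification> element.
    static Element firstNotification(String xml) {
        try {
            return (Element) DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("notification").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // getElementsByTagName searches the whole subtree, so grouped files are included.
    static int countDescendantFiles(String xml) {
        return firstNotification(xml).getElementsByTagName("file").getLength();
    }

    // getChildNodes only lists direct children; filter those by node type and name.
    static int countDirectFiles(String xml) {
        NodeList children = firstNotification(xml).getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && "file".equals(child.getNodeName())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("descendants: " + countDescendantFiles(XML));     // 6
        System.out.println("direct children: " + countDirectFiles(XML));     // 3
    }
}
```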

I've thought of two solutions:

  1. Get the parent element of the file element and deal with it accordingly (depending on whether it's <notification> or <group>).
  2. Rename the second <file> element to avoid confusion.

Neither of those solutions is as desirable as just leaving things the way they are and getting only the <file> elements that are direct children of <notification> elements.

I'm open to IMPO comments and answers about the "best" way to do this, but I'm really interested in DOM solutions because that's what the rest of this project is using. Thanks.

kentcdodds
  • Why don't you use XPath to get both lists of nodes and treat them differently? `//groups/group/file` and `//notification/file` would suffice to get them. Or do you want a single XPath expression to get them all? – Alex May 21 '12 at 17:40
  • Why not build this collection yourself by looping through the direct children, like this: "NodeList nodes = element.getChildNodes(); for (int i = 0; i < nodes.getLength(); i++) { //if element path check - add it to the collection }"? – Dmitry May 21 '12 at 17:43
  • @Alex org.w3c.dom doesn't support XPath; he'd want to use a different library, such as org.jdom.xpath, for that... though I fully agree that it's the more elegant approach. – Charles Duffy May 21 '12 at 17:44
  • `javax.xml.xpath` is Java Standard, so I think he can pretty much use it, no need to get JDom just for this simple task. – Alex May 21 '12 at 17:46
  • I should mention that this is only a small part of a much bigger xml file :) Wanted to make it readable. – kentcdodds May 21 '12 at 17:50

9 Answers

23

I realise you found something of a solution to this in May, @kentcdodds, but I just had a fairly similar problem and have now found what I think is a solution (perhaps for my use case, but not for yours).

A very simplified example of my XML format is shown below:

<?xml version="1.0" encoding="utf-8"?>
<rels>
    <relationship num="1">
        <relationship num="2">
            <relationship num="2.1"/>
            <relationship num="2.2"/>
        </relationship>
    </relationship>
    <relationship num="1.1"/>
    <relationship num="1.2"/>

</rels>

As you can hopefully see from this snippet, the format I want can have N-levels of nesting for [relationship] nodes, so obviously the problem I had with Node.getChildNodes() was that I was getting all nodes from all levels of the hierarchy, and without any sort of hint as to Node depth.

Looking at the API for a while, I noticed there are actually two other methods that might be of some use:

  • Node.getFirstChild()
  • Node.getNextSibling()

Together, these two methods seemed to offer everything required to get all of the immediate child elements of a Node. The following JSP code should give a fairly basic idea of how to implement this. Sorry for the JSP; I'm rolling this into a bean now but didn't have time to create a fully working version from picked-apart code.

<%@page import="javax.xml.parsers.DocumentBuilderFactory,
                javax.xml.parsers.DocumentBuilder,
                org.w3c.dom.Document,
                org.w3c.dom.NodeList,
                org.w3c.dom.Node,
                org.w3c.dom.Element,
                java.io.File" %><% 
try {

    File fXmlFile = new File(application.getRealPath("/") + "/utils/forms-testbench/dom-test/test.xml");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(fXmlFile);
    doc.getDocumentElement().normalize();

    Element docEl = doc.getDocumentElement();
    // Walk only the direct children of the root: start at the first child
    // and follow the sibling chain until it runs out.
    Node childNode = docEl.getFirstChild();
    while (childNode != null) {
        // Skip whitespace text nodes, comments, etc.
        if (childNode.getNodeType() == Node.ELEMENT_NODE) {
            Element childElement = (Element) childNode;
            out.println("NODE num:-" + childElement.getAttribute("num") + "<br/>\n");
        }
        childNode = childNode.getNextSibling();
    }

} catch (Exception e) {
    out.println("ERROR:- " + e.toString() + "<br/>\n");
}

%>

This code would give the following output, showing only direct child elements of the initial root node.

NODE num:-1
NODE num:-1.1
NODE num:-1.2
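
For readers who are not using JSP, the same sibling walk can be sketched as a plain Java helper. This is only an illustration under assumptions: the class and method names (SiblingWalk, childElements, directChildNums) are hypothetical, and the XML is the sample above with whitespace removed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SiblingWalk {
    // The sample document from this answer, collapsed onto a few lines.
    static final String XML = "<rels>"
        + "<relationship num=\"1\"><relationship num=\"2\">"
        + "<relationship num=\"2.1\"/><relationship num=\"2.2\"/>"
        + "</relationship></relationship>"
        + "<relationship num=\"1.1\"/><relationship num=\"1.2\"/>"
        + "</rels>";

    // Collects the direct child elements of parent by following the
    // getFirstChild()/getNextSibling() chain; non-element nodes are skipped.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Parses the XML and returns the "num" attribute of each direct child of the root.
    static List<String> directChildNums(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> nums = new ArrayList<>();
            for (Element e : childElements(doc.getDocumentElement())) {
                nums.add(e.getAttribute("num"));
            }
            return nums;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(directChildNums(XML)); // [1, 1.1, 1.2]
    }
}
```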

Hope this helps someone anyway. Cheers for the initial post.

BizNuge
  • +1 for providing another totally acceptable answer to the question. :) – kentcdodds Jun 20 '12 at 13:40
  • Cheers @kentcdodds Quite an interesting problem to tackle and find another solution to actually. fairly glad I can continue to use the org.w3c.dom without having to port existing code. Thanks for the question! – BizNuge Jun 20 '12 at 13:45
  • 3
    +1 for a really simple, easy and clean solution. You can use a `for` loop with this technique, to keep it elegant and to preserve scope: `for (Node n = docEl.getFirstChild(); n != null; n = n.getNextSibling())`. – krispy Apr 27 '15 at 05:36
  • What is the difference to [getChildNodes](https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Node.html#getChildNodes())? – ceving Sep 16 '20 at 14:17
  • 1
    @ceving - I think the problem was getChildNodes was bringing back ALL child nodes from ALL levels of the hierarchy. This was 8 years ago, so the API may well have moved on since that time, but getChildNodes didn't work for either myself or kentcdodds at the time I guess. – BizNuge Sep 16 '20 at 19:02
  • getChildNodes does not return all descendants. – ceving Sep 17 '20 at 06:43
  • @ceving - NodeList getChildNodes() - A NodeList that contains ALL children of this node. If there are no children, this is a NodeList containing no nodes. Not sure why we're arguing over this point, but this post was 8 years ago. Doesn't seem like a good use of either of our time. – BizNuge Sep 18 '20 at 09:13
  • @ceving - I couldn't leave it. Just did a quick test and yes, as you suggested, Node.getChildNodes() DOES do exactly what its name suggests now. This definitely didn't work 8 years ago, which I guess would have been a JDK7 version. I'm on JDK8 now I think, so the test I just did might not be against the correct version. – BizNuge Sep 18 '20 at 12:12
15

You can use XPath for this, using two paths to get the two sets of nodes and process them differently.

To get the <file> nodes that are direct children of <notification>, use //notification/file; for the ones in <group>, use //groups/group/file.

This is a simple sample:

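import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SO10689900 {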
    public static void main(String[] args) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader("<notifications>\n" + 
                "  <notification>\n" + 
                "    <groups>\n" + 
                "      <group name=\"zip-group.zip\" zip=\"true\">\n" + 
                "        <file location=\"C:\\valid\\directory\\\" />\n" + 
                "        <file location=\"C:\\this\\file\\doesn't\\exist.grr\" />\n" + 
                "        <file location=\"C:\\valid\\file\\here.txt\" />\n" + 
                "      </group>\n" + 
                "    </groups>\n" + 
                "    <file location=\"C:\\valid\\file.txt\" />\n" + 
                "    <file location=\"C:\\valid\\file.xml\" />\n" + 
                "    <file location=\"C:\\valid\\file.doc\" />\n" + 
                "  </notification>\n" + 
                "</notifications>")));
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression expr1 = xpath.compile("//notification/file");
        NodeList nodes = (NodeList)expr1.evaluate(doc, XPathConstants.NODESET);
        System.out.println("Files in //notification");
        printFiles(nodes);

        XPathExpression expr2 = xpath.compile("//groups/group/file");
        NodeList nodes2 = (NodeList)expr2.evaluate(doc, XPathConstants.NODESET);
        System.out.println("Files in //groups/group");
        printFiles(nodes2);
    }

    public static void printFiles(NodeList nodes) {
        for (int i = 0; i < nodes.getLength(); ++i) {
            Node file = nodes.item(i);
            System.out.println(file.getAttributes().getNamedItem("location"));
        }
    }
}

It should output:

Files in //notification
location="C:\valid\file.txt"
location="C:\valid\file.xml"
location="C:\valid\file.doc"
Files in //groups/group
location="C:\valid\directory\"
location="C:\this\file\doesn't\exist.grr"
location="C:\valid\file\here.txt"
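
If the <notification> element is already in hand from existing DOM code, a relative XPath can be evaluated against that element instead of the whole document. The following is a sketch under assumptions: the class name, method name, and shortened file locations are made up for illustration. The expression "file" uses the child axis, so it should match only direct children of the context node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPathDemo {
    // Shortened stand-in for the question's XML (file names are made up).
    static final String XML =
          "<notifications><notification>"
        + "<groups><group><file location=\"grouped.txt\"/></group></groups>"
        + "<file location=\"top-1.txt\"/><file location=\"top-2.txt\"/>"
        + "</notification></notifications>";

    // Returns the location attribute of each <file> directly under the first <notification>.
    static List<String> directFileLocations(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element notification = (Element) doc.getElementsByTagName("notification").item(0);

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Relative path, evaluated with the element as the context node.
            NodeList files = (NodeList) xpath.evaluate("file", notification, XPathConstants.NODESET);

            List<String> locations = new ArrayList<>();
            for (int i = 0; i < files.getLength(); i++) {
                locations.add(((Element) files.item(i)).getAttribute("location"));
            }
            return locations;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(directFileLocations(XML)); // [top-1.txt, top-2.txt]
    }
}
```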
Alex
  • Looks like a good answer, and in the future I may move from `DOM` to `XPath`. But for this project this is the last thing I need to do and I want to stick with `DOM`. However, unless I get another answer for `DOM`, I'll accept yours because it's a good answer. Either way, you get a +1 for such a thorough answer. – kentcdodds May 21 '12 at 18:06
  • If you need to stick with DOM, then you will need to iterate over the `NodeList` using `((Node)notificationElement).getChildNodes()` and keep only the one whose names are `file`. Ideally you will have to find all `notification` tags to do that. The same needs to be done for `group` tags. – Alex May 21 '12 at 18:17
  • I found a better solution. The reason that wont work is because there are a lot of `childNodes` in the `notification` element. I answered the question though. Thanks for your good answer. I really will look into XPath in the future. – kentcdodds May 21 '12 at 18:26
  • I'm looking for a way to search for an element by path `root/etc/foo` and eventually create it, or it's parent nodes if these don't exist. Can I use something better than a for loop in children nodes? I only care about the first occurence. – Tomáš Zato Jan 20 '14 at 23:15
  • XPath is extremely slow. I had a program using XPath for every node selection and it took more than 5 hours to finish. After I had replaced every XPath usage by an equivalent function using `getChildNodes`, the program finishes in less than 10 minutes. – ceving Sep 17 '20 at 06:48
14

Well, the DOM solution to this question is actually pretty simple, even if it's not too elegant.

When I iterate through the filesNodeList, which is returned when I call notificationElement.getElementsByTagName("file"), I just check whether the parent node's name is "notification". If it isn't then I ignore it because that will be handled by the <group> element. Here's my code solution:

for (int j = 0; j < filesNodeList.getLength(); j++) {
  Element fileElement = (Element) filesNodeList.item(j);
  if (!fileElement.getParentNode().getNodeName().equals("notification")) {
    continue;
  }
  ...
}
RubioRic
kentcdodds
  • @JanusTroelsen, if you're talking about the second line when I cast the item as an element, then it depends on the DOM you're parsing... If not, what do you mean? – kentcdodds Jul 28 '13 at 19:54
  • 1
    Why didn't you just iterate through element.getChildNodes()? – FINDarkside Apr 02 '15 at 12:32
  • 1
    The 'getParentNode' function (and 'getNodeName') is available on the 'Node' interface. So for just checking the name, no cast is needed. (and just for safety switch the equals to be "notification".equals(...)) – Justin Apr 01 '16 at 12:20
5

If you stick with the DOM API

NodeList nodeList = doc.getElementsByTagName("notification")
    .item(0).getChildNodes();

// get the immediate child (1st generation)
for (int i = 0; i < nodeList.getLength(); i++) {
    switch (nodeList.item(i).getNodeType()) {
        case Node.ELEMENT_NODE:
            Element element = (Element) nodeList.item(i);
            System.out.println("element name: " + element.getNodeName());
            // check the element name
            if (element.getNodeName().equalsIgnoreCase("file")) {
                // do something with your "file" element (first-generation child)
                System.out.println("element name: "
                    + element.getNodeName() + " attribute: "
                    + element.getAttribute("location"));
            }
            break;
    }
}

Our first task is to get a "notification" element (in this case the first one, item(0)) and all of its children:

NodeList nodeList = doc.getElementsByTagName("notification")
    .item(0).getChildNodes();

(Later you can loop over every "notification" element instead of taking only the first.)

For every child of "Notification":

for (int i = 0; i < nodeList.getLength(); i++)

you first get its type in order to see whether it is an element:

switch (nodeList.item(i).getNodeType()) {
    case Node.ELEMENT_NODE:
        //.......
        break;  
}

If it is, the element is a direct child of "notification" rather than a grandchild, and you can check whether it is a "file" element:

if (element.getNodeName().equalsIgnoreCase("file"))
{

    // do something with your "file" element (first-generation child)

    System.out.println("element name: "
        + element.getNodeName() + " attribute: "
        + element.getAttribute("location"));

}

and the output is:

element name: groups
element name: file
element name: file attribute: C:\valid\file.txt
element name: file
element name: file attribute: C:\valid\file.xml
element name: file
element name: file attribute: C:\valid\file.doc
dckuehn
arthur
  • thanks for the solution. My solution is similar to this, but I don't iterate through all the children because there are a lot more children in that element which I didn't display in my question just to avoid information overload. Anyway, thanks again. +1 for a good answer. – kentcdodds May 21 '12 at 18:29
  • @kentcdodds.I update my Answer.You see,working with XML without using "ID" leaves you basically with only "getElementsByTagName" and "getChildNodes" to play with. You don't have in my opinion other answers when working directly with the DOM.Sorry you have to stick with the DOM.Whatever the solution it will probably come down to how your access the children of a given Node(in this case "Notification").My solution checks the type Node in order to spare you unnecessary work.But you'll still have to iterate ALL the children.That's what happen when there no "ID" : you end up with a collection. – arthur May 21 '12 at 18:47
  • 1
    @arthur (off-topic) For the love of all that is holy, please put some whitespace between a period and the first letter of the next sentence. This is pure madness! – klaar Sep 07 '15 at 10:21
4

I had the same problem in one of my projects and wrote a little function which returns a List<Element> containing only the immediate children. Basically, for each node returned by getElementsByTagName, it checks whether its parent node is actually the node whose children we are searching for:

public static List<Element> getDirectChildsByTag(Element el, String sTagName) {
    // getElementsByTagName returns every matching descendant, not just children
    NodeList allChilds = el.getElementsByTagName(sTagName);
    List<Element> res = new ArrayList<>();

    for (int i = 0; i < allChilds.getLength(); i++) {
        // keep only the nodes whose parent is the element we started from
        if (allChilds.item(i).getParentNode().equals(el)) {
            res.add((Element) allChilds.item(i));
        }
    }

    return res;
}

The accepted answer by kentcdodds will return wrong results (e.g. grandchildren) if there is a child node that is itself called "notification", for example if the "group" element were named "notification" instead. I was facing that setup in my project, which is why I came up with my function.
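
The difference between the two checks can be sketched with a small, made-up document in which a "notification" element is nested inside another one. The class name, method names, and XML below are assumptions chosen only to illustrate the point; the identity check uses Node.isSameNode from DOM Level 3 rather than equals.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ParentCheckDemo {
    // Hypothetical document: a <notification> nested inside the outer <notification>.
    static final String XML =
          "<notification>"
        + "<file location=\"direct.txt\"/>"
        + "<notification><file location=\"nested.txt\"/></notification>"
        + "</notification>";

    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Name-based check: accepts any <file> whose parent is *named* "notification".
    static int countByParentName(String xml) {
        NodeList files = root(xml).getElementsByTagName("file");
        int count = 0;
        for (int i = 0; i < files.getLength(); i++) {
            if ("notification".equals(files.item(i).getParentNode().getNodeName())) {
                count++;
            }
        }
        return count;
    }

    // Identity-based check: accepts only <file> elements whose parent *is* the outer element.
    static int countByParentIdentity(String xml) {
        Element outer = root(xml);
        NodeList files = outer.getElementsByTagName("file");
        int count = 0;
        for (int i = 0; i < files.getLength(); i++) {
            Node parent = files.item(i).getParentNode();
            if (parent != null && parent.isSameNode(outer)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("by parent name: " + countByParentName(XML));         // 2
        System.out.println("by parent identity: " + countByParentIdentity(XML)); // 1
    }
}
```

The name-based count includes the nested file because its parent also happens to be called "notification"; the identity-based count does not.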

Andy
0

I wrote this function to get the node value by tag name, restricted to the top level:

public static String getValue(Element item, String tagToGet, String parentTagName) {
    NodeList n = item.getElementsByTagName(tagToGet);
    Node nodeToGet = null;
    for (int i = 0; i<n.getLength(); i++) {
        if (n.item(i).getParentNode().getNodeName().equalsIgnoreCase(parentTagName)) {
            nodeToGet = n.item(i);
        }
    }
    return getElementValue(nodeToGet);
}

public final static String getElementValue(Node elem) {
    Node child;
    if (elem != null) {
        if (elem.hasChildNodes()) {
            for (child = elem.getFirstChild(); child != null; child = child
                    .getNextSibling()) {
                if (child.getNodeType() == Node.TEXT_NODE) {
                    return child.getNodeValue();
                }
            }
        }
    }
    return "";
}
Danimate
0

I encountered a related problem where I needed to process just the immediate child nodes even though the treatment of all "file" nodes is similar. For my solution, I compare the Element's parent node with the node that is being processed in order to determine whether the Element is an immediate child.

NodeList fileNodes = parentNode.getElementsByTagName("file");
for (int i = 0; i < fileNodes.getLength(); i++) {
    if (parentNode.equals(fileNodes.item(i).getParentNode())) {
        if (fileNodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
            // process the child node...
        }
    }
}
KalenGi
0

There is a nice LINQ solution (VB.NET, using System.Xml):

For Each child As XmlElement In From cn As XmlNode In xe.ChildNodes Where cn.Name = "file"
    ...
Next
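
Since the question is about Java, a rough analogue of this filter can be sketched with the Stream API. This is an illustration only: the class name, method names, and sample XML are assumptions, and the code simply filters the result of getChildNodes() by node type and tag name.

```java
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StreamChildFilter {
    // Hypothetical sample: one grouped file and two direct files.
    static final String XML =
          "<notification>"
        + "<groups><group><file location=\"g.txt\"/></group></groups>"
        + "<file location=\"a.txt\"/><file location=\"b.txt\"/>"
        + "</notification>";

    // Streams over the direct children and keeps elements with the given tag name.
    static List<Element> directChildren(Element parent, String tagName) {
        NodeList children = parent.getChildNodes();
        return IntStream.range(0, children.getLength())
                .mapToObj(children::item)
                .filter(n -> n instanceof Element)
                .map(n -> (Element) n)
                .filter(e -> tagName.equals(e.getTagName()))
                .collect(Collectors.toList());
    }

    // Parses the XML and returns the location of each direct <file> child of the root.
    static List<String> directFileLocations(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return directChildren(root, "file").stream()
                    .map(e -> e.getAttribute("location"))
                    .collect(Collectors.toList());
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(directFileLocations(XML)); // [a.txt, b.txt]
    }
}
```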
ShibbyUK
0

I ended up creating an extension function in Kotlin to do this:

fun Element.childrenWithTagName(name: String): List<Node> = childNodes
    .asList()
    .filter { it.nodeName == name }

Callers may use it like this:

val meta = target.newChildElement("meta-coverage")
source.childrenWithTagName("counter").forEach {
    meta.copyElementWithAttributes(it)
}

The asList implementation:

fun NodeList.asList(): List<Node> = InternalNodeList(this)

private class InternalNodeList(
    private val list: NodeList,
    override val size: Int = list.length
) : RandomAccess, AbstractList<Node>() {
    override fun get(index: Int): Node = list.item(index)
}

Michael