Java XML JDOM2 XPath - Read text value from XML attribute and element using XPath expression

Question

The program should read values from an XML file using XPath expressions. I have already started the project with JDOM2, and switching to another API is not an option. The difficulty is that the program does not know beforehand whether an expression points to an element or an attribute. Does the API provide a function that returns the content as a string when given just the XPath expression? As far as I know, JDOM2 uses objects of different types when evaluating XPath expressions that point to attributes or elements. I am only interested in the content of the attribute or element that the XPath expression points to.

Here is an example XML file:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

This is what my program looks like:

package exampleprojectgroup;

import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.XMLReaders;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;


public class ElementAttribute2String
{
    ElementAttribute2String()
    {
        run();
    }

    public void run()
    {
        final String PATH_TO_FILE = "c:\\readme.xml";
        /* It is essential that the program has to work with a variable amount of XPath expressions. */
        LinkedList<String> xPathExpressions = new LinkedList<>();
        /* Simulate user input.
         * First XPath expression points to attribute,
         * second one points to element.
         * Many more expressions follow in a real situation.
         */
        xPathExpressions.add( "/bookstore/book/@category" );
        xPathExpressions.add( "/bookstore/book/price" );

        /* One list should be sufficient to store the result. */
        List<Element> elementsResult = null;
        List<Attribute> attributesResult = null;
        List<Object> objectsResult = null;
        try
        {
            SAXBuilder saxBuilder = new SAXBuilder( XMLReaders.NONVALIDATING );
            Document document = saxBuilder.build( PATH_TO_FILE );
            XPathFactory xPathFactory = XPathFactory.instance();
            int i = 0;
            for ( String string : xPathExpressions )
            {
                /* Works only for elements, uncomment to give it a try. */
//                XPathExpression<Element> xPathToElement = xPathFactory.compile( xPathExpressions.get( i ), Filters.element() );
//                elementsResult = xPathToElement.evaluate( document );
//                for ( Element element : elementsResult )
//                {
//                    System.out.println( "Content of " + string + ": " + element.getText() );
//                }

                /* Works only for attributes, uncomment to give it a try. */
//                XPathExpression<Attribute> xPathToAttribute = xPathFactory.compile( xPathExpressions.get( i ), Filters.attribute() );
//                attributesResult = xPathToAttribute.evaluate( document );
//                for ( Attribute attribute : attributesResult )
//                {
//                    System.out.println( "Content of " + string + ": " + attribute.getValue() );
//                }

                /* I want to receive the content of the XPath expression as a string
                 * without having to know if it is an attribute or element beforehand.
                 */
                XPathExpression<Object> xPathExpression = xPathFactory.compile( xPathExpressions.get( i ) );
                objectsResult = xPathExpression.evaluate( document );
                for ( Object object : objectsResult )
                {
                    if ( object instanceof Attribute )
                    {
                        System.out.println( "Content of " + string + ": " + ((Attribute)object).getValue() );
                    }
                    else if ( object instanceof Element )
                    {
                        System.out.println( "Content of " + string + ": " + ((Element)object).getText() );
                    }
                }
                i++;
            }
        }
        catch ( IOException ioException )
        {
            ioException.printStackTrace();
        }
        catch ( JDOMException jdomException )
        {
            jdomException.printStackTrace();
        }
    }
}

Another thought is to search for the '@' character in the XPath expression, to determine if it is pointing to an attribute or element. This gives me the desired result, though I wish there was a more elegant solution. Does the JDOM2 API provide anything useful for this problem? Could the code be redesigned to meet my requirements?

Thank you in advance!

rolfl · Accepted Answer · 2016-10-20T14:16:54.017

XPath expressions are hard to type/cast because they need to be compiled in a system that is sensitive to the return type of the XPath functions/values that are in the expression. JDOM relies on third-party code to do that, and that third party code does not have a mechanism to correlate those types at your JDOM code's compile time. Note that XPath expressions can return a number of different types of content, including String, boolean, Number, and Node-List-like content.

In most cases, the XPath expression return type is known before the expression is evaluated, and the programmer has the "right" casting/expectations for processing the results.

In your case, you don't, and the expression is more dynamic.

I recommend that you declare a helper function to process the content:

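// Requires: import org.jdom2.Attribute; import org.jdom2.Content;
private static String extractValue(Object source) {
    // Attribute is not a Content subclass, so it needs its own branch.
    if (source instanceof Attribute) {
        return ((Attribute)source).getValue();
    }
    // Element, Text, CDATA, Comment etc. all extend Content.
    if (source instanceof Content) {
        return ((Content)source).getValue();
    }
    // Fallback for String, Number and Boolean results.
    return String.valueOf(source);
}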

This will at least neaten up your code, and with Java 8 streams it can be quite compact:

List<String> values = xPathExpression.evaluate( document )
                      .stream()
                      .map(o -> extractValue(o))
                      .collect(Collectors.toList());

Note that the XPath spec defines the string-value of an Element node as the concatenation of the Element's own text() content and the text content of all its descendant elements. Thus, in the following XML snippet:

<a>bilbo <b>samwise</b> frodo</a>

the getValue() on the a element will return "bilbo samwise frodo", but getText() will return "bilbo  frodo" (only the text held directly by a, so the spaces on both sides of the b element remain). Choose which mechanism you use for the value extraction carefully.
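
To make the difference concrete, here is a minimal sketch that uses the JDK's built-in DOM parser as a stand-in, so it runs without extra dependencies. It is not JDOM2 code: getTextContent() plays the role of JDOM2's getValue(), and the hypothetical helper directText imitates getText() by concatenating only the element's own text children. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Stand-in demo using the JDK DOM API (an assumption for illustration, not JDOM2).
class StringValueDemo {

    /** Parses a small XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** XPath string-value: the text of the element and of all its descendants. */
    static String stringValue(Element element) {
        return element.getTextContent();
    }

    /** Direct text only: concatenates the element's own text and CDATA children. */
    static String directText(Element element) {
        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Element a = parseRoot("<a>bilbo <b>samwise</b> frodo</a>");
        System.out.println("[" + stringValue(a) + "]"); // prints [bilbo samwise frodo]
        System.out.println("[" + directText(a) + "]");  // prints [bilbo  frodo] (two spaces)
    }
}
```

If the doubled space is unwanted, JDOM2's getTextNormalize() (used in the other answer) collapses runs of whitespace.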

Is `Attribute` in JDOM2 a subclass of `Content`? http://www.jdom.org/docs/apidocs/org/jdom2/Attribute.html does not show that so I am confused why your answer seems to suggest that `XPathExpression xPathExpression = xPathFactory.compile( xPathExpressions.get( i ), Filters.content() )` handles elements and attributes. — Martin Honnen, Oct 20 '16 at 13:37
Ahhh.... crap. I had forgotten that Attributes are not content. It has the `getValue()` method and I assumed. Let me think about this for a moment. — rolfl, Oct 20 '16 at 13:54
I can't think of a better way to process ambiguous XPath results other than to inspect it. JDOM could have made things a little easier if both Element and Attribute nodes share a common ancestor, but there are other reasons why that is not feasible. I edited the answer to recommend a function extraction to neaten up the code, rather than change the basic mechanism described by the OP. — rolfl, Oct 20 '16 at 14:19
Thank you very much for replying, Rolf :) Your answer clears many things up for me. Thanks for pointing out there is a "Content" object in JDOM and that an XPath expression can have a multitude of different return types. — Stefan, Oct 21 '16 at 13:23

score 0 · Answer 2 · answered Jan 18 '17 at 21:55

I had the exact same problem and took the approach of recognizing when an attribute is the focus of the XPath. I solved it with two functions. The first compiles the XPathExpression for later use:

    XPathExpression xpExpression;
    if (xpath.matches(  ".*/@[\\w]++$")) {
        // must be an attribute value we're after.. 
        xpExpression = xpfac.compile(xpath, Filters.attribute(), null, myNSpace);
    } else { 
        xpExpression = xpfac.compile(xpath, Filters.element(), null, myNSpace);
    }

The second evaluates and returns a value:

Object target = xpExpression.evaluateFirst(baseEl);
if (target != null) {
    String value = null;
    if (target instanceof Element) {
        Element targetEl = (Element) target;
        value = targetEl.getTextNormalize();
    } else if (target instanceof Attribute) {
        Attribute targetAt = (Attribute) target;
        value = targetAt.getValue();
    }
    // value now holds the text, or is still null if target was neither type
}

I suspect it's a matter of coding style whether you prefer the helper function suggested in the previous answer or this approach. Either will work.
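
If you go with the pattern-based approach, it helps to know which expressions the regular expression actually recognizes. The sketch below reuses the pattern from the first snippet in a hypothetical helper, looksLikeAttribute, and runs it against a few sample expressions. The attribute names xml:lang and data-id are invented for the check and do not occur in the question's XML.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; only the pattern itself comes from the answer above.
class AttributeXPathHeuristic {

    // Matches when the last step is "/@" followed only by word characters.
    private static final Pattern ENDS_WITH_ATTRIBUTE = Pattern.compile(".*/@[\\w]++$");

    /** Returns true if the heuristic classifies the expression as an attribute path. */
    static boolean looksLikeAttribute(String xpath) {
        return ENDS_WITH_ATTRIBUTE.matcher(xpath).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "/bookstore/book/@category",           // true
            "/bookstore/book/price",               // false
            "/bookstore/book/title/@xml:lang",     // false: ':' is not a word character
            "/bookstore/book/@data-id",            // false: '-' is not a word character
            "//@*",                                // false: wildcard, no name
            "/bookstore/book/attribute::category", // false: axis syntax instead of '@'
            "@category"                            // false: no '/' before '@'
        };
        for (String xpath : samples) {
            System.out.printf("%-40s %b%n", xpath, looksLikeAttribute(xpath));
        }
    }
}
```

Expressions that select attributes through a prefixed or hyphenated name, a wildcard, or the attribute:: axis are not recognized and would be compiled with the element filter. For arbitrary user input, inspecting the result type as in the accepted answer is therefore probably the more robust choice.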
