JAXP XPath 1.0 or 2.0 - how to distinguish empty strings from non-existent values

Question

Given the following XML instance:

<entities>
    <person><name>Jack</name></person>
    <person><name></name></person>
    <person></person>
</entities>

I am using the following code to: (a) iterate over the persons and (b) obtain the name of each person:

XPathExpression expr = xpath.compile("/entities/person");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
    Node node = nodes.item(i);
    String innerXPath = "name/text()";
    String name  = xpath.compile(innerXPath).evaluate(node);
    System.out.printf("%2d -> name is %s.\n", i, name);
}

The code above is unable to distinguish between the 2nd person case (empty string for name) and the 3rd person case (no name element at all) and simply prints:

0 -> name is Jack.
1 -> name is .
2 -> name is .

Is there a way to distinguish between these two cases using a different innerXPath expression? In this SO question it seems that the XPath way would be to return an empty list, but I've tried that too:

String innerXPath = "if (name) then name/text() else ()";

... and the output is still the same.

So, is there a way to distinguish between these two cases with a different innerXPath expression? I have Saxon HE on my classpath so I can use XPath 2.0 features as well.

Update

So the best I could do based on the accepted answer is the following:

XPathExpression expr = xpath.compile("/entities/person");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
    Node node = nodes.item(i);
    String innerXPath = "name";
    NodeList names = (NodeList) xpath.compile(innerXPath).evaluate(node, XPathConstants.NODESET);
    String nameValue = null;
    if (names.getLength()>1) throw new RuntimeException("impossible");
    if (names.getLength()==1)
        nameValue = names.item(0).getFirstChild()==null?"":names.item(0).getFirstChild().getNodeValue();
    System.out.printf("%2d -> name is [%s]\n", i, nameValue);
}

The above code prints:

0 -> name is [Jack]
1 -> name is []
2 -> name is [null]

In my view this is not very satisfactory, as the logic is spread across both XPath and Java code, which limits the usefulness of XPath as a host-language- and API-agnostic notation. My particular use case was to just keep a collection of XPaths in a property file and evaluate them at runtime in order to obtain the information I need without any ad-hoc extra handling. Apparently that's not possible.

So what is it you want to accomplish exactly? What is your desired result? - JLRishe, Jun 30 '13 at 13:15
For instance, a way to return null if the element name does not exist and an empty string ("") if the element exists with an empty-string content. I need an XPath expression that will evaluate differently if the element `<name>` doesn't exist at all versus if it exists with an empty value. - Marcus Junius Brutus, Jun 30 '13 at 13:39
I haven't used Java's XML APIs much, but it sounds like the `evaluate()` function will always return a non-null string, by converting whatever the result is to a string value. How about something like this: `String innerXPath = "if (name) then name/text() else '[UNSPECIFIED]'";` - JLRishe, Jun 30 '13 at 13:49
@JLRishe that's an option but I want to avoid special Strings if at all possible. - Marcus Junius Brutus, Jun 30 '13 at 13:55

score 3 · Accepted Answer · answered Jun 30 '13 at 14:03

The JAXP API, being based on XPath 1.0, is pretty limited here. My instinct would be to return the name element (as a NodeList). So the XPath expression required is simply "name" (XPath names are case-sensitive, so it must match the element name in the XML). Then cases 1 and 2 will return a nodelist of length 1, while case 3 will return a nodelist of length 0. Cases 1 and 2 can then easily be distinguished within the application by getting the value of the node and testing whether it is zero-length.
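
A minimal, self-contained sketch of this approach, assuming the JDK's built-in JAXP implementation. The class and method names (NameLookup, parse, valueOrNull) are illustrative only and not part of any API; the expression uses the lowercase name from the question's XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, written for illustration only.
final class NameLookup {

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /**
     * Returns null when the expression selects no node, otherwise the text
     * content of the first selected node (which may be the empty string).
     */
    static String valueOrNull(Node context, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList matches =
                    (NodeList) xpath.evaluate(expression, context, XPathConstants.NODESET);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<entities>"
                + "<person><name>Jack</name></person>"
                + "<person><name></name></person>"
                + "<person></person>"
                + "</entities>");
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList persons =
                (NodeList) xpath.evaluate("/entities/person", doc, XPathConstants.NODESET);
        for (int i = 0; i < persons.getLength(); i++) {
            // A missing name element prints as [null], an empty one as [].
            System.out.printf("%2d -> name is [%s]%n", i, valueOrNull(persons.item(i), "name"));
        }
    }
}
```

The null-versus-empty decision is made once in valueOrNull, so the expressions themselves can still live in a property file; only the return type requested from JAXP changes.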

Using /text() is always best avoided anyway, since it causes your query to be sensitive to the presence of comments in the XML.
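
To illustrate the point about comments, here is a small sketch, again assuming the JDK's built-in XPath 1.0 engine and a default DocumentBuilderFactory (which keeps comments). The class name TextNodeDemo and the sample XML are made up for the example. Under XPath 1.0 rules, converting a node-set to a string takes only the first node, so a comment inside the element splits the text and name/text() sees just the part before it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, written for illustration only.
final class TextNodeDemo {

    /** Parses the XML and evaluates the expression as a string against the document. */
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The comment splits the element's text into two separate text nodes.
        String xml = "<person><name>Ja<!-- nickname -->ck</name></person>";

        // Selects both text nodes; string conversion keeps only the first one.
        System.out.println(evaluate(xml, "/person/name/text()"));

        // The element's string value concatenates all descendant text nodes.
        System.out.println(evaluate(xml, "/person/name"));
    }
}
```

Selecting the element and taking its string value gives the full "Jack", while the text() form yields only "Ja".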

score 0 · Answer 2 · answered Oct 14 '16 at 19:10

As a long-time user of Saxon XSLT, I'm pleased to find once again that I like Michael Kay's recommendation here. Generally, I like the pattern of returning a collection for queries, even for queries that are expected to return only at most one instance.

What I don't like doing is having to open a bundled interface to try to solve a particular need and then finding that one has to reimplement much of what the original interface handled.

Therefore, here's a method that uses Michael's recommendation while avoiding the cost of having to reimplement a Node-to-String transformation that is recommended in other comments in this thread.

@Nonnull
public Optional<String> findString( @Nonnull final String expression )
{
    try
    {
        // for XPathConstants.STRING, XPath returns an empty string both for values of no length
        // and for elements that are not present.

        // therefore, ask for a NODESET and then retrieve the first Node if any

        final FluentIterable<Node> matches = 
                IterableNodeList.from( (NodeList) xpath.evaluate( expression, node, XPathConstants.NODESET ) );

        if ( matches.isEmpty() )
        {
            return Optional.absent();
        }

        final Node firstNode = matches.first().get();

        // now let XPath process a known-to-exist Node to retrieve its String value         
        return Optional.fromNullable( (String) xpath.evaluate( ".", firstNode, XPathConstants.STRING ) );
    }
    catch ( XPathExpressionException xee )
    {
        return Optional.absent();
    }
}

Here, XPath.evaluate is called a second time to do whatever it usually does to transform the first found Node to the requested String value. Without this, there is a risk that a re-implementation will yield a different result than a direct call for XPathConstants.STRING over the same source node and for the same expression.

Of course, this code is using Guava Optional and FluentIterable to make the intention more explicit. If you don't want Guava, use Java 8 or refactor the implementation using nulls and NodeList's own collection methods.
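
For readers without Guava, a rough Java 8 equivalent might look like the sketch below. It uses java.util.Optional and NodeList's own methods; the class name XPathStrings, the parse helper, and the extra context parameter are assumptions made for the example and are not part of the method above.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, written for illustration only.
final class XPathStrings {

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /**
     * Returns an empty Optional when nothing matches, otherwise the XPath
     * string value of the first match (which may be the empty string).
     */
    static Optional<String> findString(Node context, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // Ask for a NODESET so that "no match" is visible as length 0.
            NodeList matches =
                    (NodeList) xpath.evaluate(expression, context, XPathConstants.NODESET);
            if (matches.getLength() == 0) {
                return Optional.empty();
            }
            // Let XPath itself convert the known-to-exist node to a string.
            return Optional.ofNullable(
                    (String) xpath.evaluate(".", matches.item(0), XPathConstants.STRING));
        } catch (XPathExpressionException e) {
            // Mirrors the Guava version: an invalid expression is reported as "absent".
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Node root = parse("<entities><person><name></name></person><person></person></entities>")
                .getDocumentElement();
        System.out.println(findString(root, "person[1]/name").isPresent());
        System.out.println(findString(root, "person[2]/name").isPresent());
    }
}
```

As in the Guava version, swallowing XPathExpressionException makes a malformed expression indistinguishable from a missing element; rethrowing may be preferable if the expressions come from an external property file.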
