SPARQL: Selecting the nth blanknode

Question

Take the following graph:

:Foo :p _:b0 ;
   :p _:b1 ;
   :p _:b2 .

_:b0 :p1 :apple ;
   :p2 :banana .

_:b1 :p3 :cantaloupe ;
   :p4 :date ;
   :p5 :elderberry .

_:b2 :p6 :fig .

Notice: :Foo is the subject of three triples with the same predicate, :p. Each of those triples has a blanknode as its object.

Is it possible to write a SPARQL query that selects all triples where only _:b1 is the subject?

EDIT: Before proposing an answer, please understand that I am looking for a clever solution to my question, in SPARQL. Assume that the triple store is fixed (i.e., nothing can be done to change the data). The graph I show above is contrived; the blanknodes do not all have the same number of p/o triples. If they each had exactly one triple, however, then the following SPARQL query might suffice:

select ?bn where {
   :Foo :p ?bn .
   ?bn ?p ?o
} limit 1 offset 1

Obviously, the concern here is returning the same blanknode each time. I know it's a set and inherently unordered, so a repeatable result order is not guaranteed; but honestly... for a fixed triple store, I sincerely doubt that a DFA would return a different blanknode ordering between queries. Any clever ideas?

Jeen Broekstra · Answer 1 · 2015-09-30T19:54:13.507

You can't select the 'nth' blank node in SPARQL, for two reasons:

1. An RDF model is a set: triples are unordered.
2. A blank node represents a resource without an identifier, which means it can't be (directly) addressed/identified.

In RDF/SPARQL you work with blank nodes in an indirect fashion: instead of trying to address them directly (which, as we saw above, is impossible, as the very definition of a blank node is that it has no identifier), you look at the things that connect them to other resources, that is, the statements in which they are involved. After all, the statements give the blank node its contextual meaning.

In your case: the differences between _:b1 and the other two blank nodes are in the statements in which they play the role of subject. So to query in SPARQL for the triples where _:b1 is the subject, you should look at the data and see that _:b1 is the only node that has a property :p3 with value :cantaloupe. So you could query like so:

   CONSTRUCT { ?s ?p ?o }
   WHERE { :Foo :p ?s .
           ?s :p3 :cantaloupe ;
              ?p ?o .
   }
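
If you would rather get result rows than a graph, the same idea can be written as a SELECT. This is only a minimal sketch: the prefix IRI http://example.org/ is an assumption (the question never says what ":" expands to), and the isBlank filter is an optional extra check, not something the data requires.

```sparql
# Assumed namespace; replace with whatever ":" stands for in your data.
PREFIX : <http://example.org/>

SELECT ?s ?p ?o
WHERE {
  :Foo :p ?s .               # ?s is one of the objects hanging off :Foo
  ?s :p3 :cantaloupe ;       # the statement that singles out the node we want
     ?p ?o .                 # every predicate/object pair of that node
  FILTER(isBlank(?s))        # optional: only accept blank nodes
}
```

Against the graph in the question, this should return three rows, one each for :p3 :cantaloupe, :p4 :date and :p5 :elderberry. The label shown for ?s in those rows is whatever the store chooses to print and need not be b1.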

On a side note: several SPARQL engine implementations offer some functionality to work around the problem of blank nodes having no (global) identifier. In many cases, they introduce some non-standard syntax extension or a custom function that allows you to directly address a blank node in a SPARQL query. I want to stress that that is non-standard, unlikely to work across different endpoints, and is therefore best avoided.

If you find that you really can't work without somehow addressing your blank nodes directly, you should consider not using blank nodes at all in your data, but creating proper IRIs for these things instead.
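
To illustrate what that buys you, here is a sketch of the query you could write if the middle node had been given an IRI. Both the prefix IRI and the name :item1 are hypothetical; neither appears in the original graph.

```sparql
# Assumed namespace; replace with whatever ":" stands for in your data.
PREFIX : <http://example.org/>

# Assumes the data had been modelled with an IRI instead of a blank node, e.g.
#   :Foo   :p  :item1 .
#   :item1 :p3 :cantaloupe ; :p4 :date ; :p5 :elderberry .
# (:item1 is a hypothetical name used only for this illustration.)
SELECT ?p ?o
WHERE {
  :item1 ?p ?o .   # the node is addressed directly, no identifying pattern needed
}
```

With an IRI in place, the query no longer depends on any particular property value staying unique.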

UPDATE: What your update to the question basically asks is this: "Can I make use of some undocumented feature in an unnamed specific implementation of SPARQL to do a query that is, strictly speaking, not legal, or not guaranteed to give the result I want, and get away with it?" The answer to that question is: probably yes, but it depends on which SPARQL implementation you're using, and it's a Very Bad Idea, for all the reasons I've given you above.

Many (most?) triplestores will indeed give the same result back in the same order between queries in practice, though that is not guaranteed (I can't stress this enough) and you really shouldn't rely on it. Of course, you can get an ordered query result by using an ORDER BY clause on your query, but that won't help in this case since the relative ordering of blank nodes is undefined in SPARQL (so a query engine is free to return _:b1 and _:b2 in any order it sees fit, even if there is an ORDER BY clause). Even worse: while your input RDF file may contain blank node identifiers _:b1 and _:b2, that is not necessarily what a SPARQL query will give back. Many triplestores substitute blank node identifiers with internally generated ids, and your SPARQL query is just as likely to return _:genid-908c909aeacc4b6da3d3059e18706d68-b1 instead of simply _:b1.
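
For completeness, this is roughly what such an order-dependent query would look like when the blank nodes have different numbers of triples: a subquery picks one node by position, and the outer pattern fetches its triples. Treat it as a sketch of the thing being warned against, not a recommendation. The prefix IRI is an assumption, and which node ends up at OFFSET 1 is entirely up to the engine.

```sparql
# Assumed namespace; replace with whatever ":" stands for in your data.
PREFIX : <http://example.org/>

SELECT ?bn ?p ?o
WHERE {
  {
    # Pick "the second" object of :Foo :p.
    # SPARQL does not define an order among blank nodes, so the ORDER BY
    # gives no portable guarantee about which node this is.
    SELECT ?bn
    WHERE { :Foo :p ?bn }
    ORDER BY ?bn
    LIMIT 1 OFFSET 1
  }
  ?bn ?p ?o .   # all triples with the chosen node as subject
}
```

Even if this happens to return the triples of _:b1 on your store today, nothing stops it from returning those of _:b0 or _:b2 after a reload, an upgrade, or on a different endpoint.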

And even if you could get the blank node id back reliably somehow: what are you gonna do with it? A blank node is blank. The id it carries is only for internal book-keeping purposes - you can't use the blank node to query anything further.

Trust me: it's a bad idea. If you can't change the data, rely on the properties which connect the blank nodes and query for those.

I know, *sigh*... but thank you for taking your time to write this up for others. I've updated my question. — Blake Regalia, Sep 30 '15 at 05:42
