Suppose I have a Jena Query object:

String query = "SELECT * WHERE{ ?s <some_uri> ?o ...etc. }";
Query q = QueryFactory.create(query, Syntax.syntaxARQ);

What would be the best way to get all of the subjects of the triples in the query? Preferably without having to do any string parsing/manipulation manually.

For example, given a query

SELECT * WHERE {
    ?s ?p ?o;
       ?p2 ?o2.
    ?s2 ?p3 ?o3.
    ?s3 ?p4 ?o4.
    <http://example.com> ?p5 ?o5.
}

I would hope to have returned some list which looks like

[?s, ?s2, ?s3, <http://example.com>]

In other words, I want the list of all subjects in a query. Even having only those subjects which were variables or those which were literals/uris would be useful, but I'd like to find a list of all of the subjects in the query.

I know there are methods to return the result variables (Query.getResultVars) and some other information (see http://jena.apache.org/documentation/javadoc/arq/com/hp/hpl/jena/query/Query.html), but I can't seem to find anything which will get specifically the subjects of the query (a list of all result variables would return the predicates and objects as well).

Any help appreciated.

Nick Bartlett
  • Is the query string given or can you change it? Because your problem can be solved quite easily by changing the query. – Sentry Mar 04 '13 at 14:59
  • I think I finally understood what you mean. You don't want all subjects in the result, but all binding variables for subjects in the query, right? If so, please make it more obvious in the question. – Sentry Mar 04 '13 at 15:09
  • Why not iterate over the result vars? Do you want to create a table with the result vars? If not, please give an example. – Çağdaş Mar 04 '13 at 15:13
  • Also, I think there is an error in your example. Shouldn't your intended result be `[s, s2, s3]`? – Sentry Mar 04 '13 at 15:25
  • Edited for clarity. The query string is given, and can't be changed. I would hope to get all binding variables for subjects in the query, in addition to any literals/uris which act as subjects in the query. – Nick Bartlett Mar 04 '13 at 15:38

2 Answers

Interesting question. What you need to do is go through the query, and for each block of triples iterate through and look at the first part.

The most robust way to do this is via an element walker which will go through each part of the query. It might seem over the top in your case, but queries can contain all sorts of things, including FILTERs, OPTIONALs, and nested SELECTs. Using the walker means that you can ignore that stuff and focus on only what you want:

Query q = QueryFactory.create(query); // SPARQL 1.1

// Remember distinct subjects in this
final Set<Node> subjects = new HashSet<Node>();

// This will walk through all parts of the query
ElementWalker.walk(q.getQueryPattern(),
    // For each element...
    new ElementVisitorBase() {
        // ...when it's a block of triples...
        @Override
        public void visit(ElementPathBlock el) {
            // ...go through all the triples...
            Iterator<TriplePath> triples = el.patternElts();
            while (triples.hasNext()) {
                // ...and grab the subject
                subjects.add(triples.next().getSubject());
            }
        }
    }
);
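For reference, here is a self-contained sketch of the same idea that also overrides the visit method for ElementTriplesBlock, since queries parsed with SPARQL 1.0 syntax produce that element type instead of ElementPathBlock. It assumes a recent Jena release where the packages are named org.apache.jena.* (older releases used com.hp.hpl.jena.*). The class name SubjectCollector and the method subjectsOf are hypothetical helpers, not part of Jena. A LinkedHashSet is used so that subjects come back in first-seen order.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.core.TriplePath;
import org.apache.jena.sparql.syntax.ElementPathBlock;
import org.apache.jena.sparql.syntax.ElementTriplesBlock;
import org.apache.jena.sparql.syntax.ElementVisitorBase;
import org.apache.jena.sparql.syntax.ElementWalker;

// Hypothetical helper class; the name is an assumption, not a Jena API.
final class SubjectCollector {

    // Collects the distinct subjects of all triple patterns, in first-seen order.
    static Set<Node> subjectsOf(String queryString) {
        Query query = QueryFactory.create(queryString);
        final Set<Node> subjects = new LinkedHashSet<>();

        ElementWalker.walk(query.getQueryPattern(), new ElementVisitorBase() {
            // Triple blocks as produced by the SPARQL 1.1 / ARQ parsers
            @Override
            public void visit(ElementPathBlock el) {
                Iterator<TriplePath> it = el.patternElts();
                while (it.hasNext()) {
                    subjects.add(it.next().getSubject());
                }
            }

            // Triple blocks as produced by the SPARQL 1.0 parser
            @Override
            public void visit(ElementTriplesBlock el) {
                Iterator<Triple> it = el.patternElts();
                while (it.hasNext()) {
                    subjects.add(it.next().getSubject());
                }
            }
        });
        return subjects;
    }

    public static void main(String[] args) {
        String q = "SELECT * WHERE { ?s ?p ?o ; ?p2 ?o2 . ?s2 ?p3 ?o3 . "
                 + "?s3 ?p4 ?o4 . <http://example.com> ?p5 ?o5 . }";
        // Prints the four distinct subjects of the example query
        System.out.println(subjectsOf(q));
    }
}
```

Each returned Node can be tested with isVariable() or isURI() if you only want the variable subjects or only the URI subjects.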
user205512
  • So, this intuitively makes sense, thank you for pointing out `ElementWalker` and `ElementVisitor`s. However, this implementation doesn't seem to work on the above query (nor any other). I find it difficult to debug this kind of overriding of the visit method; any idea what might be the problem? Any time I try this kind of override I just get an empty list returned. Anyone have another example of this in action? – Nick Bartlett Mar 04 '13 at 19:56
  • You need to look in ElementPathBlock for SPARQL 1.1 as well as ElementTriplesBlock. See PatternVarsVisitor – AndyS Mar 04 '13 at 22:03
  • works like a charm. for future reference, am I just missing something somewhere, or is this actually kind of difficult to learn how to use at this level? I couldn't find any example code or information on how to do this, aside from the vague reference here: http://jena.apache.org/documentation/query/manipulating_sparql_using_arq.html and it doesn't seem to be very well documented. maybe I'm just daft. – Nick Bartlett Mar 06 '13 at 12:32
  • You're not missing anything. Firstly, this stuff isn't a common requirement for normal users so it doesn't crop up very often (store implementers tend to hang around on jena-dev etc, and ask directly about this). Secondly this part of the codebase has been a moving target while SPARQL was standardised and Andy improved ARQ. – user205512 Mar 06 '13 at 13:44

It might be too late, but another way is to use the Jena ARQ libraries to compile the given query into its algebra. Once you have the algebra expression, you can walk it and collect all the triples given in the WHERE clause. Here is the code, I hope it helps:

Query query = qExec.getQuery(); // qExec is a QueryExecution, e.g. one created via QueryExecutionFactory

// Generate algebra of the query
Op op = Algebra.compile(query);
CustomOpVisitorBase opVisitorBase = new CustomOpVisitorBase();
opVisitorBase.opVisitorWalker(op);
List<Triple> queryTriples = opVisitorBase.triples;

The CustomOpVisitorBase class is given below:

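public class CustomOpVisitorBase extends OpVisitorBase {
    // Accumulates the triples of every basic graph pattern visited
    List<Triple> triples = new ArrayList<Triple>();

    void opVisitorWalker(Op op) {
        OpWalker.walk(op, this);
    }

    @Override
    public void visit(final OpBGP opBGP) {
        // Add rather than assign, so that earlier BGPs are not overwritten
        triples.addAll(opBGP.getPattern().getList());
    }
}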

Then iterate over the list of triples and use accessor methods such as triple.getSubject().
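
As a rough end-to-end sketch of this approach, the following collects the subjects from every basic graph pattern in the compiled algebra. It assumes a recent Jena release with org.apache.jena.* package names, and it parses the query from a string rather than taking it from a QueryExecution. The class name AlgebraSubjects and the method subjectsOf are hypothetical, not part of Jena. Only OpBGP is handled here, so triple patterns that use property paths (which end up in other operators) would not be picked up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;
import org.apache.jena.sparql.algebra.OpVisitorBase;
import org.apache.jena.sparql.algebra.OpWalker;
import org.apache.jena.sparql.algebra.op.OpBGP;

// Hypothetical helper class; the name is an assumption, not a Jena API.
final class AlgebraSubjects {

    // Compiles the query to algebra and returns the distinct subjects
    // of all triples found in basic graph patterns, in first-seen order.
    static Set<Node> subjectsOf(String queryString) {
        Query query = QueryFactory.create(queryString);
        Op op = Algebra.compile(query);

        // Gather the triples of every BGP in the algebra expression
        final List<Triple> triples = new ArrayList<>();
        OpWalker.walk(op, new OpVisitorBase() {
            @Override
            public void visit(OpBGP opBGP) {
                triples.addAll(opBGP.getPattern().getList());
            }
        });

        // Keep only the subject of each triple
        Set<Node> subjects = new LinkedHashSet<>();
        for (Triple t : triples) {
            subjects.add(t.getSubject());
        }
        return subjects;
    }

    public static void main(String[] args) {
        String q = "SELECT * WHERE { ?s ?p ?o ; ?p2 ?o2 . ?s2 ?p3 ?o3 . "
                 + "?s3 ?p4 ?o4 . <http://example.com> ?p5 ?o5 . }";
        System.out.println(subjectsOf(q));
    }
}
```

Collecting with addAll matters once the query has more than one group, for example an OPTIONAL block, because each group typically compiles to its own OpBGP.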