
I have a reasonable implementation of Jena over MongoDB, built by providing implementations of Graph and DatasetGraph. SPARQL queries are converted into the appropriate MongoDB query expressions, and matching material, at least on a triple-match-by-triple-match basis, is returned with high performance. This is not a surprise; indexes do what they're supposed to do. The Graph is wrapped with an RDFS reasoner Model and all is fine.

I am interested in now exploring ways to optimize filtering push-down into MongoDB. For example, this SPARQL:

?s a:attested "2017-06-01T00:00:00Z"^^xsd:dateTime .

results in this setup of a MongoDB find expression:

{ "P" : "a:attested", "O" : { "$date" : 1496275200000 } }

And all is good. But this SPARQL:

?s a:attested ?theDate .
FILTER (?theDate = "2017-06-01T00:00:00Z"^^xsd:dateTime)

causes ARQ to pass only the predicate to Graph::find():

{ "P" : "a:attested" }

so a whole lot of triples are pulled from the DB and the filtering is done in ARQ itself. The FILTER expression above is obviously not needed for a simple equality, but it proves the point.
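To make the goal concrete, here is a minimal sketch of the push-down itself, assuming the equality filters were somehow available at the point where the find document is built. It uses no Jena or MongoDB driver classes: `FilterPushDown`, `toFindDoc`, and the variable-to-constant filter map are hypothetical names for illustration, and only the "P" and "O" field names come from the find expressions shown above.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Jena or the MongoDB driver.
final class FilterPushDown {

    /**
     * Builds a find document for a pattern "?s <predicate> ?objectVar".
     * equalityFilters maps a variable name to the constant it must equal,
     * standing in for FILTER (?var = constant) expressions.
     */
    static Map<String, Object> toFindDoc(String predicate,
                                         String objectVar,
                                         Map<String, Object> equalityFilters) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("P", predicate);
        // Push the filter down only when it constrains this pattern's object variable.
        if (objectVar != null && equalityFilters.containsKey(objectVar)) {
            doc.put("O", equalityFilters.get(objectVar));
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> filters = new LinkedHashMap<>();
        filters.put("theDate", Instant.parse("2017-06-01T00:00:00Z"));

        // With the filter known, the find document restricts both P and O.
        System.out.println(toFindDoc("a:attested", "theDate", filters));
        // Without it, only the predicate is constrained, as happens today.
        System.out.println(toFindDoc("a:attested", "theDate", new LinkedHashMap<>()));
    }
}
```

The open question is where in ARQ such a filter map could be obtained, since Graph::find() itself only receives the triple pattern.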

The TDB documentation says that "... TDB uses the OpExecutor extension point of ARQ." But the link for OpExecutor goes to a To-Do.

Can anyone point to examples where something can be hooked or otherwise accessed around the time ARQ calls Graph::ExtendedIterator<Triple> find(Triple m)? It is at this point that my implementation jumps in to craft the query, and if I could find out whether filters exist, I could "improve" the restriction on the query. For now, it is not so important that I stop the filtering from happening again in ARQ itself.

Vadim Kotov
Buzz Moschetti
  • I guess you already had a look [here](https://jena.apache.org/documentation/query/arq-query-eval.html) and [here](https://jena.apache.org/documentation/query/algebra.html). Anyway, please ask this on the Jena mailing list; the community is fast there. – UninformedUser Feb 06 '18 at 17:51
  • @AKSW Well, the OpEx stuff yes, and "reverse engineering" sdb/store/LibSDB.java and sdb/engine/QueryEngineSDB.java. Not for beginners for sure... – Buzz Moschetti Feb 06 '18 at 18:11
  • You will need to provide an OpExecutor for your system. By only implementing Graph/DatasetGraph, there isn't access to filters, and it's left to the general purpose executor. All it can do is pull data from the graph and filter it. – AndyS Feb 06 '18 at 18:58
  • I came across a project called HDT which is a dictionaried/compressed rep for big triple set. They have a graph AND query Plan/OpEx plugin for Jena; it is a well coded and relatively compact implementation so you can actually get a sense of what's going on. https://github.com/rdfhdt/hdt-java – Buzz Moschetti Feb 08 '18 at 15:55
