I have a reasonable implementation of Jena over MongoDB by providing impls for Graph
and DatasetGraph
. SPARQL queries are converted into the appropriate query expressions in MongoDB and material, at least on triple-match-by-triple-match basis, is vended in a high performance way. This is not a surprise; indexes do what they're supposed to do. Graph
is wrapped with an RDFS reasoner Model
and all is fine.
I am interested in now exploring ways to optimize filtering push-down into MongoDB. For example, this SPARQL:
?s a:attested "2017-06-01T00:00:00Z"^^xsd:dateTime .
results in this setup of a MongoDB find
expression:
{ "P" : "a:attested", "O" : { "$date" : 1496275200000 } }
And all is good. But this SPARQL:
?s a:attested ?theDate .
FILTER (?theDate = "2017-06-01T00:00:00Z"^^xsd:dateTime)
causes ARQ to pass only the predicate to Graph::find()
:
{ "P" : "a:attested" }
and a whole lot of things are pulled from the DB and the filtering is done in ARQ itself. The FILTER
expression above is obviously not needed for a simple equality but it proves the point.
The TDB documentation says that "... TDB uses the OpExecutor
extension point of ARQ." But the link for OpExecutor
goes to a To-Do.
Can anyone point at any examples of anything where something can be hooked or otherwise accessed around the time ARQ calls Graph::ExtendedIterator<Triple> find(Triple m)
? It is at this point that my implementation jumps in to craft the query, and if I can ask if filters exist, then I can "improve" the restriction on the query. At this time, it is not so important that I deal with stopping the filtering from happening again in ARQ itself.