How can I execute a SPARQL federated query with multiple SERVICE clauses and a large number of relationships along the path?

Question

I have a SPARQL federated query over data with the following characteristics:

More than 10,000 ?s2 values are related to ?s1.
More than 10,000 ?s3 values are related to the previously found ?s2.
More than 10,000 ?s4 values are related to the previously found ?s3.
More than 100 ?s5 values are related to the previously found ?s4.
More than 100 ?s6 values are related to the previously found ?s5.

I am running the federated query below with Jena, and it never finishes:

SELECT *
WHERE {
  SERVICE <endpoint_1> {
    ?s1 <p_1> ?s2 .
    FILTER ( ?s1 = <s_1> )
  }
  SERVICE <endpoint_2> {
    ?s2 <p_2> ?s3 .
  }
  SERVICE <endpoint_3> {
    ?s3 <p_3> ?s4 .
  }
  SERVICE <endpoint_4> {
    ?s4 <p_4> ?s5 .
  }
  SERVICE <endpoint_5> {
    ?s5 <p_5> ?s6 .
  }
}
LIMIT 100

For reference: when I load all of these triples into a single Virtuoso instance and remove the SERVICE clauses, the query returns the correct results:

SELECT *
WHERE {
  ?s1 <p_1> ?s2 .
  FILTER ( ?s1 = <s_1> )
  ?s2 <p_2> ?s3 .
  ?s3 <p_3> ?s4 .
  ?s4 <p_4> ?s5 .
  ?s5 <p_5> ?s6 .
}
LIMIT 100

The point is that Jena works in terms of iterators, as many other engines do (compare the Volcano iterator model). Unlike the single Virtuoso query, Jena obviously can't do the joins at the index level, and it has no statistics about the remote datasets, so a query optimizer can't reorder the joins for you. Moreover, Jena sends one query to the next SERVICE for each previous binding. So for each `?s2` it calls `endpoint_2`, which returns bindings for `?s3`; for each of those bindings it sends a request to `endpoint_3`, and so on. - UninformedUser, May 08 '23 at 14:30
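The per-binding request pattern described in the comment above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Jena's actual code: the endpoints are faked in memory (no HTTP), and `FAN_OUT`, `HOPS`, `call_service`, `extend`, and `run` are hypothetical names chosen for illustration. The fan-out is kept tiny so the example finishes instantly.

```python
from collections import Counter

FAN_OUT = 3  # assumed toy value; the question reports 100 to 10,000+ per hop
HOPS = 5     # five SERVICE clauses, as in the question


def run(fan_out=FAN_OUT, hops=HOPS):
    """Enumerate all paths iterator-style and count requests per endpoint."""
    requests = Counter()

    def call_service(hop, subject):
        # Stand-in for ONE remote request to endpoint_<hop> with the subject bound.
        requests[hop] += 1
        return [f"{subject}/{i}" for i in range(fan_out)]

    def extend(hop, binding):
        # Lazily extend a partial solution by one hop (nested-loop join).
        if hop > hops:
            yield binding
            return
        for obj in call_service(hop, binding[-1]):
            yield from extend(hop + 1, binding + (obj,))

    rows = list(extend(1, ("s_1",)))
    return rows, dict(requests)


if __name__ == "__main__":
    rows, counts = run()
    print(len(rows), counts)
```

With these toy numbers, `run()` produces 243 rows after 1, 3, 9, 27, and 81 requests to the five simulated endpoints. The request count at each hop is the product of the fan-outs before it, so the same pattern grows very quickly with fan-outs in the hundreds or thousands.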
There is a custom extension of the Jena SERVICE clause that allows bulk querying: https://jena.apache.org/documentation/query/service_enhancer.html. It would reduce the number of individual requests sent to each endpoint. - UninformedUser, May 08 '23 at 14:35
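To show why bulk querying reduces the number of requests, here is a minimal sketch of client-side batching with the standard SPARQL 1.1 `VALUES` clause. It is an assumption-laden illustration and not necessarily how the linked Jena extension works internally; `chunked`, `values_query`, and `batched_queries` are hypothetical helper names, and the IRIs are the placeholders from the question.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def values_query(predicate, subjects):
    """Build one query that looks up many subjects at once via VALUES."""
    values = " ".join(f"<{s}>" for s in subjects)
    return (
        f"SELECT ?s ?o WHERE {{ VALUES ?s {{ {values} }} "
        f"?s <{predicate}> ?o }}"
    )


def batched_queries(predicate, subjects, bulk_size):
    """Return one query string per batch instead of one per subject."""
    return [values_query(predicate, batch) for batch in chunked(subjects, bulk_size)]


if __name__ == "__main__":
    # Hypothetical subjects standing in for 10,000 ?s2 bindings.
    subjects = [f"s2_{i}" for i in range(10_000)]
    queries = batched_queries("p_2", subjects, bulk_size=100)
    print(len(queries))  # number of requests needed for this hop
```

In this sketch, 10,000 bindings sent in batches of 100 need 100 requests for the hop instead of 10,000. The batch size is a tuning choice: larger batches mean fewer round trips but longer queries, and endpoints may limit query length.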
Are you able to provide an example using live SPARQL endpoints? That would clarify the following constraints: 1. INI settings 2. Limitations associated with SPARQL Query Service endpoints 3. Anytime Query functionality, where the query solution production pipeline operates within a timeout. - Kingsley Uyi Idehen, May 13 '23 at 19:06
