Performance of a simple SPARQL query using filters

Question

I have a simple SPARQL query that looks for shared nodes within a graph using FILTER and UNION. The query takes an unusually long time to run. Is it possible to restructure it to improve its performance?

A shared node is any of the following three kinds:

1. shared between two objects,
2. shared between two subjects, or
3. the object of one triple and the subject of another.

I provided the graph of the possible data in another question I asked.

The query is as follows:

    """SELECT DISTINCT ?b
       WHERE {
         {
           ?b ?p1 ?a .
           ?b ?p2 ?c .
           FILTER(?a != ?c)
         }
         UNION
         {
           ?a ?p1 ?b .
           ?c ?p2 ?b .
           FILTER(?a != ?c)
         }
         UNION
         {
           ?a ?p1 ?b .
           ?b ?p2 ?c .
           FILTER(?a != ?c)
         }
       }
       """

Since posting this, I ran some experiments and found that the most time-consuming part is the middle block:

         {
           ?a ?p1 ?b .
           ?c ?p2 ?b .
           FILTER(?a != ?c)
         }

No, it's not possible. I mean, what do you want to "rearrange" here? You already have the simplest graph patterns necessary to match these subgraphs. All of the triple patterns consist only of variables, so the triple store can't make use of any index. — UninformedUser, Jan 08 '19 at 06:24
Thanks @AKSW. I have a graph of, say, 20k triples that forms a few clusters of nodes. I wish to collect the connecting nodes as well as the main ones. The above query took something like an hour on a regular laptop with 8 GB of RAM. I switched to nested `for` loops in rdflib, which solved the problem, but I was hoping to see if there is a workaround within SPARQL. Thanks! — PitPartizan, Jan 08 '19 at 06:36
Different SPARQL engines will deliver different performance, on this and other queries. It looks like you're using `rdflib`. You might try Virtuoso (from my employer, either [Open Source](http://wiki.intranet.openlinksw.com:8891/dataspace/owiki/wiki/VOS) or [Enterprise/Commercial](https://shop.openlinksw.com/license_generator/virtuoso/?serverVersionSelection=8.2); see detailed [feature comparison](https://virtuoso.openlinksw.com/features-comparison-matrix/)), among [various others](https://community.openlinksw.com/t/tabulated-relational-database-management-systems-rdbms-comparison/274). — TallTed, Jan 08 '19 at 14:16
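
For reference, the non-SPARQL workaround mentioned in the comments can be sketched in plain Python. This is not the asker's code: instead of nested `for` loops it builds two adjacency indexes in a single pass, which is a swapped-in technique. The function name `shared_nodes` and the tuple-based input are assumptions made for illustration, and the sketch compares nodes with ordinary Python equality, which may differ from SPARQL's `!=` for some literals.

```python
from collections import defaultdict


def shared_nodes(triples):
    """Return the nodes matched by any of the three UNION branches.

    `triples` is any iterable of (subject, predicate, object) tuples
    (hypothetical input shape chosen for this sketch).
    """
    objects_of = defaultdict(set)   # subject -> set of distinct objects
    subjects_of = defaultdict(set)  # object  -> set of distinct subjects
    for s, _p, o in triples:
        objects_of[s].add(o)
        subjects_of[o].add(s)

    shared = set()
    for b in set(objects_of) | set(subjects_of):
        outs = objects_of.get(b, set())   # nodes b points to
        ins = subjects_of.get(b, set())   # nodes pointing to b
        # Branch 1: b is the subject of triples with two different objects.
        # Branch 2: b is the object of triples with two different subjects.
        # Branch 3: b is an object and a subject, and some incoming node
        #           differs from some outgoing node.
        if (len(outs) >= 2
                or len(ins) >= 2
                or (ins and outs and len(ins | outs) >= 2)):
            shared.add(b)
    return shared


example = [
    ("a", "p", "b"),
    ("c", "q", "b"),
    ("b", "r", "d"),
    ("e", "p", "e"),  # self-loop: filtered out because ?a == ?c
]
print(shared_nodes(example))
```

Since iterating over an rdflib `Graph` yields `(s, p, o)` triples, passing the graph itself as `triples` should work, though that has not been checked against the asker's data. The work is one pass over the triples plus one pass over the nodes, rather than a pairwise comparison of all triples.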
