I am continuing my work on extending the Gremlinator to support a larger subset of SPARQL queries. In our work we frequently see SPARQL queries like this:
SELECT DISTINCT ?X ?Y ?Z ?V ?NAME
WHERE {
?X e:brother ?Z .
?Y e:brother ?V .
?Z v:name ?NAME .
?V v:name ?NAME .
FILTER (?X != ?Y) .
}
This basically says: find all Xs and Ys that have brothers with the same name, where X is not the same as Y. It can be thought of as a SQL self-join. The example here uses the Graph of the Gods.
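To make the intended semantics concrete, here is a minimal plain-Java sketch of that self-join. It does not use TinkerPop at all. The class name, the method name, and the hard-coded edge list are purely illustrative assumptions: the edges are my hand-copied reading of the brother edges in the Graph of the Gods, and each vertex's name doubles as its id.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: emulates the SPARQL self-join with nested loops.
class BrotherSelfJoinSketch {

    // Assumed "brother" edges as {source, target}; names double as vertex ids.
    static final String[][] BROTHER_EDGES = {
        {"jupiter", "neptune"}, {"jupiter", "pluto"},
        {"neptune", "jupiter"}, {"neptune", "pluto"},
        {"pluto", "jupiter"},   {"pluto", "neptune"}
    };

    // Returns rows of {X, Y, Z, V, NAME} that satisfy the query pattern.
    static List<String[]> sameNamedBrothers() {
        List<String[]> rows = new ArrayList<>();
        for (String[] left : BROTHER_EDGES) {          // ?X e:brother ?Z
            for (String[] right : BROTHER_EDGES) {     // ?Y e:brother ?V
                String x = left[0], z = left[1];
                String y = right[0], v = right[1];
                // ?Z v:name ?NAME . ?V v:name ?NAME  (name equals id in this toy data)
                boolean sameName = z.equals(v);
                // FILTER (?X != ?Y)
                if (sameName && !x.equals(y)) {
                    rows.add(new String[] {x, y, z, v, z});
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : sameNamedBrothers()) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

With the six edges assumed above, the method yields six rows, two for each shared brother name. Any Gremlin translation of the query should produce the equivalent bindings on the same data.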
I have come up with a Gremlin traversal that would find those Xs and Ys:
g.V().as("X").out("brother").as("Z").values("name").as("NAME1")
.V().as("Y").out("brother").as("V").values("name").as("NAME2")
.where("NAME1", P.eq("NAME2")).where("X", P.neq("Y")).select("X", "Y", "Z", "V", "NAME1", "NAME2");
I then tried to rewrite it using a match() step (to work within the existing Gremlinator framework):
g.V().match(
__.as("START").V().as("X").out("brother").as("Z"),
__.as("START").V().as("Y").out("brother").as("V"),
__.as("Z").values("name").as("NAME"),
__.as("V").values("name").as("NAME"))
.where("X", P.neq("Y")).dedup("X", "Y", "Z", "V", "NAME").select("X", "Y", "Z", "V", "NAME");
This traversal results in:
java.lang.IllegalArgumentException: Neither the sideEffects, map, nor path has a Y-key: WherePredicateStep(X,neq(Y))
whereas the same query with a dedup() step added right after match() works just fine:
g.V().match(
__.as("START").V().as("X").out("brother").as("Z"),
__.as("START").V().as("Y").out("brother").as("V"),
__.as("Z").values("name").as("NAME"),
__.as("V").values("name").as("NAME"))
.dedup().where("X", P.neq("Y")).dedup("X", "Y", "Z", "V", "NAME").select("X", "Y", "Z", "V", "NAME");
What I really want is to have the where clause inside the match block, without any extraneous dedup() steps:
g.V().match(
__.as("START").V().as("X").out("brother").as("Z"),
__.as("START").V().as("Y").out("brother").as("V"),
__.as("Z").values("name").as("NAME"),
__.as("V").values("name").as("NAME"),
__.where("X", P.neq("Y")))
.dedup("X", "Y", "Z", "V", "NAME").select("X", "Y", "Z", "V", "NAME");
This also throws the same IllegalArgumentException as the match() variant without dedup().
I got the idea of adding an extra dedup() step from Section F of this page: https://www.datastax.com/blog/2017/09/gremlin-recipes-9-pattern-matching which mentions a supposed culprit: TINKERPOP-1762. But since I am on the latest TinkerPop version, and that bug is already fixed, this problem does not appear to have exactly the same cause.