I have the following graph:

[Image: visualization of the graph]

Vertices and edges have been added like this:

def graph=ConfiguredGraphFactory.open('Baptiste');def g = graph.traversal();
graph.addVertex(label, 'Group', 'text', 'BNP Paribas');
graph.addVertex(label, 'Group', 'text', 'BNP PARIBAS');
graph.addVertex(label, 'Company', 'text', 'JP Morgan Chase');
graph.addVertex(label, 'Location', 'text', 'France');
graph.addVertex(label, 'Location', 'text', 'United States');
graph.addVertex(label, 'Location', 'text', 'Europe');
def v1 = g.V().has('text', 'JP Morgan Chase').next();def v2 = g.V().has('text', 'BNP Paribas').next();v1.addEdge('partOf',v2);
def v1 = g.V().has('text', 'JP Morgan Chase').next();def v2 = g.V().has('text', 'United States').next();v1.addEdge('doesBusinessIn',v2);
def v1 = g.V().has('text', 'BNP Paribas').next();def v2 = g.V().has('text', 'United States').next();v1.addEdge('doesBusinessIn',v2);
def v1 = g.V().has('text', 'BNP Paribas').next();def v2 = g.V().has('text', 'France').next();v1.addEdge('partOf',v2);
def v1 = g.V().has('text', 'BNP PARIBAS').next();def v2 = g.V().has('text', 'Europe').next();v1.addEdge('partOf',v2);

I need a query that returns every possible path, given specific vertex labels, edge labels, and a maximum number of hops. Let's say I need paths with at most 2 hops, using every label in this example. I tried this query:

def graph=ConfiguredGraphFactory.open('TestGraph');
def g = graph.traversal();
g.V().has(label, within('Location', 'Company', 'Group'))
.repeat(bothE().has(label, within('doesBusinessIn', 'partOf')).bothV().has(label, within('Location', 'Company', 'Group')).simplePath())
.emit().times(2).path();

This query returns 20 paths (it is supposed to return 10), so it returns each path in both directions. Is there a way to specify that I need only 1 direction? I tried adding dedup() to my query, but then it returns 7 paths instead of 10, so that doesn't work.

Also, whenever I try to find paths with 4 hops, it doesn't return the "cyclic" paths such as France -> BNP Paribas -> United States -> JP Morgan Chase -> BNP Paribas. Any idea what to add to my query to allow returning those kinds of paths?

EDIT: Thanks for your solution @DanielKuppitz. It seems to be exactly what I'm looking for.

I use JanusGraph, which is built on top of Apache TinkerPop. I tried the first query:

g.V().hasLabel('Location', 'Company', 'Group').
  repeat(bothE('doesBusinessIn', 'partOf').otherV().simplePath()).
    emit().times(2).
  path().
  dedup().
    by(unfold().order().by(id).fold())

And it threw the following error:

Error: org.janusgraph.graphdb.relations.RelationIdentifier cannot be cast to java.lang.Comparable

So I moved the dedup() step into the repeat() loop, like so:

g.V().hasLabel('Location', 'Company', 'Group').
      repeat(bothE('doesBusinessIn', 'partOf').otherV().simplePath().dedup().by(unfold().order().by(id).fold())).
      emit().times(2).
      path()

And it only returned 6 paths:

[
  [
    "JP Morgan Chase",
    "doesBusinessIn",
    "United States"
  ],
  [
    "JP Morgan Chase",
    "partOf",
    "BNP Paribas"
  ],
  [
    "JP Morgan Chase",
    "partOf",
    "BNP Paribas",
    "partOf",
    "France"
  ],
  [
    "Europe",
    "partOf",
    "BNP PARIBAS"
  ],
  [
    "BNP PARIBAS",
    "partOf",
    "Europe"
  ],
  [
    "United States",
    "doesBusinessIn",
    "JP Morgan Chase"
  ]
]

I'm not sure what's going on here... Any ideas?

Baptiste Arnaud
  • Instead of a picture (which is nice) it would be much better if you could simply provide a Gremlin script that creates some sample data for those answering questions here to work with. Here's an example in the answer of a different question: https://stackoverflow.com/a/51337481/1831717 - those scripts remove a lot of confusion, provide context and just generally make answering better for everyone. :) – stephen mallette Jul 17 '18 at 17:50
  • You should also clarify why this path is "weird" or unexpected, cause it looks good to me and matches your graph visualization. – Daniel Kuppitz Jul 19 '18 at 16:02
  • @DanielKuppitz & stephenmallette Thanks for those suggestions. I re-wrote the question. – Baptiste Arnaud Jul 23 '18 at 08:24

1 Answer

Is there a way to specify that I need only 1 direction?

You kind of need a bidirectional traversal, so you'll have to filter out duplicate paths at the end ("duplicate" in this case means that 2 paths contain the same elements). To do that, you can dedup() paths by a deterministic order of their elements; the easiest way is to order the elements by their id.

g.V().hasLabel('Location', 'Company', 'Group').
  repeat(bothE('doesBusinessIn', 'partOf').otherV().simplePath()).
    emit().times(2).
  path().
  dedup().
    by(unfold().order().by(id).fold())
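
To see why sorting a path's elements makes the two directions collapse into one, here is a minimal sketch in plain Python (not Gremlin). It models the sample graph from the question with invented edge ids `e1`..`e5` and uses the vertex `text` values as vertex ids; both are assumptions made only for illustration.

```python
# Plain-Python model of the sample graph. The edge ids e1..e5 are invented
# for this sketch; vertices are identified by their 'text' value.
EDGES = {
    "e1": ("JP Morgan Chase", "BNP Paribas"),    # partOf
    "e2": ("JP Morgan Chase", "United States"),  # doesBusinessIn
    "e3": ("BNP Paribas", "United States"),      # doesBusinessIn
    "e4": ("BNP Paribas", "France"),             # partOf
    "e5": ("BNP PARIBAS", "Europe"),             # partOf
}


def neighbours(vertex):
    """Yield (edge_id, other_vertex) for every edge touching vertex, ignoring direction."""
    for edge_id, (out_v, in_v) in EDGES.items():
        if out_v == vertex:
            yield edge_id, in_v
        elif in_v == vertex:
            yield edge_id, out_v


def simple_paths(max_hops):
    """All paths of 1..max_hops hops that never repeat a vertex or an edge."""
    vertices = sorted({v for pair in EDGES.values() for v in pair})
    frontier = [(v,) for v in vertices]
    found = []
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            for edge_id, other in neighbours(path[-1]):
                if edge_id not in path and other not in path:
                    next_frontier.append(path + (edge_id, other))
        found.extend(next_frontier)
        frontier = next_frontier
    return found


def dedup_by_sorted_elements(paths):
    """Keep the first path for each distinct sorted list of elements."""
    seen = set()
    unique = []
    for path in paths:
        key = tuple(sorted(path))  # same key for a path and its reverse
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique


all_paths = simple_paths(2)
unique_paths = dedup_by_sorted_elements(all_paths)
print(len(all_paths), len(unique_paths))  # prints: 20 10
```

The 20 and 10 match the counts mentioned in the question: every undirected path shows up once per direction, and the sorted key is identical for both.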

Any idea what to add in my query to allow returning those kinds of paths (cyclic)?

Your query explicitly prevents cyclic paths through the simplePath() step, so it's not quite clear in which scenarios you want to allow them. I assume that you're okay with a cyclic path if the cycle is created by only the first and last element in the path. In this case, the query would look more like this:

g.V().hasLabel('Location', 'Company', 'Group').as('a').
  repeat(bothE('doesBusinessIn', 'partOf').otherV()).
    emit().
    until(loops().is(4).or().cyclicPath()).
  filter(simplePath().or().where(eq('a'))).
  path().
  dedup().
    by(unfold().order().by(id).fold())
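
The logic of that query can be sketched in plain Python (not Gremlin), under the same assumption stated above: a cycle is acceptable only when the last vertex equals the first one. The edge ids `e1`..`e5` are invented for this sketch, and the loop is only an approximation of how `repeat().emit().until()` behaves.

```python
# Plain-Python model of the sample graph; edge ids are invented for this sketch.
EDGES = {
    "e1": ("JP Morgan Chase", "BNP Paribas"),    # partOf
    "e2": ("JP Morgan Chase", "United States"),  # doesBusinessIn
    "e3": ("BNP Paribas", "United States"),      # doesBusinessIn
    "e4": ("BNP Paribas", "France"),             # partOf
    "e5": ("BNP PARIBAS", "Europe"),             # partOf
}


def neighbours(vertex):
    """Yield (edge_id, other_vertex) for every edge touching vertex, ignoring direction."""
    for edge_id, (out_v, in_v) in EDGES.items():
        if out_v == vertex:
            yield edge_id, in_v
        elif in_v == vertex:
            yield edge_id, out_v


def is_simple(path):
    """True when no vertex or edge appears twice in the path."""
    return len(set(path)) == len(path)


def walks(max_hops):
    """Emit every walk after each hop; stop extending a walk once it is cyclic."""
    vertices = sorted({v for pair in EDGES.values() for v in pair})
    frontier = [(v,) for v in vertices]
    emitted = []
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            for edge_id, other in neighbours(path[-1]):
                extended = path + (edge_id, other)
                emitted.append(extended)
                if is_simple(extended):  # cyclic walks leave the loop here
                    next_frontier.append(extended)
        frontier = next_frontier
    return emitted


# Keep simple paths, plus cyclic walks that end on their start vertex.
kept = [p for p in walks(4) if is_simple(p) or p[-1] == p[0]]

# Dedup by the sorted list of elements (duplicates inside a path are preserved).
seen, unique_paths = set(), []
for path in kept:
    key = tuple(sorted(path))
    if key not in seen:
        seen.add(key)
        unique_paths.append(path)

print(len(unique_paths))
```

Under that assumption, a walk such as France -> BNP Paribas -> United States -> JP Morgan Chase -> BNP Paribas is dropped, because it closes on a middle vertex rather than on its start.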

Below is the output of the 2 queries (ignore the extra map() step, it's just there to improve the output's readability).

gremlin> g.V().hasLabel('Location', 'Company', 'Group').
......1>   repeat(bothE('doesBusinessIn', 'partOf').otherV().simplePath()).
......2>     emit().times(2).
......3>   path().
......4>   dedup().
......5>     by(unfold().order().by(id).fold()).
......6>   map(unfold().coalesce(values('text'), label()).fold())
==>[BNP Paribas,doesBusinessIn,United States]
==>[BNP Paribas,partOf,France]
==>[BNP Paribas,partOf,JP Morgan Chase]
==>[BNP Paribas,doesBusinessIn,United States,doesBusinessIn,JP Morgan Chase]
==>[BNP Paribas,partOf,JP Morgan Chase,doesBusinessIn,United States]
==>[BNP PARIBAS,partOf,Europe]
==>[JP Morgan Chase,doesBusinessIn,United States]
==>[JP Morgan Chase,partOf,BNP Paribas,doesBusinessIn,United States]
==>[JP Morgan Chase,partOf,BNP Paribas,partOf,France]
==>[France,partOf,BNP Paribas,doesBusinessIn,United States]

gremlin> g.V().hasLabel('Location', 'Company', 'Group').as('a').
......1>   repeat(bothE('doesBusinessIn', 'partOf').otherV()).
......2>     emit().
......3>     until(loops().is(4).or().cyclicPath()).
......4>   filter(simplePath().or().where(eq('a'))).
......5>   path().
......6>   dedup().
......7>     by(unfold().order().by(id).fold()).
......8>   map(unfold().coalesce(values('text'), label()).fold())
==>[BNP Paribas,doesBusinessIn,United States]
==>[BNP Paribas,partOf,France]
==>[BNP Paribas,partOf,JP Morgan Chase]
==>[BNP Paribas,doesBusinessIn,United States,doesBusinessIn,JP Morgan Chase]
==>[BNP Paribas,doesBusinessIn,United States,doesBusinessIn,BNP Paribas]
==>[BNP Paribas,partOf,France,partOf,BNP Paribas]
==>[BNP Paribas,partOf,JP Morgan Chase,doesBusinessIn,United States]
==>[BNP Paribas,partOf,JP Morgan Chase,partOf,BNP Paribas]
==>[BNP Paribas,doesBusinessIn,United States,doesBusinessIn,JP Morgan Chase,partOf,BNP Paribas]
==>[BNP PARIBAS,partOf,Europe]
==>[BNP PARIBAS,partOf,Europe,partOf,BNP PARIBAS]
==>[JP Morgan Chase,doesBusinessIn,United States]
==>[JP Morgan Chase,doesBusinessIn,United States,doesBusinessIn,JP Morgan Chase]
==>[JP Morgan Chase,partOf,BNP Paribas,doesBusinessIn,United States]
==>[JP Morgan Chase,partOf,BNP Paribas,partOf,France]
==>[JP Morgan Chase,partOf,BNP Paribas,partOf,JP Morgan Chase]
==>[JP Morgan Chase,doesBusinessIn,United States,doesBusinessIn,BNP Paribas,partOf,France]
==>[JP Morgan Chase,doesBusinessIn,United States,doesBusinessIn,BNP Paribas,partOf,JP Morgan Chase]
==>[France,partOf,BNP Paribas,doesBusinessIn,United States]
==>[France,partOf,BNP Paribas,partOf,France]
==>[France,partOf,BNP Paribas,partOf,JP Morgan Chase,doesBusinessIn,United States]
==>[United States,doesBusinessIn,JP Morgan Chase,doesBusinessIn,United States]
==>[United States,doesBusinessIn,BNP Paribas,doesBusinessIn,United States]
==>[United States,doesBusinessIn,JP Morgan Chase,partOf,BNP Paribas,doesBusinessIn,United States]
==>[Europe,partOf,BNP PARIBAS,partOf,Europe]

UPDATE (based on latest comments)

Since JanusGraph has non-comparable edge identifiers, you'll need a unique comparable property on all edges. This can be as simple as a random UUID.

This is how I updated your sample graph:

g.addV('Group').property('text', 'BNP Paribas').as('a').
  addV('Group').property('text', 'BNP PARIBAS').as('b').
  addV('Company').property('text', 'JP Morgan Chase').as('c').
  addV('Location').property('text', 'France').as('d').
  addV('Location').property('text', 'United States').as('e').
  addV('Location').property('text', 'Europe').as('f').
  addE('partOf').from('c').to('a').
    property('uuid', UUID.randomUUID().toString()).
  addE('doesBusinessIn').from('c').to('e').
    property('uuid', UUID.randomUUID().toString()).
  addE('doesBusinessIn').from('a').to('e').
    property('uuid', UUID.randomUUID().toString()).
  addE('partOf').from('a').to('d').
    property('uuid', UUID.randomUUID().toString()).
  addE('partOf').from('b').to('f').
    property('uuid', UUID.randomUUID().toString()).
  iterate()

Now that we have properties that uniquely identify an edge, we also need unique properties (of the same data type) on all vertices. Luckily, the existing text properties seem to be good enough for that (otherwise it would be the same story as with the edges: just add a random UUID). The updated queries now look like this:

g.V().hasLabel('Location', 'Company', 'Group').
  repeat(bothE('doesBusinessIn', 'partOf').otherV().simplePath()).
    emit().times(2).
  path().
  dedup().
    by(unfold().values('text','uuid').order().fold())

g.V().hasLabel('Location', 'Company', 'Group').as('a').
  repeat(bothE('doesBusinessIn', 'partOf').otherV()).
    emit().
    until(loops().is(4).or().cyclicPath()).
  filter(simplePath().or().where(eq('a'))).
  path().
  dedup().
    by(unfold().values('text','uuid').order().fold())

The results are, of course, the same as above.
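
As a rough analogy in plain Python (not Gremlin or JanusGraph code), the sketch below shows why an id type without an ordering breaks the sort-based key, and why a string property of one common type fixes it. `OpaqueEdgeId` is a hypothetical stand-in invented here; it is not the real `RelationIdentifier` class.

```python
import uuid


class OpaqueEdgeId:
    """Hypothetical id type that defines equality but no ordering."""

    def __init__(self, value):
        self.value = value


def sorting_fails(elements):
    """Return True when the elements cannot be ordered against each other."""
    try:
        sorted(elements)
    except TypeError:
        return True
    return False


# Sorting ids that have no ordering fails, much like the cast-to-Comparable error.
ids_sort_fails = sorting_fails([OpaqueEdgeId(1), OpaqueEdgeId(2)])

# With a string property on every element (text on vertices, uuid on edges),
# a path and its reverse produce the same sorted key.
edge_uuid = str(uuid.uuid4())
forward = ["France", edge_uuid, "BNP Paribas"]
backward = list(reversed(forward))
same_key = sorted(forward) == sorted(backward)

print(ids_sort_fails, same_key)  # prints: True True
```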

Daniel Kuppitz
  • thanks so much for this. It's perfect... However it seems not to work the same way as your graph. I edited my question. – Baptiste Arnaud Jul 24 '18 at 09:43
  • Having the `dedup()` inside `repeat()` has a whole different meaning. So, anyway, if your ids are not comparable (which is only true for edge ids, I guess), you should just use another unique property. Since your edges currently don't have any properties, I would add a `uuid` property and populate it with `UUID.randomUUID()`. I'll update my answer with an example soon. – Daniel Kuppitz Jul 24 '18 at 16:48