
I have a graph where nodes can either be 'resources' or 'external dependencies'.

A resource (a.k.a. microservice) may have the following relationships:

  1. resource - DEPENDS_ON -> externalDependency (maxDepth of 1, one direction)
  2. resource - CONNECTS_TO - resource (any depth, any direction)

I'm currently searching for all resources and their relationships (either in or out) with the following query:

Match (Resource)-[:CONNECTIONS*0..]-(ResourceDependency)-[:DEPENDS_ON*0..]-(ExternalDependency) 
Where Resource.name =~ '.*service_name.*' 
Return Resource, ResourceDependency, ExternalDependency

Since resources can depend on each other, they may form a circular relationship. When this happens and one of the nodes in the cycle matches the "name" criteria, the query never finishes and the Neo4j browser eventually freezes.

If I lower the CONNECTIONS depth/maxHops to eight (*0..8), it works perfectly. Unfortunately, I already have relationship chains deeper than that (they just don't form any cycles), so this is not a viable solution.

UPDATES:

Setting the maxHops to any value higher than 8 makes Neo4j browser crash.

Since 'resource' nodes can have N depth relationships with each other (and eventually form a circular reference) the query needs to traverse all the graph getting both in and out relationships of all resource nodes AND their (one depth) external dependencies.

QUESTION:

How can I achieve this "where" clause without performance issues on circular relationships?

InverseFalcon
leovrf
  • I think you should use a max depth, even if it's 100, it's still less than infinite, that's why your browser freezes I think, because of a recursivity in the circular path. – Supamiu Feb 25 '16 at 08:41
  • Thank you for the comment @Supamiu. If I set the max depth to a value equal to or higher than 9, the Neo4j browser crashes. As I said, the graph already has 'depth relationships' greater than that (9), so I wouldn't want to limit queries this way. – leovrf Feb 25 '16 at 13:59

2 Answers


The query below might work better for you. If not, you may want to try putting reasonable upper bounds on one or more of the variable-length paths. You might be able to use higher limits than you've tried before.

MATCH (resource:Resource)-[:CONNECTIONS*0..]->(resourceDependency)
WHERE resource.name =~ '.*service_name.*' AND (resourceDependency)-[:DEPENDS_ON]->()
WITH resourceDependency, COLLECT(resource) AS resources
MATCH (resourceDependency)-[:DEPENDS_ON*]->(externalDependency)
RETURN resourceDependency, resources, COLLECT(externalDependency) AS externalDependencies;

The query:

  1. Assumes that your resource nodes have the Resource label, and uses it. If you do not specify a label, neo4j would have to scan every node in the DB to see if it has a :CONNECTIONS or :DEPENDS_ON relationship. If you specify the label, neo4j can just scan Resource nodes.
  2. Specifies the direction for all relationships, which quickly eliminates maybe half of all the relationship traversals that your query currently does.
  3. Uses [:CONNECTIONS*0..] to get all dependencies, including resource itself.
  4. Includes this test: (resourceDependency)-[:DEPENDS_ON]->() to get only the resourceDependency nodes that have an outgoing :DEPENDS_ON relationship.
  5. Aggregates, for every such resourceDependency, all the resources that depend on it. As a side effect, this ensures that we end up with only distinct resourceDependency nodes after this point. We aggregate in this way because a resourceDependency could be depended on by a large number of resources, and so, in the following steps, we want to make sure that we try to find the external dependencies for each resourceDependency only once.
  6. Finds all the externalDependency nodes depended on by each resourceDependency. Notice that we use [:DEPENDS_ON*], which is equivalent to [:DEPENDS_ON*1..], because I assume that a resourceDependency cannot also be an externalDependency.
  7. Returns each resourceDependency, a collection of the resource nodes that depend on it, and a collection of the externalDependency nodes it depends on.
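
As a rough sketch of the "upper bounds" fallback mentioned at the top of this answer, the same query could be written with explicit limits. The bound of 20 is a placeholder chosen only for illustration, and the single-hop DEPENDS_ON match relies on the question's statement that external dependencies are at most one hop away:

```cypher
// Sketch only: 20 is a placeholder bound, not a recommended value.
MATCH (resource:Resource)-[:CONNECTIONS*0..20]->(resourceDependency)
WHERE resource.name =~ '.*service_name.*' AND (resourceDependency)-[:DEPENDS_ON]->()
WITH resourceDependency, COLLECT(resource) AS resources
// DEPENDS_ON is at most one hop deep per the question, so a fixed-length match is enough.
MATCH (resourceDependency)-[:DEPENDS_ON]->(externalDependency)
RETURN resourceDependency, resources, COLLECT(externalDependency) AS externalDependencies;
```

Raising the bound step by step should show how far the directed version can go on a given graph before it becomes too slow.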
cybersam
  • Hello @cybersam, thanks for your answer! Although the query is working for circular relationships it won't return all resources on non-circular ones. Your query gave me some ideas of how far one can go with cypher. I already tried a few alternatives (none succeeded) and will give it more tries as soon as possible. Also added more information to the question to make it clearer (see updates). – leovrf Feb 29 '16 at 14:53

Cypher's variable-length pattern matching is looking for all possible paths that match the pattern, and it's not the most efficient approach when you're looking for distinct connected nodes.
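
One way to see the difference on your own data is to count paths against distinct end nodes. This is only a diagnostic sketch: it reuses the name filter from the question and the 0..8 bound that the question reports as still working. Cypher will not traverse the same relationship twice within a single path, so the match does terminate in principle, but in a densely connected graph the number of paths can be far larger than the number of nodes they reach.

```cypher
// Sketch only: compares how many paths are expanded with how many distinct nodes they end at.
MATCH p = (r)-[:CONNECTIONS*0..8]-(d)
WHERE r.name =~ '.*service_name.*'
RETURN count(p) AS pathsFound, count(DISTINCT d) AS distinctNodes;
```

If pathsFound is much larger than distinctNodes, most of the work is going into paths that lead to nodes already found.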

We can use the path expander procedures in APOC Procedures to match the distinct reachable resource dependency nodes, and then match each one's possible external dependency from there.

MATCH (Resource) // you really should be using labels, WHERE CONTAINS, and indexes
WHERE Resource.name =~ '.*service_name.*' 
CALL apoc.path.subgraphNodes(Resource, {relationshipFilter:'CONNECTIONS'}) YIELD node as ResourceDependency
MATCH (ResourceDependency)-[:DEPENDS_ON*0..1]->(ExternalDependency)
RETURN Resource, ResourceDependency, ExternalDependency

Note that you will see duplicated data as the nodes for Resource and ResourceDependency can be swapped. If you want to cut down on this, you can add the following right after the CALL:

...
WITH Resource, ResourceDependency
WHERE id(Resource) <= id(ResourceDependency)
...
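
Put together, the full query might look like the sketch below. It assumes APOC is installed and that resource nodes carry a Resource label (an assumption, as the question does not show labels), and it swaps the regex for CONTAINS, as suggested in the comment in the query above:

```cypher
// Sketch only: the :Resource label is assumed; adjust it to your own model.
MATCH (Resource:Resource)
WHERE Resource.name CONTAINS 'service_name'
CALL apoc.path.subgraphNodes(Resource, {relationshipFilter: 'CONNECTIONS'}) YIELD node AS ResourceDependency
// Keep only one ordering of each pair to cut down on swapped duplicates.
WITH Resource, ResourceDependency
WHERE id(Resource) <= id(ResourceDependency)
MATCH (ResourceDependency)-[:DEPENDS_ON*0..1]->(ExternalDependency)
RETURN Resource, ResourceDependency, ExternalDependency
```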
InverseFalcon