I'm looking for a performant way of retrieving all connected nodes. However, there is a twist: I would like to exclude nodes, and their subsequent children, that are connected via certain relationship types.

The attached figure illustrates my case.

There are two or more clusters of nodes. I would like to retrieve all nodes of a single cluster, depending on the id inside the query. All other nodes (belonging to different clusters and connected via "LINK..." relationships) should not be included.

I know how to retrieve all connected nodes via:

MATCH (n:MyNode {id : 123})-[*]-(connectedNodes) RETURN connectedNodes

Filtering with the WHERE clause sounds like a bad idea, because it would still fetch the whole graph. Is there maybe something in the APOC procedures that would allow me to do something along those lines? Thanks a lot already for your help.

EDIT 1: So far I have tried the first suggestion given in the comments, but the execution time was too long. I will try to restrict relationship and node types after all. I also tried a custom implementation in Python using a recursive function, but it is not finalized yet.

EDIT 2: @InverseFalcon's suggestion worked like a charm: first filter all available relationship types to drop the ones that should not be considered, then call the apoc.path.subgraphNodes procedure with the respective starting node and the valid relationship types. Thank you.

Daniel
  • Does [Expand graph](https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_expand_paths) from the APOC library help you? ("Expand from start node following the given relationships from min to max-level adhering to the label filters. [...]") – ThirstForKnowledge Nov 30 '18 at 17:45

2 Answers

Tezra's answer has some good points, and you'll want to return DISTINCT connectedNodes, otherwise you'll get duplicates. On a highly connected graph, though, this may take a while (or even hang) depending on the number of nodes, since Cypher is interested in all possible paths for matches, and that can quickly get out of control.
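To illustrate why a traversal that expands each node only once scales better than enumerating every path, here is a minimal Python sketch of a breadth-first search over an in-memory edge list. It is only an illustration under stated assumptions: the edge data and relationship type names are made up, the function `connected_nodes` is hypothetical, and this is not how APOC or the Cypher runtime is actually implemented.

```python
from collections import deque


def connected_nodes(edges, start, excluded_prefix="LINK"):
    """Return all nodes reachable from `start`, skipping excluded relationship types.

    `edges` is an iterable of (source, relationship_type, target) tuples.
    Relationships are treated as undirected, like -[*]- in the question.
    """
    # Build an adjacency map, dropping excluded relationships up front.
    adjacency = {}
    for source, rel_type, target in edges:
        if rel_type.startswith(excluded_prefix):
            continue
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in visited:  # each node is expanded at most once
                visited.add(neighbour)
                queue.append(neighbour)

    visited.discard(start)  # report only the other nodes of the cluster
    return visited


# Hypothetical data: nodes 1-3 form one cluster, 4-5 another, joined by LINK_TO.
edges = [
    (1, "HAS", 2),
    (2, "HAS", 3),
    (3, "KNOWS", 1),
    (3, "LINK_TO", 4),
    (4, "HAS", 5),
]
print(sorted(connected_nodes(edges, 1)))  # [2, 3]
```

Because the visited set stops a node from being expanded twice, the work grows with the number of nodes and relationships rather than with the number of distinct paths between them.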

With APOC we can handle this case, but as Tezra remarked, we don't have a way to blacklist relationships, and even if we did, we don't have a way to blacklist based on partial names of the relationship types.

The approach you would need to use is to get all relationship types first, remove any that start with LINK, and then join the remaining relationship types into a |-separated string. You can then pass that string to the relationship filter.

CALL db.relationshipTypes() YIELD relationshipType
WHERE NOT relationshipType STARTS WITH 'LINK'
WITH collect(relationshipType) as relTypes
WITH apoc.text.join(relTypes, '|') as relTypesString
MATCH (n:MyNode {id : 123})
CALL apoc.path.subgraphNodes(n, {relationshipFilter:relTypesString}) YIELD node
RETURN node as connectedNode
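
If you would rather build the filter string on the client (for example in a Python script) and pass it in as a query parameter, the list handling is the same. This is only a sketch: `build_relationship_filter` is a hypothetical helper, and the relationship type names are invented for illustration.

```python
def build_relationship_filter(rel_types, excluded_prefix="LINK"):
    """Join all relationship types that do not start with the prefix into a '|' string."""
    allowed = [t for t in rel_types if not t.startswith(excluded_prefix)]
    return "|".join(allowed)


# Hypothetical relationship types, e.g. as returned by db.relationshipTypes().
all_types = ["HAS_CHILD", "LINK_TO_CLUSTER", "BELONGS_TO", "LINKED"]
print(build_relationship_filter(all_types))  # HAS_CHILD|BELONGS_TO
```
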
InverseFalcon
  • Thank you a lot for your answer @InverseFalcon. I haven't had the opportunity to try your method, but I might need to restrict the relationship labels after all. So far I have only tried Tezra's suggestion, and it was still running after 20 minutes. My data set is a bit unreliable, but I might be able to find a set of reliable relationship and node types to filter for. I also tried a custom implementation in the form of a Python script using py2neo; do you think I could rival anything that is possible through Cypher and APOC? – Daniel Dec 01 '18 at 16:52
  • For getting distinct connected nodes at a deep or unlimited distance in a highly connected graph, you need APOC (or a similar custom procedure that uses a different traversal approach than Cypher). The path expander procs will help out when using specific rel types (just change what you pass to the `relationshipFilter`), and for node labels you can use the `labelFilter` config property to whitelist, blacklist, or enforce that you want only nodes of specific label types at the end. – InverseFalcon Dec 01 '18 at 20:15
  • Again, thanks for your answer and explanations. Your solution worked perfectly, even without further restricting the relationship types. I tried to retrieve the nodes once more with pure Cypher and DISTINCT, but once I reached the fifth level of sub-relations it was already stalling. I only started using Neo4j recently and hadn't made any use of APOC so far; I definitely need to change that! – Daniel Dec 02 '18 at 20:35

First, I want to stress that Cypher does not restrict how information is retrieved; it only determines what is returned. So try using WHERE before ruling it out (also, try upgrading to the latest Neo4j for the smartest Cypher planner). This should work just fine because the Cypher planner can filter the results while it matches them.

MATCH (n:MyNode {id : 123})-[rs*]-(connectedNodes)
WHERE NONE(r in rs WHERE TYPE(r) STARTS WITH "LINK")
RETURN DISTINCT connectedNodes

The APOC procedures I can think of require you to name the relationships used (you can blacklist labels, but that doesn't seem to apply to relationship types), so it would be the same as -[rs:A|B|C|D*]-

Tezra
  • Thanks for the answer @tezra. I just tried your code, but my clusters seem to be too big; at least, the query still had not terminated after 20 minutes. By newest version, do you mean 3.5? I'm currently using 3.4.10. I'm thinking of restricting nodes and relationships further, but that's not trivial with my data set. I also wrote my own script (Python with the py2neo library) that uses a recursive function to traverse the different nodes; I'm not sure if that method could compete with Cypher's internal methods. – Daniel Dec 01 '18 at 16:32
  • @Daniel I made a slight edit to the query: I forgot to include DISTINCT so that Cypher can do a pruning search instead of an exhaustive search. Depending on how connected your data is, that should significantly boost the Cypher speed (and cut out redundant node returns). The Cypher planner gets a little smarter with each release, but most of the good shortcut plans usually come with major releases. Upgrading should help, but you aren't too far behind the latest, so I don't think it will be too noticeable. – Tezra Dec 01 '18 at 20:48
  • Also, change `[rs*]` to `[rs*..10]` and play with that number a bit. It won't cover the whole cluster, but it should give you an idea of how deeply you can efficiently search your cluster with just Cypher. – Tezra Dec 01 '18 at 20:49