
TL;DR:

I need the most efficient Cypher query for the following task. It should get the nodes connected to a certain node type through a certain type of relationship, then retrieve the connections between those nodes, keep only the 150 most connected ones, and show them to the user.

I propose one below using an APOC relationship index query. I think it can be made more efficient, though, so I'm looking for your advice.

LONG EXPLANATION:

In my datamodel I have the nodes of the type:

:Concept :Context :User :Statement

This is used for text network analysis, so the basic idea is that the :Concepts appear in :Statements that belong to a certain :Context added by a certain :User.

They also have properties, such as uid (the unique ID), and name (the name).

Every :Concept is connected to every other :Concept with the :TO type of directed relation.

If a :Concept belongs to a :Context, it has the :AT relation to that :Context.

If a :Concept is made by a :User, it is connected to that user with the :BY type of relation.

I also added properties to the relations. They show which user made the :TO connection and in which context it appeared.

I need to get a list of nodes and their relationships in a certain context, so I currently use a Cypher/APOC query like this:

// look up :TO relationships in the APOC relationship index by user id
CALL apoc.index.relationships('TO','user:15229100-b20e-11e3-80d3-6150cb20a1b9') 
YIELD rel, start, end 
WITH DISTINCT rel, start, end 
// keep only relationships whose context property points at the "decon" context
MATCH (ctx:Context) 
WHERE rel.context = ctx.uid 
AND ctx.name = "decon"
RETURN DISTINCT start.uid AS source_id, 
start.name AS source_name, 
end.uid AS target_id, 
end.name AS target_name, 
rel.uid AS edge_id, 
ctx.name AS context_name, 
rel.statement AS statement_id, 
rel.weight AS weight 

It works pretty well. However, when the graph is large (e.g., more than 1,000 nodes and 5,000 connections), the query takes too long.

So I want to be able to limit the number of relationships I get.

With the query above that is difficult. To keep only the 150 most connected nodes, I first have to retrieve all the data.
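Here is a minimal Python sketch of that client-side step. It assumes the rows arrive as dicts keyed by the `source_id` and `target_id` columns of the RETURN clause above. The function name `top_connected` is hypothetical. The sketch counts how many returned :TO rows touch each node, in either direction, and it can only rank the nodes once every row has been read:

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping


def top_connected(rows: Iterable[Mapping[str, str]], limit: int = 150) -> list[str]:
    """Return the ids of the `limit` nodes that appear in the most rows."""
    degree: Counter[str] = Counter()
    for row in rows:
        # each :TO row adds one connection to both of its endpoints
        degree[row["source_id"]] += 1
        degree[row["target_id"]] += 1
    # ranking is only possible after the full result set has been consumed
    return [node_id for node_id, _ in degree.most_common(limit)]


rows = [
    {"source_id": "a", "target_id": "b"},
    {"source_id": "a", "target_id": "c"},
    {"source_id": "b", "target_id": "c"},
    {"source_id": "a", "target_id": "d"},
]
print(top_connected(rows, limit=2))  # ['a', 'b']
```

`Counter.most_common` keeps ties in first-seen order, so `b` ranks ahead of `c` even though both have two connections.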

So I thought I might change the logic of the query and instead:

1) Query the :Context I'm interested in;

2) Get all the :Concept nodes connected to it;

3) Find all the relations of the retrieved :Concept nodes to one another;

4) Keep only the top X (150) most connected :Concept nodes and disregard the rest;

5) Show them to the user.

I tried the following query:

MATCH (ctx:Context{name:'decon',by:'15229100-b20e-11e3-80d3-6150cb20a1b9'}) 
WITH ctx MATCH (c1:Concept)-[:AT]->(ctx),
(c2:Concept)-[:AT]->(ctx) 
WITH c1, c2 
MATCH (c1)-[rel:TO]->(c2) 
RETURN DISTINCT rel;

But it takes much, much longer.

I also need to filter the relationships between those nodes. Only the ones made by a certain :User and appearing in a certain :Statement should be shown.
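One way to picture that extra filter is on an in-memory list of :TO relationships. The sketch below is hypothetical. It assumes each relationship carries `user`, `context` and `statement` properties holding uids. The first query reads `rel.context` and `rel.statement`, and the APOC index key suggests a `user` property. In Cypher the equivalent predicate would look something like `WHERE rel.user = $userUid AND rel.statement IN $statementUids`, with parameter names chosen here only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ToRel:
    """Hypothetical stand-in for a :TO relationship and its properties."""
    source: str
    target: str
    user: str       # uid of the :User who made the connection (assumed property)
    context: str    # uid of the :Context it appeared in
    statement: str  # uid of the :Statement it came from


def filter_rels(rels: Iterable[ToRel], user_uid: str, statement_uids: Iterable[str]) -> list[ToRel]:
    """Keep only relationships made by `user_uid` in one of `statement_uids`."""
    wanted = set(statement_uids)
    return [r for r in rels if r.user == user_uid and r.statement in wanted]


rels = [
    ToRel("a", "b", user="u1", context="ctx1", statement="s1"),
    ToRel("a", "c", user="u2", context="ctx1", statement="s1"),
    ToRel("b", "c", user="u1", context="ctx1", statement="s2"),
]
kept = filter_rels(rels, user_uid="u1", statement_uids=["s1"])
print([(r.source, r.target) for r in kept])  # [('a', 'b')]
```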

Does anyone have an idea what else I could try?

PS: The source code is at https://github.com/noduslabs/infranodus/blob/master/lib/entry.js#L573

Aerodynamika

1 Answer


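You're generating a Cartesian product of those :Concept nodes, and that is what slows down your query. For n concepts in the context, the two comma-separated patterns in your MATCH produce on the order of n² (c1, c2) rows before any :TO relationship is even checked.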
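The pairing can be modeled in plain Python to show the scale, with concept ids reduced to integers. Cypher will not bind the same relationship twice within a single MATCH. Assuming each concept has exactly one :AT relationship to the context, a concept is therefore never paired with itself, and the exact count is n·(n−1):

```python
from itertools import permutations


def cartesian_row_count(n_concepts: int) -> int:
    """Rows from MATCH (c1)-[:AT]->(ctx), (c2)-[:AT]->(ctx) with n concepts in ctx.

    Assumes one :AT relationship per concept, so c1 = c2 is excluded by
    Cypher's rule that one MATCH cannot bind the same relationship twice.
    """
    concepts = range(n_concepts)
    return sum(1 for _ in permutations(concepts, 2))


print(cartesian_row_count(1000))  # 999000
```

By contrast, collecting the concepts and unwinding them keeps one row per concept: n rows instead of n·(n−1).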

You could try this instead:

MATCH (c:Concept)-[:AT]->(:Context{name:'decon',by:'15229100-b20e-11e3-80d3-6150cb20a1b9'}) 
WHERE (c)-[:BY]->(:User {uid:'15229100-b20e-11e3-80d3-6150cb20a1b9'})
// AND <additional predicate for desired :Statement>
// collect the concepts into one list instead of matching them twice (no cartesian product)
WITH collect(c) as concepts
UNWIND concepts as c
// count outgoing :TO relationships that point at another collected concept
WITH c, size([(c)-[:TO]->(c2) WHERE c2 in concepts | c2]) as connections
ORDER BY connections DESC
LIMIT 150
RETURN c
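The clauses of this query can be mirrored on an in-memory graph to show what each one computes. This is a sketch under assumptions, not how Neo4j executes it:

- `concepts_in_ctx` stands for the :Concept ids that survive the MATCH and WHERE.
- `out_to` maps each id to the targets of its outgoing :TO relationships.
- The function name `top_concepts` is hypothetical.

```python
from __future__ import annotations


def top_concepts(
    concepts_in_ctx: list[str],
    out_to: dict[str, list[str]],
    limit: int = 150,
) -> list[tuple[str, int]]:
    """Rank concepts by outgoing :TO relationships to other concepts in the same context."""
    concepts = set(concepts_in_ctx)        # WITH collect(c) AS concepts
    scored = []
    for c in concepts_in_ctx:              # UNWIND concepts AS c
        # size([(c)-[:TO]->(c2) WHERE c2 IN concepts | c2])
        connections = len([c2 for c2 in out_to.get(c, []) if c2 in concepts])
        scored.append((c, connections))
    # ORDER BY connections DESC (Python's sort is stable; Cypher makes no tie-order promise)
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]                  # LIMIT 150


out_to = {"a": ["b", "c", "x"], "b": ["c"], "c": []}
print(top_concepts(["a", "b", "c"], out_to, limit=2))  # [('a', 2), ('b', 1)]
```

`x` lies outside the context, so it does not count toward `a`. Like the query, this counts only outgoing :TO relationships. Counting both directions would need an undirected pattern such as `(c)-[:TO]-(c2)`.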

You'll of course want an index on :Context(by) for the initial match to be quick.

InverseFalcon
  • Thank you! This query looks good. It still takes a bit longer than the one above (I have the index on), but it's in the right direction. I wanted to ask you: what does the vertical bar in `WHERE c2 in concepts | c2` mean? Thanks! – Aerodynamika Jan 13 '19 at 18:35
  • In that line we're using [pattern comprehension](https://neo4j.com/docs/cypher-manual/current/syntax/lists/#cypher-pattern-comprehension) to match a pattern and extract elements of that pattern into a list. The vertical bar separates the matched pattern (plus its WHERE clause) from the expression that is evaluated for each match to populate the list. (In this query, we only build the list so we can take its size.) – InverseFalcon Jan 15 '19 at 21:18
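Loosely, the three parts of a pattern comprehension line up with a Python list comprehension. This illustration is hypothetical:

- `out_to` stands in for the :TO relationships of a single concept `c`.
- `names` stands in for the `name` property.
- The projection after the bar is `c2.name` rather than `c2`, to show that any expression can go there.

```python
# Hypothetical in-memory stand-ins for the graph around one concept "c"
out_to = {"c": ["d", "e", "f"]}   # targets of (c)-[:TO]->(c2)
concepts = {"d", "f"}              # the collected concepts
names = {"d": "design", "e": "ethics", "f": "form"}  # the name property

# [(c)-[:TO]->(c2) WHERE c2 IN concepts | c2.name]
#   pattern            -> for c2 in out_to["c"]
#   WHERE predicate    -> if c2 in concepts
#   expression after | -> names[c2]
projected = [names[c2] for c2 in out_to["c"] if c2 in concepts]
print(projected)       # ['design', 'form']
print(len(projected))  # 2, the value size(...) would return
```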