I have the following request:

CALL apoc.index.relationships('TO','context:34b4a5b0-0dfa-11e9-98ed-7761a512a9c0')
YIELD rel, start, end
WITH DISTINCT rel, start, end
RETURN DISTINCT
  start.uid AS source_id,
  start.name AS source_name,
  end.uid AS target_id,
  end.name AS target_name,
  rel.uid AS edge_id,
  rel.context AS context_id,
  rel.statement AS statement_id,
  rel.weight AS weight

This returns a table of results with the columns listed above.

The question: is there a way to filter out the top 150 most connected nodes (the source_name/source_id and target_name/target_id nodes)?

I don't think counting frequency would work, as each table row is unique (because of the different edge_id), but maybe there's a function in Neo4j / Cypher that lets me find the most frequent (source_name/source_id and target_name/target_id) nodes?

Thank you!

Aerodynamika

2 Answers

You could always use size( (node)-[:REL]->() ) to get the degree.

And if you compute the top-n degrees first, you can filter those nodes out by comparing:

WHERE min < size( (node)-[:REL]->() ) < max
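One way the degree check might be folded into the query from the question is sketched below. It assumes the relationship type is `TO` (taken from the first argument of the index call) and uses placeholder bounds of 1 and 100, which are not from the original post. Note that `size()` on a pattern counts all `TO` relationships of a node in the whole graph, not only those matched by the index lookup. This pattern-expression form applies to the Neo4j 3.x/4.x releases; newer releases use `COUNT { ... }` instead.

```cypher
CALL apoc.index.relationships('TO','context:34b4a5b0-0dfa-11e9-98ed-7761a512a9c0')
YIELD rel, start, end
// Out-degree of each endpoint over :TO relationships (type assumed from the index call)
WITH DISTINCT rel, start, end,
     size( (start)-[:TO]->() ) AS start_degree,
     size( (end)-[:TO]->() ) AS end_degree
// Placeholder bounds: keep rows whose endpoints both have a degree strictly between 1 and 100
WHERE 1 < start_degree < 100 AND 1 < end_degree < 100
RETURN
  start.uid AS source_id,
  start.name AS source_name,
  end.uid AS target_id,
  end.name AS target_name,
  rel.uid AS edge_id,
  rel.context AS context_id,
  rel.statement AS statement_id,
  rel.weight AS weight
```

Dropping the arrow, as in `(start)-[:TO]-()`, would count relationships in both directions, which may be closer to "most connected" in a directed graph.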

Michael Hunger
  • Thank you, Michael. But how do I integrate it into my query? Can I add `WHERE size(rel)` right after the `WITH DISTINCT` part of the query? And I'm still not clear on how I filter out the top 150 ones... It would be great if you could clarify that! – Aerodynamika Jan 14 '19 at 10:54

This query might do what you want:

CALL apoc.index.relationships('TO','context:34b4a5b0-0dfa-11e9-98ed-7761a512a9c0') 
YIELD rel, start, end
WITH start, end, COLLECT(rel) AS rs
ORDER BY SIZE(rs) DESC LIMIT 50
RETURN
  start.uid AS source_id, 
  start.name AS source_name, 
  end.uid AS target_id, 
  end.name AS target_name,
  [r IN rs | {edge_id: r.uid, context_id: r.context, statement_id: r.statement, weight: r.weight}] AS rels

The query uses the aggregating function COLLECT to collect all the relationships for each pair of start/end nodes, keeps the data for the 50 node pairs with the most relationships, and returns a row of data for each pair (with the data for the relationships in a rels list).

cybersam
  • Thank you, it works! Do I understand correctly that it counts which `start` and `end` node pairs share the most `rels` and then filters out the top 50 of them? Can I somehow integrate the relationship's `weight` property (`rel.weight`) into this calculation? – Aerodynamika Jan 25 '19 at 21:24
  • The only problem is that it filters out the top 150 relationships, so that's not necessarily 150 nodes and my graph is directed... I wonder how it could be possible to filter out the top 150 nodes with the highest degree from those results and then show the relationships for them... – Aerodynamika Jan 25 '19 at 21:34
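A possible way to rank nodes rather than node pairs is sketched below. It is not from either answer: it counts how often each node appears as an endpoint among the indexed relationships (in-degree plus out-degree within this result set), keeps the 150 nodes with the highest count, and returns only the relationships whose endpoints are both in that set. The names `rels`, `node`, `degree`, `top_nodes`, `source`, and `target` are illustrative.

```cypher
CALL apoc.index.relationships('TO','context:34b4a5b0-0dfa-11e9-98ed-7761a512a9c0')
YIELD rel
WITH collect(DISTINCT rel) AS rels
// Each relationship contributes one count to its start node and one to its end node
UNWIND rels AS r
UNWIND [startNode(r), endNode(r)] AS node
WITH rels, node, count(*) AS degree
ORDER BY degree DESC
LIMIT 150
// Collect the 150 highest-degree nodes, then revisit the relationships
WITH rels, collect(node) AS top_nodes
UNWIND rels AS rel
WITH rel, startNode(rel) AS source, endNode(rel) AS target, top_nodes
// Keep only relationships that connect two of the top nodes
WHERE source IN top_nodes AND target IN top_nodes
RETURN
  source.uid AS source_id,
  source.name AS source_name,
  target.uid AS target_id,
  target.name AS target_name,
  rel.uid AS edge_id,
  rel.context AS context_id,
  rel.statement AS statement_id,
  rel.weight AS weight
```

Replacing `AND` with `OR` would keep every relationship that touches at least one top node, and negating the condition would exclude the top nodes instead. If `weight` is numeric, swapping `count(*)` for `sum(r.weight)` would rank nodes by total weight rather than by number of relationships.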