
I am using Neo4j CE 3.1.1 and I have a relationship WRITES between authors and books. I want to find the N books with the largest number of authors (say N = 10). Based on some examples I found, I came up with this query:

MATCH (a)-[r:WRITES]->(b)
RETURN r, COUNT(r)
ORDER BY COUNT(r) DESC LIMIT 10

When I run this query in the Neo4j Browser I get 10 books, but they don't look like the books with the most authors: each shows only a few WRITES relationships to authors. If I change the query to

MATCH (a)-[r:WRITES]->(b)
RETURN b, COUNT(r)
ORDER BY COUNT(r) DESC LIMIT 10

then I get the 10 books with the most authors, but I don't see their relationships to the authors. To see them, I have to run an additional query for each book, explicitly naming a title returned by the previous query:

MATCH ()-[r:WRITES]->(b)
WHERE b.title="Title of a book with many authors"
RETURN r

What am I doing wrong? Why isn't the first query working as expected?

st1led

2 Answers


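In Cypher, an aggregation such as COUNT() is computed once per distinct combination of the non-aggregated columns in the same RETURN (or WITH); those columns act as an implicit grouping key. With your MATCH, each relationship appears on exactly one row.

So your first query returns one row per relationship, together with the count of that single relationship, which is always 1. Since every count is equal, ORDER BY has nothing meaningful to sort on and LIMIT 10 just keeps 10 arbitrary relationships.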
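To make the grouping rule concrete, here is a plain-Python model of it, not Neo4j itself. Each tuple stands for one row produced by MATCH (a)-[r:WRITES]->(b); the author names, relationship ids and book titles are invented for illustration, as is the helper count_grouped_by. Grouping by the relationship gives a count of 1 for every row, while grouping by the book gives the number of authors per book.

```python
from collections import Counter

# Hypothetical rows: one (author, relationship id, book) tuple per match of
# (a)-[r:WRITES]->(b). All names and ids are made up for this sketch.
rows = [
    ("alice", "r1", "Book A"),
    ("bob", "r2", "Book A"),
    ("carol", "r3", "Book A"),
    ("alice", "r4", "Book B"),
    ("dave", "r5", "Book C"),
    ("erin", "r6", "Book C"),
]


def count_grouped_by(rows, key):
    """Model of RETURN <key>, COUNT(r): one output row per distinct grouping key."""
    return Counter(key(row) for row in rows)


# RETURN r, COUNT(r): the grouping key is the relationship, unique on every row.
per_relationship = count_grouped_by(rows, key=lambda row: row[1])

# RETURN b, COUNT(r): the grouping key is the book, so rows for one book collapse.
per_book = count_grouped_by(rows, key=lambda row: row[2])

print(per_relationship.most_common(3))  # every count is 1, so this ranking means nothing
print(per_book.most_common(3))          # [('Book A', 3), ('Book C', 2), ('Book B', 1)]
```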

You can rewrite this in a couple of different ways.

One is to collect the authors and order on the size of the author list:

MATCH (a)-[:WRITES]->(b)
RETURN b, COLLECT(a) as authors
ORDER BY SIZE(authors) DESC LIMIT 10

You can also collect each author together with its relationship, if the relationship itself is of interest to you.
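The COLLECT approach can be modelled the same way in plain Python (a sketch of the semantics, not of how Neo4j executes it), again over invented rows: authors are gathered into a list per book, the lists are ordered by length, and only the first `limit` books are kept. The function name top_books_by_authors is hypothetical.

```python
from collections import defaultdict

# Hypothetical (author, relationship id, book) rows, one per matched WRITES relationship.
rows = [
    ("alice", "r1", "Book A"),
    ("bob", "r2", "Book A"),
    ("carol", "r3", "Book A"),
    ("alice", "r4", "Book B"),
    ("dave", "r5", "Book C"),
    ("erin", "r6", "Book C"),
]


def top_books_by_authors(rows, limit):
    """Model of: RETURN b, COLLECT(a) AS authors ORDER BY SIZE(authors) DESC LIMIT <limit>."""
    authors_by_book = defaultdict(list)
    for author, _rel_id, book in rows:
        authors_by_book[book].append(author)  # COLLECT(a), grouped by b
    # ORDER BY SIZE(authors) DESC
    ranked = sorted(authors_by_book.items(), key=lambda item: len(item[1]), reverse=True)
    return ranked[:limit]  # LIMIT


for book, authors in top_books_by_authors(rows, limit=2):
    print(book, authors)
# Book A ['alice', 'bob', 'carol']
# Book C ['dave', 'erin']
```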

EDIT

If you happen to have labels on your nodes (you absolutely SHOULD have labels on your nodes), you can try a different approach: match all books, get the number of incoming :WRITES relationships for each book, order and limit on that, and only then perform the match to the authors:

MATCH (b:Book)
WITH b, SIZE(()-[:WRITES]->(b)) as authorCnt
ORDER BY authorCnt DESC LIMIT 10
MATCH (a)-[:WRITES]->(b)
RETURN b, a

You can collect on the authors and/or return the relationship as well, depending on what you need from the output.
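The two-step shape of this query can likewise be sketched in plain Python over the same kind of invented rows: count the incoming WRITES per book first, keep only the top books, and only then expand those books back to their authors, yielding (book, author) pairs like RETURN b, a. The function name is hypothetical, and the sketch only mirrors the logic; it says nothing about how Neo4j computes the counts or how fast it does so.

```python
from collections import Counter

# Hypothetical (author, relationship id, book) rows, one per WRITES relationship.
rows = [
    ("alice", "r1", "Book A"),
    ("bob", "r2", "Book A"),
    ("carol", "r3", "Book A"),
    ("alice", "r4", "Book B"),
    ("dave", "r5", "Book C"),
    ("erin", "r6", "Book C"),
]


def top_books_then_authors(rows, limit):
    """Model of the count-first query: rank books, LIMIT, then re-match their authors."""
    # Step 1: number of incoming WRITES per book (authorCnt in the Cypher query).
    author_count = Counter(book for _author, _rel_id, book in rows)
    # Step 2: ORDER BY authorCnt DESC LIMIT <limit>.
    selected = {book for book, _count in author_count.most_common(limit)}
    # Step 3: the second MATCH, restricted to the selected books (RETURN b, a).
    return [(book, author) for author, _rel_id, book in rows if book in selected]


print(top_books_then_authors(rows, limit=2))
# [('Book A', 'alice'), ('Book A', 'bob'), ('Book A', 'carol'), ('Book C', 'dave'), ('Book C', 'erin')]
```

Because the expansion to authors happens after the limit, only the selected books are revisited in the final step.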

InverseFalcon
  • Sure thing! You may want to PROFILE the queries first, though: the first one I gave probably won't perform as well on a larger graph, since it does a lot of collecting. – InverseFalcon Feb 15 '17 at 11:04

You are very close: after sorting and limiting, you need to match the authors again. For example:

MATCH (a:Author)-[r:WRITES]->(b:Book)
WITH b, 
     COUNT(r) AS authorsCount
     ORDER BY authorsCount DESC LIMIT 10
MATCH (b)<-[:WRITES]-(a:Author)
RETURN b, 
       COLLECT(a) AS authors
       ORDER BY size(authors) DESC
stdob--
  • I tried this query, and it also works. I don't have a large dataset, but I suspect it is less scalable than the first query in the accepted answer (one MATCH clause versus two). However, I'm not an expert on Neo4j internals and optimizations, and this query does the job fine too. – st1led Feb 15 '17 at 07:22
  • @st1led You can always PROFILE the queries to make sure. That said, the second MATCH runs after the LIMIT 10, so it only expands the ten selected books and is much lighter than the first; the performance of these queries should actually be similar. – InverseFalcon Feb 15 '17 at 11:03