
I want to calculate betweenness centrality in a very large graph in Neo4j using py2neo.

I am using a Cypher query like this:

    MATCH p=allShortestPaths((source:DOLPHIN)-[*]-(target:DOLPHIN))
    WHERE id(source) < id(target)
    AND length(p) > 1
    UNWIND nodes(p)[1..-1] as n
    RETURN n.name, count(*) as betweenness
    ORDER BY betweenness DESC

It works for a small graph but not for a large graph with 1 million nodes. I am running this query through py2neo.

Earlier I was getting a timeout error, which I have resolved, but now, after running for some time, the query fails and says it cannot be processed. I am getting the following error:

    File "/usr/local/lib/python2.7/dist-packages/py2neo/cypher/core.py", line 111, in execute
    results = tx.commit()
    File "/usr/local/lib/python2.7/dist-packages/py2neo/cypher/core.py", line 306, in commit
    return self.post(self.__commit or self.__begin_commit)
    File "/usr/local/lib/python2.7/dist-packages/py2neo/cypher/core.py", line 261, in post
    raise self.error_class.hydrate(error)
    py2neo.cypher.error.statement.ExecutionFailure: The statement has been closed.

I have searched a lot about this. Please help me with it.

Mohit Mangal
  • It's possible the database server is being overwhelmed by the request. Try running your process in series, limiting the range of nodes traversed at each turn. I am hesitant to suggest a solution, because I'm not really sure what you're aggregating. Limiting either the nodes or the paths would probably yield different results on each call. But I'd advise you not to run full-graph queries in Neo4j. – Guilherme Apr 16 '15 at 09:25
  • Hi, thank you for your suggestion. I have data on the authors of research papers in the same domain, and I built a co-author graph from it. Now I want to find the top authors in the network by calculating the betweenness of each author. Can you suggest a good way to do this with Neo4j queries, since this query is not working? – Mohit Mangal Apr 17 '15 at 11:02
  • For a good recommendation, I would need to know your schema. Assuming you have something like (author)-[:WROTE]->(paper), and you define top authors as those who have written the most papers, you could run a query like this: MATCH (author :Author)-[r :WROTE]-(x :Paper) RETURN author.name AS name, COUNT(r) AS n ORDER BY n DESC LIMIT 10 – Guilherme Apr 17 '15 at 16:09
  • Yes, I have used the same schema. To find the top authors I am not counting the number of publications. Instead, I count the number of shortest paths (from each node to every other node) in which a given author occurs. The higher that number, the better the author's rank. Please see the query I have written; it does exactly that. – Mohit Mangal Apr 21 '15 at 05:41
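The count described in the last comment (and computed by the query's `count(*)`) can be sketched in plain Python on a small in-memory graph, which may be handy for checking the Cypher output on a sample before scaling up. This is a minimal sketch, assuming an undirected co-author graph held as an adjacency dict; the function names and the author names are made up for illustration. Note that, like the query, it counts every shortest path through a node; standard betweenness centrality instead divides each pair's contribution by the number of shortest paths between that pair.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Hop distance and number of shortest paths from `source` to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:  # first time we reach w
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
    return dist, sigma


def shortest_path_counts(adj):
    """For each node v, count the shortest paths between other pairs (s, t) that pass through v.

    Like the query's count(*), each path is counted once; the result is NOT divided
    by the number of shortest s-t paths, as standard betweenness centrality would be.
    """
    info = {s: bfs_counts(adj, s) for s in adj}
    counts = {v: 0 for v in adj}
    for s, t in combinations(adj, 2):  # each unordered pair once, like id(source) < id(target)
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are in different components
        dist_t, sigma_t = info[t]
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            # v is on a shortest s-t path iff d(s, v) + d(v, t) == d(s, t)
            if dist_s[v] + dist_t[v] == dist_s[t]:
                counts[v] += sigma_s[v] * sigma_t[v]
    return counts


def build_adjacency(edges):
    """Turn (author, co-author) pairs into an undirected adjacency dict."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


if __name__ == "__main__":
    # Made-up co-author pairs, for illustration only
    edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "eve"), ("eve", "cat")]
    counts = shortest_path_counts(build_adjacency(edges))
    for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(name, n)
```

On this sample, `cat` comes out on top with 5 paths. The pairwise loop is roughly cubic in the number of nodes, so this is only meant for small samples, not the 1-million-node graph.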

1 Answer


I can't comment on the algorithm/approach you use to rank the authors. Ultimately, though, the query you're running is a full-graph search with some aggregation, and Neo4j was not designed for such cases. As your data grows, the query will become harder to run.

Ideally, a query should only traverse a small section of the graph. So in your case, instead of asking who the most popular author is, you could ask for the rank of one author per query. Doing this for all authors, one at a time, and ranking them yourself might work better here. Alternatively, you could take a different approach, such as limiting the range of neighbour nodes to traverse, the length of the longest path, or both. But I suspect that would affect your results.
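A rough sketch of the one-author-per-query idea, in the asker's Python: each call handles a single source node and the per-author counts are summed client-side. Because of the `id(source) < id(target)` filter, each unordered pair is still counted exactly once, so with no hop limit the totals should match the original query; the total work stays roughly the same, it is just split into smaller transactions. The optional `max_hops` bound is the path-length limit mentioned above and will change the results. `build_query`, `rank_authors`, `run_query` and the query text are hypothetical, not part of py2neo. With py2neo 2.x you would presumably wrap something like `graph.cypher.execute(statement, parameters)` and turn each record into a `(name, betweenness)` pair, and get the source ids from a query such as `MATCH (a:DOLPHIN) RETURN id(a)`; check both against your versions.

```python
from collections import Counter

# Hypothetical per-source version of the question's query.
# Uses the Neo4j 2.x/3.x {param} placeholder syntax; newer versions use $sid instead.
QUERY_TEMPLATE = """
MATCH (source:DOLPHIN) WHERE id(source) = {sid}
MATCH (target:DOLPHIN) WHERE id(source) < id(target)
MATCH p = allShortestPaths((source)-[*%s]-(target))
WHERE length(p) > 1
UNWIND nodes(p)[1..-1] AS n
RETURN n.name AS name, count(*) AS betweenness
"""


def build_query(max_hops=None):
    """Fill in the optional path-length bound (Cypher won't take it as a parameter)."""
    bound = "..%d" % int(max_hops) if max_hops else ""
    return QUERY_TEMPLATE % bound


def rank_authors(source_ids, run_query, max_hops=None, top=10):
    """Run the per-source query once per id and add up the counts in Python.

    run_query(statement, params) is a caller-supplied (hypothetical) function that
    returns an iterable of (name, count) rows, e.g. a thin wrapper around py2neo.
    """
    query = build_query(max_hops)
    totals = Counter()
    for sid in source_ids:
        for name, count in run_query(query, {"sid": sid}):
            totals[name] += count
    return totals.most_common(top)
```

Injecting `run_query` keeps the batching logic independent of the driver, so it can be tried with a fake runner before pointing it at the real database.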

I would advise you to revisit your domain model and, based on your needs, find a design that lets you easily answer questions like who the most popular author is under your calculation approach. Also double-check that you're using indexes, just in case.

Modelling with Neo4j:

Sometimes the simplest model doesn't help us answer certain questions. I've had to remodel a few times myself, for example turning relationships into nodes to sort temporal data, because it wasn't obvious the first time around. Anyway, I hope you find a solution.

Cheers

Guilherme