0

driverforneo4jforpython. I have a program that dynamically creates around 10-12 queries . The final result from all queries is collected in alist` and returned.

Below are 10 such queries:

MATCH  (sslc:subSubLocality)-[:CHILD_OF]->(v4)-[:CHILD_OF]->(v3)-[:CHILD_OF]->(v2)-[:CHILD_OF]->(st:state)  WHERE (st.name_wr = 'abcState') AND (sslc.name_wr= 'xyzSLC' OR sslc.name_wr= 'abcxyzcolony')  RETURN st, sslc, v4, v3, v2

MATCH  (slc:subLocality)-[:CHILD_OF]->(v3)-[:CHILD_OF]->(v2)-[:CHILD_OF]->(st:state)  WHERE (st.name_wr = 'abcState') AND (slc.name_wr= 'xyzSLC' OR slc.name_wr= 'abcxyzcolony')  RETURN st, slc, v3, v2

MATCH  (loc:locality)-[:CHILD_OF]->(v2)-[:CHILD_OF]->(st:state)  WHERE (st.name_wr = 'abcState') AND (loc.name_wr= 'deltax' OR loc.name_wr= 'xyzSLC' OR loc.name_wr= 'abcxyzcolony')  RETURN st, loc, v2

MATCH  (ct:city)-[:CHILD_OF]->(st:state)  WHERE (st.name_wr = 'abcState') AND (ct.name_wr= 'deltax' OR ct.name_wr= 'abcxyz')  RETURN st, ct

MATCH  (sslc:subSubLocality)-[:CHILD_OF]->(v3)-[:CHILD_OF]->(v2)-[:CHILD_OF]->(ct:city)  WHERE (ct.name_wr = 'deltax' OR ct.name_wr = 'abcxyz') AND (sslc.name_wr= 'xyzSLC' OR sslc.name_wr= 'abcxyzcolony')  RETURN ct, sslc, v3, v2

MATCH  (slc:subLocality)-[:CHILD_OF]->(v2)-[:CHILD_OF]->(ct:city)  WHERE (ct.name_wr = 'deltax' OR ct.name_wr = 'abcxyz') AND (slc.name_wr= 'xyzSLC' OR slc.name_wr= 'abcxyzcolony')  RETURN ct, slc, v2

MATCH  (loc:locality)-[:CHILD_OF]->(ct:city)  WHERE (ct.name_wr = 'deltax' OR ct.name_wr = 'abcxyz') AND (loc.name_wr= 'deltax' OR loc.name_wr= 'xyzSLC' OR loc.name_wr= 'abcxyzcolony')  RETURN ct, loc

MATCH  (sslc:subSubLocality)-[:CHILD_OF]->(v2)-[:CHILD_OF]->(loc:locality)  WHERE (loc.name_wr = 'deltax' OR loc.name_wr = 'xyzSLC' OR loc.name_wr = 'abcxyzcolony') AND (sslc.name_wr= 'xyzSLC' OR sslc.name_wr= 'abcxyzcolony')  RETURN loc, sslc, v2

MATCH  (slc:subLocality)-[:CHILD_OF]->(loc:locality)  WHERE (loc.name_wr = 'deltax' OR loc.name_wr = 'xyzSLC' OR loc.name_wr = 'abcxyzcolony') AND (slc.name_wr= 'xyzSLC' OR slc.name_wr= 'abcxyzcolony')  RETURN loc, slc

MATCH  (sslc:subSubLocality)-[:CHILD_OF]->(slc:subLocality)  WHERE (slc.name_wr = 'xyzSLC' OR slc.name_wr = 'abcxyzcolony') AND (sslc.name_wr= 'xyzSLC' OR sslc.name_wr= 'abcxyzcolony')  RETURN slc, sslc

The Queries might change based on the input dictionary (as I mentioned the queries are created at run-time). But the queries share the same structure.

Below is a Query Plan that I get and it remains the same for all queries just differs in values inside.

query 1

Below is my code that fires up these requests:

def get_query_response(query_list: list)-> list:
    driver = GraphDatabase.driver(uri, auth=("neo4j", "neo4j"))
    with driver.session() as session:
        with session.begin_transaction() as tx:
            response = [record.values() for query in query_list for record in tx.run(query)]
            return response

The query_list is a collection of str that has these queries.

The problem is the whole task takes 2 seconds to give a response. Is there any way to optimize the query or make it faster or maybe operate in milliseconds?


Edit:

To answer a few questions:

  • Yes the time mentioned is the time it takes to fetch the results. I had narrowed it down before hand.
  • The queries take roughly from 3ms - 10ms to execute. when I fire the query on neo4j desktop. Is it the driver that's causing the issue?
  • Yes it's a locally hosted neo4j database and my system is i7 16GB Memory with 1TB SSD
  • No I didn't create indeices now I have and I get a performance bump with 500ms but now it's 1.5s is there any way I can push it to work in milliseconds?
iam.Carrot
  • 4,976
  • 2
  • 24
  • 71
  • To get a sense of the overhead for making any request, try changing `query_list` to just contain this simple query: `RETURN 123;`. How long does it take? – cybersam Mar 29 '18 at 17:44
  • @cybersam interesting roll out of the response. The query still takes `1s` to execute. This means my query is great. There is some issue with the driver? – iam.Carrot Mar 30 '18 at 03:58

2 Answers2

1

Add appropriate indexes or uniqueness constraints so that your generated queries do not need to scan for the appropriate nodes to start working.

For example (based on your examples), you could add indexes to:

  • :subSubLocality(name_wr)
  • :subLocality(name_wr)
  • :locality(name_wr)
  • :city(name_wr)
cybersam
  • 63,203
  • 6
  • 53
  • 76
0

I can't say for sure what the cause is, but I have a few questions that should help us get closer to an answer.

• Have you tried benchmarking these queries individually? At first glance, they look like they are simple enough to complete, so I don't think this is the issue but it wouldn't hurt to know if you really need to be optimizing the queries themselves.

• You mentioned it takes "2 seconds", is that from the moment you hit 'enter' to execute your Python script (so things like initiating the connection to the Neo4j instance are included), or does it specifically take 2.0 seconds for the queries to execute?

• The docs note that prior to v3.2 of Neo4j, the Cypher planner wasn't always making the most efficient choices. If you have an earlier version, the docs mention you should default to the cost-based planner.

• Is this a local Neo4j instance? If it's hosted, what are the hardware specs of the host machine? Might not hurt to bump up the specs if possible.

• If you haven't added any custom indexing on properties and your queries always look the same, I would recommend looking into that option.

Stephen Gheysens
  • 293
  • 2
  • 12
  • Also, is your Python code optimized? Is its engagement time in the 2 sec? You might isolate the times in python and on the Neo4j server. – David A Stumpf Mar 29 '18 at 14:27