
Hi, I will try to stay on track, but I've done a lot of research and now I'm just lost. I could really use some expertise here. Below is the situation:

Preface

This is a follow-up to my question here. The issue there was that my Cypher queries were taking at least 1 second to return a response; even a query like RETURN 123 took 1 second. That led me to the conclusion that the Neo4j Bolt driver for Python is slower than a plain HTTP call to Neo4j.

I can back this up with research from GitHub Issues and this from Stack Overflow.

The problem statement

Each time my code runs, it generates up to 10 Cypher queries. All of them have to be fired, and then operations need to be performed based on the results.

The issue is that with Bolt the queries take 1 second to execute, and with HTTP I am stuck. Now that I'm no longer on Bolt, I want to use query parameters to make the queries faster, because each HTTP call takes 30 ms. Multiply that by 10 (since I have 10 queries) and you have a very poorly performing Python API for fetching user relations.

Where I am stuck

  1. A confirmation that yes, the Bolt driver is slow and that I am not doing anything wrong, since all the posts I've seen are about a year old.
  2. My query has OR and AND conditions. How can I write those using parameters in Neo4j REST calls?
  3. Is there some other graph database I should look towards?
  4. Is there any way I can fire up to 10 queries and get a response time below 200ms?

Other reasons to think I am missing something:

  1. Legend has it that Neo4j is the most popular graph database. How is that possible with such drivers?
  2. There is over a year of reported issues with the Bolt drivers, and they still haven't been fixed.

Sample Request

curl -X POST \
  http://localhost:7474/db/data/cypher \
  -H 'Authorization: Basic bmVvNGo6Y29kZQ==' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
  "query" : "MATCH (ct:city)-[:CHILD_OF]->(st:state) WHERE (st.name_wr = {st}) AND (ct.name_wr= {ct}) RETURN st, ct",
  "params": 
  {
    "st" : "california",
    "ct" : "san francisco"
  }
}'

But what if I want to add a clause saying that st should be either California or Alaska, and ct must be San Francisco? How do I do that with parameters over REST?

EDIT:

I replicated the script and below is the verdict:

58 transactions, tps 0.97 maxdelay 1.08

The curl sample request is the one that I fire from Postman. The code that I am using can be found in the linked question (in the preface).

iam.Carrot

2 Answers


Your sample request is using the deprecated legacy endpoint and its legacy API.

You should instead use the transaction endpoint and its newer API, especially since it supports executing multiple Cypher statements in a single request. Each statement can have its own set of parameters.
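As a rough sketch of what such a request could look like for read-only MATCH queries, the Python 3 snippet below builds one request body containing two statements, each with its own parameters, and posts it to the transaction endpoint's commit URL. The host, port, credentials, the helper names `build_payload` and `run_statements`, and the parameter name `states` are assumptions for illustration; the first statement uses a list parameter with IN to cover the "California or Alaska" case from the question.

```python
import base64
import json
import urllib.request

# Assumed URL: default host/port and the Neo4j 3.x transactional commit path.
TX_COMMIT_URL = "http://127.0.0.1:7474/db/data/transaction/commit"


def build_payload(statements):
    """Wrap (cypher, params) pairs in the body the transaction endpoint expects."""
    return {
        "statements": [
            {"statement": cypher, "parameters": params}
            for cypher, params in statements
        ]
    }


def run_statements(statements, user, password, url=TX_COMMIT_URL):
    """POST all statements in a single request and return the decoded JSON reply."""
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(statements)).encode("utf-8"),
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
            "Accept": "application/json; charset=UTF-8",
        },
    )
    # Supplying `data` makes urllib send a POST request.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# A list parameter with IN replaces the chain of OR conditions.
QUERY = (
    "MATCH (ct:city)-[:CHILD_OF]->(st:state) "
    "WHERE st.name_wr IN {states} AND ct.name_wr = {ct} "
    "RETURN st, ct"
)

statements = [
    (QUERY, {"states": ["california", "alaska"], "ct": "san francisco"}),
    ("RETURN 123", {}),
]
payload = build_payload(statements)
print(json.dumps(payload, indent=2))

# Against a running server (placeholder credentials):
# reply = run_statements(statements, "neo4j", "password")
# print(reply["results"], reply["errors"])
```

The reply carries one entry in `results` per statement, in the order sent, so all 10 queries could share a single HTTP round trip. A read-only MATCH needs no extra commit step here, since posting to the commit URL opens and commits the transaction in that one request.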

cybersam
  • But what about the `OR` clause while matching the name? Also, the docs for the new transaction endpoint don't show a way to fire a `MATCH` query; the examples are all CREATE and COMMIT. Can you please post a sample? – iam.Carrot Mar 30 '18 at 20:29
  • Each statement can be any legal Cypher statement. So, a single request can execute all 10 of your original Cypher statements. – cybersam Mar 30 '18 at 20:50
  • Regarding the query itself, if your parameter `st` is a list of states, then you can do this: `MATCH (ct:city)-[:CHILD_OF]->(st:state) WHERE (st.name_wr in {st}) AND (ct.name_wr= {ct}) RETURN st, ct` – InverseFalcon Mar 30 '18 at 22:43
  • @InverseFalcon oh yeah, I remember that from the docs. Thanks for pointing it out. I just need one last bit of help. Can someone show a sample query with the new transaction endpoint for a MATCH query, not a CREATE? Or, if I am using the same endpoint for both, do I have to do a commit for a MATCH as well? – iam.Carrot Mar 31 '18 at 03:31

EDIT

Well, to be honest, the issue was with the host address. I was using localhost, and resolving localhost was taking time. As soon as I switched to 127.0.0.1, it started working perfectly fine.

Marking this as the answer, as it helped me actually benchmark the two approaches, which led to the discovery of the host resolution issue.
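One way to check whether name lookup is a likely culprit on a given machine is to time it directly. The sketch below is an illustration, not part of the original benchmark: the helper name `time_resolution` and the choice of port 7687 are assumptions. It measures only the lookup itself, so a delay caused by the client trying each returned address in turn would not show up here.

```python
import socket
import time


def time_resolution(host, port=7687, repeats=5):
    """Return the slowest lookup time in seconds for `host` over `repeats` tries."""
    worst = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        # Same resolver call a TCP client performs before connecting.
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        worst = max(worst, time.perf_counter() - start)
    return worst


if __name__ == "__main__":
    for host in ("localhost", "127.0.0.1"):
        print("%-10s worst lookup %.4f s" % (host, time_resolution(host)))
```

If the `localhost` line comes out near a second while `127.0.0.1` is close to zero, pointing the driver or HTTP client at the numeric address is a cheap thing to try.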


I think there must be something wrong with your setup. I've been using the Python Bolt driver for a while now, and for simple queries, I don't think I've ever seen a 1 second delay. I don't know what your code looks like, or your network delay, but I wrote a quick example to look at the delays I see in my local network (which has very low latency). (Using Neo4j 3.2.9 and Python driver 1.5.3.)

#!/usr/bin/python
from __future__ import print_function
import time
from neo4j.v1 import GraphDatabase, basic_auth

ip = '10.10.10.10'   # address of the Neo4j server
runtime = 60.0       # how long to run the benchmark, in seconds

querystr = 'RETURN 123'
runstart = time.time()
maxdelay = 0
cnt = 0
#driver = GraphDatabase.driver("bolt+routing://%s:7687" % ip,
driver = GraphDatabase.driver("bolt://%s:7687" % ip,
                              auth=basic_auth("neo4j", "password"))
while time.time() - runstart < runtime:
    start = time.time()
    # Open a fresh session per query so each timing includes session setup.
    session = driver.session(access_mode='READ')
    ret = session.run(querystr)
    result = ret.data()  # consume the result before closing the session
    session.close()
    cnt += 1
    delay = time.time() - start
    if delay > maxdelay:
        maxdelay = delay
    # Report any single round trip slower than 100 ms.
    if delay > 0.1:
        print('Large delay seen cnt %s delay %0.2f' % (cnt, delay))
driver.close()
print('%d transactions, tps %0.2f maxdelay %0.2f' % (cnt, cnt/runtime, maxdelay))

I get the output:

117360 transactions, tps 1956.00 maxdelay 0.06

This means the average read took about half a millisecond, and the max was 60ms.
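The arithmetic behind that figure can be reproduced in a couple of lines. Because the loop runs queries one after another, dividing the runtime by the transaction count gives the mean time per query. The helper name `mean_latency_ms` is made up for this illustration, and the second call assumes the 58-transaction run reported in the question's edit also used the 60 second runtime.

```python
def mean_latency_ms(transactions, runtime_s=60.0):
    """Average wall-clock time per transaction, in milliseconds."""
    return runtime_s / transactions * 1000.0


print("%.2f ms" % mean_latency_ms(117360))  # 0.51 ms
print("%.2f ms" % mean_latency_ms(58))      # 1034.48 ms
```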

I would look at network latency and issues with resources on both your client and server side.

iam.Carrot
nortoon