I have some old graph data with 1 million nodes and 3 million edges that I'd like to import into Neo4j.

I'm using Neo4j embedded and my program is roughly like:

for (each edge (node1_id, node2_id) in old graph data):
    node1 = neo4jdb.findNode(node1_id)
    node2 = neo4jdb.findNode(node2_id)
    if (node1 or node2 doesn't exist):
        create the missing node(s)
    if (! relationExistBetween(node1, node2)):
        create new relation between node1 and node2

However, the creation process is extremely slow. With exactly the same logic, the program runs much faster with TinkerGraph.

Are there any tricks to make this faster? Thanks!

peidaqi

1 Answer

Figured it out. I profiled the code and found that the bottleneck is the findNode operation, which made me suspect it was related to indexing.

You have to manually create an index on the property you look nodes up by. With Neo4j Embedded, that's something like:

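// Neo4j 3.x embedded API: schema changes must run inside a transaction
var transaction = graphDB.beginTx()
try {
  // index the "node_id" property on all nodes carrying nodeLabel
  graphDB.schema()
    .indexFor(nodeLabel).on("node_id")
    .create()
  transaction.success()  // mark the transaction for commit
} finally {
  transaction.close()    // commits if success() was called, otherwise rolls back
}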
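
To illustrate why the missing index hurts so much, here is a minimal, self-contained sketch in plain Java. It does not use Neo4j at all: the `Node` class, `makeNodes`, `scanLookupCost` and `buildIndex` are hypothetical names made up for this example, and a `HashMap` only stands in for a real index (Neo4j's own index structures are different). The assumption is that an unindexed lookup has to examine candidate nodes one by one, while an indexed lookup goes more or less straight to the match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupDemo {
    /** Hypothetical stand-in for a stored node: only its id from the old graph. */
    static final class Node {
        final long nodeId;
        Node(long nodeId) { this.nodeId = nodeId; }
    }

    /** Creates nodes with ids 0 .. count-1. */
    static List<Node> makeNodes(int count) {
        List<Node> nodes = new ArrayList<>();
        for (long id = 0; id < count; id++) {
            nodes.add(new Node(id));
        }
        return nodes;
    }

    /** Unindexed lookup: walks the list and returns how many nodes were examined. */
    static int scanLookupCost(List<Node> nodes, long wantedId) {
        int examined = 0;
        for (Node n : nodes) {
            examined++;
            if (n.nodeId == wantedId) {
                break;
            }
        }
        return examined;
    }

    /** Indexed lookup: build the id -> node map once, then each lookup is a single get(). */
    static Map<Long, Node> buildIndex(List<Node> nodes) {
        Map<Long, Node> index = new HashMap<>();
        for (Node n : nodes) {
            index.put(n.nodeId, n);
        }
        return index;
    }

    public static void main(String[] args) {
        List<Node> nodes = makeNodes(1000);
        Map<Long, Node> index = buildIndex(nodes);

        // Worst case for the scan: the wanted node is the last one.
        System.out.println("scan examined: " + scanLookupCost(nodes, 999L));
        System.out.println("index hit: " + (index.get(999L) != null));
    }
}
```

With two lookups per edge and 3 million edges, a per-lookup cost that grows with the number of nodes adds up quickly, which would be consistent with findNode dominating the profile.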
peidaqi
  • I was about to suggest that if performance is an issue you should consider using RedisGraph, but I guess you already figured it out... – Guy Korland Oct 14 '18 at 10:15
  • Thanks. Coincidentally, I was looking into RedisGraph yesterday. It looks good and may actually fit our purpose better (we need an in-memory graph for very fast BFS), but the project is a bit new and we want to wait until it matures a bit more. It uses openCypher anyway, so migrating should be painless. – peidaqi Oct 15 '18 at 14:34