I'm currently exploring the potential of graph databases for some processes in my industry. I started with Neo4jClient a week ago, so I'm still very much a beginner :-)
I'm very excited about Neo4j, but I'm facing serious performance issues and I need help.
The first step in my project is to populate Neo4j from existing text files. Those files are composed of lines formatted using a simple pattern:
StringID=StringLabel(String1,String2,...,StringN);
For example, consider the following line:
#126=TYPE1(#80,#125);
I would like to create one node with label "TYPE1" and 2 properties: 1) a unique ID taken from StringID ("#126" in the example above); 2) a string containing all parameters for future use ("#80,#125" in the example above).
I must also handle multiple forward references, as in the example below:
#153=TYPE22('0BTBFw6f90Nfh9rP1dl_3P',#144,#6289,$);
The line defining the node with StringID "#6289" will be parsed later in the file.
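For reference, here is a minimal sketch of how one line can be split into the three parts described above. The class name LineParser, the method TryParse, and the regular expression are hypothetical and only for illustration; the sketch assumes that every StringID is "#" followed by digits (as in the two examples) and that each definition fits on a single line ending with ");".

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineParser
{
    // Assumed shape: StringID=StringLabel(String1,...,StringN);
    // The StringID is assumed to be '#' followed by digits.
    private static readonly Regex LinePattern = new Regex(
        @"^\s*(?<id>#\d+)\s*=\s*(?<label>\w+)\s*\((?<attrs>.*)\)\s*;\s*$",
        RegexOptions.Compiled);

    // Returns false when the line does not follow the assumed pattern.
    public static bool TryParse(string line, out string strID,
                                out string strLABEL, out string strATTRIBUTES)
    {
        strID = strLABEL = strATTRIBUTES = null;
        Match m = LinePattern.Match(line);
        if (!m.Success)
        {
            return false;
        }
        strID = m.Groups["id"].Value;            // e.g. "#126"
        strLABEL = m.Groups["label"].Value;      // e.g. "TYPE1"
        strATTRIBUTES = m.Groups["attrs"].Value; // e.g. "#80,#125"
        return true;
    }
}
```

Forward references need no special handling at this stage, because the parameter list is kept as one raw string and is only interpreted later.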
So, to solve my file import problem, I've defined the following class:
    public class myEntity
    {
        public string propID { get; set; }
        public string propATTR { get; set; }

        public myEntity()
        {
        }
    }
Because of the forward references in my text file (and no doubt my poor Neo4j knowledge...), I've decided to work in 3 steps:
In a first loop, I extract strLABEL, strID and strATTRIBUTES from each line of my file, then I add one Neo4j node per line using the following code:
    strLabel = "(entity:" + strLABEL + " { propID: {newEntity}.propID })";
    graphClient.Cypher
        .Merge(strLabel)
        .OnCreate()
        .Set("entity = {newEntity}")
        .WithParams(new {
            newEntity = new {
                propID = strID,
                propATTR = strATTRIBUTES
            }
        })
        .ExecuteWithoutResults();
Then I match all nodes created in Neo4j using the following code:
    var queryNode = graphClient.Cypher
        .Match("(nodes)")
        .Return(nodes => new {
            NodeEntity = nodes.As<myEntity>(),
            Labels = nodes.Labels()
        });
Finally, I loop over all nodes, split the propATTR property of each node, and add one relationship for each StringID found in propATTR using the following code:
    graphClient.Cypher
        .Match("(myEnt1)", "(myEnt2)")
        .Where((myEntity myEnt1) => myEnt1.propID == strID)
        .AndWhere((myEntity myEnt2) => myEnt2.propID == matchAttr)
        .CreateUnique("myEnt1-[:INTOUCHWITH]->myEnt2")
        .ExecuteWithoutResults();
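To make the third step concrete, here is a rough sketch of how the referenced IDs (the matchAttr values) can be pulled out of propATTR. The class name ReferenceExtractor and its method are hypothetical; the sketch assumes that references are "#" followed by digits, that text between single quotes is literal data and must be ignored, and that a repeated reference only needs one relationship.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ReferenceExtractor
{
    // Assumption: quoted parameters such as '0BTBFw6f90Nfh9rP1dl_3P' are plain
    // strings and may contain '#', so they are removed before searching.
    private static readonly Regex QuotedString =
        new Regex(@"'[^']*'", RegexOptions.Compiled);

    // Assumption: a reference is '#' followed by one or more digits.
    private static readonly Regex Reference =
        new Regex(@"#\d+", RegexOptions.Compiled);

    // Returns each referenced ID once, in order of first appearance.
    public static List<string> ExtractReferences(string propATTR)
    {
        string withoutStrings = QuotedString.Replace(propATTR, "");
        var references = new List<string>();
        foreach (Match m in Reference.Matches(withoutStrings))
        {
            if (!references.Contains(m.Value))
            {
                references.Add(m.Value);
            }
        }
        return references;
    }
}
```

For the second example line, ExtractReferences("'0BTBFw6f90Nfh9rP1dl_3P',#144,#6289,$") yields "#144" and "#6289", so two relationships would be created from node "#153".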
When I explore the database populated by this code with Cypher, the resulting nodes and relationships are the right ones, and Neo4j executes every query I've tested very quickly. It's very impressive, and I'm convinced there is huge potential for Neo4j in my industry.
But my big issue today is the time required to populate the database (my config: Windows 8 x64, 32 GB RAM, SSD, Intel Core i7-3840QM 2.8 GHz):
For a small test case (6400 lines), it took 13s to create 6373 nodes, and 94s more to create 7800 relationships.
On a real test case (40000 lines), it took 496s to create 38898 nodes, and 3701s more to create 89532 relationships (yes: more than one hour!)
I have no doubt that such poor performance is a direct result of my limited Neo4jClient knowledge.
It would be a tremendous help if the community could advise me on how to solve that bottleneck.
Thanks in advance for your help.
Best regards, Max