My question is a follow-up to the question I asked here [1]. After a long conversation with Stephen Mallette, he showed me how to build a graph that is loaded when the server starts. My final script is here [2]. What do I want to do? Let's say I have:
[
  {
    "host": "google.com",
    "ip": "8.8.8.8",
    "random": 25
  },
  {
    "host": "google.com",
    "ip": "1.2.3.4",
    "random": 10
  }
]
There will be a vertex with the property "host" and the value google.com (#1), a vertex with the property "ip" and the value 8.8.8.8 (#2), and a vertex with the property "random" and the value 25 (#3). I will also create three edges: host #1 -> ip #2, host #1 -> random #3 and ip #2 -> random #3. For the second object I won't create another google.com vertex, because it already exists, but I will create the ip vertex #4 and the random vertex #5, plus the edges host #1 -> ip #4, host #1 -> random #5 and ip #4 -> random #5. So an object O with k fields produces up to k new vertices and k * (k - 1) / 2 edges.
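A minimal sketch of the vertex/edge scheme just described, in plain Python with no graph database involved. It assumes a vertex is identified by its (field, value) pair; the dict `vertex_ids` is a hypothetical stand-in for a property lookup, and `build_graph` / `get_or_create` are illustrative names, not part of the original script.

```python
from itertools import combinations


def build_graph(objects):
    """Sketch of the scheme: one vertex per distinct (field, value),
    one edge per unordered pair of fields inside each object."""
    vertex_ids = {}  # (field, value) -> vertex id; reuse instead of duplicating
    edges = []       # list of (from_id, to_id) pairs

    def get_or_create(field, value):
        key = (field, value)
        if key not in vertex_ids:
            # ids 1, 2, 3, ... mirror the #1, #2, #3 labels in the text
            vertex_ids[key] = len(vertex_ids) + 1
        return vertex_ids[key]

    for obj in objects:
        # dicts preserve insertion order (Python 3.7+): host, ip, random
        ids = [get_or_create(field, value) for field, value in obj.items()]
        # all unordered pairs of the k vertices: k * (k - 1) / 2 edges
        edges.extend(combinations(ids, 2))
    return vertex_ids, edges
```

Run on the two example objects, this yields five vertices (google.com is reused) and the six edges listed above. Edges are not deduplicated across objects here, matching the "k * (k - 1) / 2 edges per object" count.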
My question is: can my code be improved? I tried it with a JSON file of 10k objects, each with 7 fields, and it is slow. How can I do this faster? Can I process the data in batches? I have heard about indexes, but I don't know what they are or how they would help.
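For scale: with k = 7 fields, each object adds 7 * 6 / 2 = 21 edges, so 10k objects mean 210,000 edges and up to 70,000 vertices. Below is a minimal sketch of splitting the input into fixed-size batches. `load_batch` is a hypothetical callback (for example: add the vertices and edges for those objects, then commit once). Whether batching helps depends on the graph backend; with a transactional backend, committing once per batch rather than once per object is a common pattern, but that is an assumption here, not something the original script does.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_in_batches(objects: Iterable[dict], size: int,
                    load_batch: Callable[[List[dict]], None]) -> int:
    """Pass each batch to the hypothetical `load_batch` callback
    and return how many batches were processed."""
    count = 0
    for batch in batches(objects, size):
        load_batch(batch)  # e.g. add vertices/edges, then commit once
        count += 1
    return count
```

With 10,000 objects and a batch size of 1,000, `load_batch` is called 10 times; a final partial batch is passed along as-is.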
[1] Normal JSON to GraphSON format
[2] https://pastebin.com/g7qnQdq9
Edit: OK, I hard-coded several graph.createIndex(X, Vertex.class) calls, where X is the name of each field in my JSON. It does seem faster. How can I improve it further? What am I doing wrong, and how can I do it better? Should I instead try to generate a JSON file in the format Gremlin uses to export a graph (e.g. GraphSON)? I think that format is extremely hard to produce. I can't find proper documentation, and I urgently need an answer because this is a work-related problem.
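Why an index speeds this up can be sketched without any graph library. Every object needs a "does this vertex already exist?" lookup. Without an index that lookup scans all existing vertices; with an index it is roughly a single hash lookup. The two classes below are hypothetical illustrations that count comparisons; they are a simplified model of the idea, not how TinkerGraph implements `createIndex` internally.

```python
class ScanLookup:
    """No index: each lookup walks every existing vertex."""

    def __init__(self):
        self.vertices = []      # list of (field, value, id)
        self.comparisons = 0

    def get_or_create(self, field, value):
        for f, v, vid in self.vertices:
            self.comparisons += 1
            if f == field and v == value:
                return vid
        vid = len(self.vertices) + 1
        self.vertices.append((field, value, vid))
        return vid


class IndexedLookup:
    """Index modeled as a dict keyed on (field, value)."""

    def __init__(self):
        self.index = {}         # (field, value) -> id
        self.comparisons = 0

    def get_or_create(self, field, value):
        self.comparisons += 1   # one hash lookup per call
        key = (field, value)
        if key not in self.index:
            self.index[key] = len(self.index) + 1
        return self.index[key]


def run(store, n):
    """Insert n distinct ip values and return the comparison count."""
    for i in range(n):
        store.get_or_create("ip", f"10.0.{i // 256}.{i % 256}")
    return store.comparisons
```

For 1,000 distinct values the scan performs 0 + 1 + ... + 999 = 499,500 comparisons, while the indexed version performs 1,000. Because each scan gets longer as the graph grows, total work without an index grows quadratically with the number of vertices.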
Edit 2: I also tried this script (https://pastebin.com/Uts4KQCH) with a JSON file of 50k objects. At around the 38,000th object it slowed down sharply, from about 1.5 seconds per 1,000 objects to about 30 seconds per 1,000.