I have two large GraphML files, and I would rather not combine them unless absolutely necessary. The node IDs are consistent across the two files, and the first file contains no reference to any node from the second.

Would there be a way to load the first file into JanusGraph, and then load the second as an addition to the first? (If it needs a little reformatting, it is not an issue, I can process the files as I want.)

If that isn't possible, how can I load large amounts of data into JanusGraph?

GregoirePelegrin
  • I was not quite clear from the question whether the second file will need to create edges connected to nodes that were created during the load of the first file. GraphML files are not really designed, as I understand it, to be additive. They are all essentially self-contained graphs. – Kelvin Lawrence Apr 18 '22 at 14:50
  • First of all, thanks for your practical guide to Gremlin, it was my main source up until this point! When it comes to my problem, yes, they would have to be able to create links with already-existing nodes. The thing is that my graph is really big and handling massive files wouldn't be as effective as handling multiple smaller ones. So it's effectively a single graph spread across multiple files. The main advantage I thought I had was that the nodes of the first file weren't dependent on the second one's. As I don't know how the loading process works in detail, I'd hoped it was possible? – GregoirePelegrin Apr 19 '22 at 07:00
  • I don't think you can merge into an existing graph using GraphML. You will likely need to come up with a more "bulk load" type approach where you load data in parallel. The challenge with JanusGraph is knowing what ID was given to what. This problem has been discussed a lot in the past though. A few searches on JanusGraph bulk loading might find a good hit for you. When I have a free moment I'll try and dig up some examples I have seen in the past. – Kelvin Lawrence Apr 20 '22 at 13:30
  • Thanks a lot for your time! I have to move to Neo4j now instead of JanusGraph, so I no longer really need the help (though I will look up the bulk loading). I'll leave the question open in case there is a solution. – GregoirePelegrin Apr 20 '22 at 13:37
  • Very curious how large your file is and whether Neo4j is able to handle the data if you are using the community edition – venegr Apr 21 '22 at 07:55
  • For now I am using an extract of this file so that I can use it with the community edition. The file is at least 1 GB. It might still have been possible to load it into JanusGraph, but the processing pipeline is slowed down by this size, hence my wish to split it into smaller files. – GregoirePelegrin Apr 21 '22 at 08:42

1 Answer

There doesn't seem to be a way to load multiple GraphML files into the same JanusGraph graph. That said, you can write custom Groovy scripts that load data from CSV, TXT, or similar files.
This is easier and makes it possible to handle large amounts of data split into smaller files. One way to proceed is to use one file per node type and one file per relationship type, which keeps the process relatively simple.
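
The "one file per node type / one file per relationship type" idea can be sketched roughly as follows. This is only an illustration of the loading order and of the ID bookkeeping mentioned in the comments, written in Python rather than Groovy, and it does not talk to JanusGraph. The function names, the CSV column names (`id`, `source`, `target`) and the `InMemoryGraph` stand-in are all hypothetical; in a real script the `add_vertex` and `add_edge` callables would wrap whatever client or traversal you use against JanusGraph.

```python
import csv
import io


def load_nodes(source, label, add_vertex, id_map, id_column="id"):
    """Create one vertex per CSV row and record file ID -> graph ID in id_map.

    `add_vertex(label, properties)` is an assumed callback that must return
    the ID the graph assigned to the new vertex.
    """
    count = 0
    for row in csv.DictReader(source):
        file_id = row.pop(id_column)
        id_map[file_id] = add_vertex(label, row)
        count += 1
    return count


def load_edges(source, label, add_edge, id_map,
               src_column="source", dst_column="target"):
    """Create one edge per CSV row, resolving endpoints through id_map.

    Raises KeyError if a row refers to a node that has not been loaded yet,
    so all node files must be loaded before any edge file.
    """
    count = 0
    for row in csv.DictReader(source):
        src = id_map[row.pop(src_column)]
        dst = id_map[row.pop(dst_column)]
        add_edge(label, src, dst, row)
        count += 1
    return count


class InMemoryGraph:
    """Hypothetical stand-in for the real graph, used only to run the sketch."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, label, properties):
        # Assign an ID that differs from the file ID, as a real graph may do.
        graph_id = 1000 + len(self.vertices)
        self.vertices[graph_id] = (label, dict(properties))
        return graph_id

    def add_edge(self, label, src, dst, properties):
        self.edges.append((label, src, dst, dict(properties)))


if __name__ == "__main__":
    graph, id_map = InMemoryGraph(), {}
    people = io.StringIO("id,name\n1,Ann\n2,Bob\n")
    knows = io.StringIO("source,target,since\n1,2,2020\n")
    load_nodes(people, "person", graph.add_vertex, id_map)
    load_edges(knows, "knows", graph.add_edge, id_map)
    print(len(graph.vertices), "vertices,", len(graph.edges), "edges")
```

Because the node IDs are consistent across files, a single `id_map` can be shared by every node file, and the edge files can be loaded in any order once all the nodes exist. For very large inputs the map itself may become a memory concern, in which case it could be persisted or replaced by a lookup on an indexed property that stores the file ID.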

GregoirePelegrin