I have existing Groovy code for ingesting data into Titan with a Cassandra + Elasticsearch backend. With the release of DataStax Enterprise 5.0, I wanted to see whether that Titan code could be migrated over.
The code mainly parses out some fields, transforms some of the values (e.g. datetimestamp -> epoch), and checks for edge uniqueness when adding new edges (e.g. an 'A likes Apples' relation should appear only once in the graph, even though multiple 'A likes Apples' relations may appear in the raw file).
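For context, the transform and uniqueness steps amount to something like the following. This is a simplified sketch in plain Groovy, not the actual ingestion code; the rows, field order, and timestamp format are placeholders assumed for illustration, and no Titan or DSE calls are involved.

```groovy
import java.text.SimpleDateFormat

// Assumed timestamp layout; the real input files may use a different one.
def fmt = new SimpleDateFormat('yyyy-MM-dd HH:mm:ss')
fmt.setTimeZone(TimeZone.getTimeZone('UTC'))

// datetimestamp -> epoch seconds
def toEpoch = { String ts -> fmt.parse(ts).time.intdiv(1000) }

// Placeholder rows: source, edge label, target, timestamp
def rows = [
    ['A', 'likes', 'Apples', '2016-07-01 00:00:00'],
    ['A', 'likes', 'Apples', '2016-07-02 00:00:00'],
    ['B', 'likes', 'Pears',  '2016-07-01 00:00:10']
]

// Keep only the first occurrence of each (source, label, target) triple.
// Set.add returns false when the triple has already been seen.
def seen = new HashSet()
def firstOccurrences = rows.findAll { r -> seen.add(r[0..2]) }

// Build the edge records that would be written to the graph.
def uniqueEdges = firstOccurrences.collect { r ->
    [src: r[0], label: r[1], dst: r[2], epoch: toEpoch(r[3])]
}

uniqueEdges.each { println it }
```

Here the second 'A likes Apples' row is dropped, so only two edge records remain.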
What I have tried so far:
Using the DSE Graph Loader, with edge label multiplicity set to single (no properties) and vertex multiplicity set to single:
data = File.text(filepath).delimiter(',').header('a', 'b', 'c')
load(data).asVertices { }
load(data).asEdges { }
Using this template, vertices are unique (one vertex per vertex label). However, for edge labels defined in the schema as single, an exception is thrown every time the loader attempts to add the "same" edge again. Is it possible to add uniqueness checks within the loading script?
Loading data through the Gremlin console:
:load filepath
My pre-existing code throws quite a few exceptions when I execute the load command. After removing a few Java/Titan classes that could not be imported (TitanManagement and SimpleDateFormat), I am getting a
org.apache.tinkerpop.gremlin.groovy.plugin.RemoteException
Any tips on getting the gremlin-console integration working?
One last question: have any functions been removed since the DataStax acquisition of Titan?
Thanks in advance!