
I have existing Groovy code for ingesting data into Titan with a Cassandra + Elasticsearch backend. With the release of DataStax Enterprise 5.0, I wanted to see whether this Titan code could be migrated over.

The code mainly parses out some fields, transforms some of the values (e.g. datetimestamp -> epoch), and checks for edge uniqueness when adding new edges (e.g. an 'A likes Apples' relation should appear only once in the graph, even though multiple 'A likes Apples' relations may appear in the raw file).
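As an illustration of the timestamp transformation, here is a minimal sketch in plain Groovy, run outside the Gremlin console. The `toEpochMillis` name and the `yyyy-MM-dd HH:mm:ss` pattern are assumptions made for the example, not the actual ingestion code or the real file format:

```groovy
import java.text.SimpleDateFormat

// Hypothetical helper: converts a timestamp string to epoch milliseconds.
// The default pattern is an assumption; adjust it to match the raw file.
long toEpochMillis(String timestamp, String pattern = 'yyyy-MM-dd HH:mm:ss') {
    def fmt = new SimpleDateFormat(pattern)
    // Parse as UTC so the result does not depend on the machine's time zone.
    fmt.setTimeZone(TimeZone.getTimeZone('UTC'))
    // Reject malformed values such as month 13 instead of silently rolling over.
    fmt.setLenient(false)
    return fmt.parse(timestamp).time
}

println toEpochMillis('1970-01-02 00:00:00')
```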

What I have tried so far:

  • Using the DSE Graph Loader with edge label multiplicity as single (no properties) and vertices multiplicity as single:

    data = File.text(filepath).delimiter(',').header('a', 'b', 'c')
    load(data).asVertices { }
    load(data).asEdges { }
    

    Using this template, vertices are unique (one vertex per vertex label). However, for edge labels defined in the schema as single, the loader throws an exception every time it tries to add the "same" edge again. Is it possible to add uniqueness checks within the loading script?

  • Loading data through the gremlin console

    :load filepath
    

    My pre-existing code throws quite a few exceptions when I execute the load command. After removing a few Java/Titan classes that could not be imported (TitanManagement, SimpleDateFormat), I am getting a

    org.apache.tinkerpop.gremlin.groovy.plugin.RemoteException
    

    Any tips on getting gremlin-console integration working?

One last question: Have any functions been removed with the DataStax acquisition of Titan?

Thanks in advance!

stephen mallette
Jeanie

1 Answer


We are looking at a feature enhancement to the Graph Loader to support the duplicate edge check. If your edges should only ever be single cardinality, you can enforce that in the schema by declaring the edge label with .single().
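Until the loader supports that check, one possible workaround is to drop duplicate rows before the file reaches the loader. The following is a minimal sketch in plain Groovy, assuming each row fully identifies an edge (e.g. `A,likes,Apples`) and that the rows fit in memory. `dedupeRows` is a hypothetical helper, not part of the Graph Loader:

```groovy
// Hypothetical helper: trims each row, drops blank rows, and keeps only the
// first occurrence of each remaining row, preserving the original order.
List<String> dedupeRows(List<String> rows) {
    return rows.collect { it.trim() }   // normalise surrounding whitespace
               .findAll { it }          // skip empty lines
               .unique()                // keep the first copy of each row
}

// Sample rows standing in for the raw file contents.
def raw = ['A,likes,Apples', 'B,likes,Pears', 'A,likes,Apples', '']
def unique = dedupeRows(raw)
unique.each { println it }
```

The deduplicated rows can then be written to a new file and passed to the loader in place of the raw one, so that each edge is only attempted once.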

For the second item, are you using the DSE supplied Gremlin Console? Is your console local and your cluster located on another machine? What was the setup of your Titan environment?

For context, DataStax did not purchase Titan. Titan is an open source graph database and remains one. DataStax acquired the Aurelius team, the creators of Titan. The Aurelius team then built a new graph database, DSE Graph, that was inspired by Titan and is compliant with TinkerPop. There are feature and implementation differences between DSE Graph and Titan, which are documented here: http://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/graphTOC.html

One that may interest you is the integration of DSE Search and DSE Graph.

jlacefie
  • re: single cardinality: I have tried this, but it yields uniqueness exceptions being thrown when a "new edge" is attempted to be added. At the end, no edges have been added. Yes, I am using the DSE supplied Gremlin console -- it is currently local. – Jeanie Jul 27 '16 at 21:26