I'm migrating from Titan to DataStax. I have a graph with around 50 million nodes, made up of Persons, Addresses, Phones, etc.
I want to calculate each Person node's connections (how many other persons share the same phone, address, etc.).
In Titan I wrote a Hadoop job that goes over all the person nodes, and then I could write a Gremlin script to see how many persons share a phone with each particular node.
As input properties I have:
titan.hadoop.input.format=com.thinkaurelius.titan.hadoop.formats.hbase.TitanHBaseInputFormat
titan.hadoop.input.conf.storage.backend=hbase
As the query filter, I select only the person nodes:
titan.hadoop.graph.input.vertex-query-filter=v.query().has('type',Compare.EQUAL,'person')
And to run a script I use:
titan.hadoop.output.conf.script-file=scripts/calculate.groovy
The script calculates, for every person node, the number of shared-phone connections that person has:
object.phone_shared= object.as('x').out('person_phones').in('person_phones').except('x').count()
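For reference, here is roughly how I would expect the same count to look in TinkerPop 3 syntax. This is only a sketch on a toy in-memory TinkerGraph, meant to be pasted into a TinkerPop 3 Gremlin Console (where TinkerGraph and neq are already imported). The vertices alice, bob, carol and the 'name' property are made up for illustration, and I'm assuming where(neq('x')) is the replacement for except('x'). It does not show how to run this over all persons with analytics, which is what I'm asking about.

```groovy
// Toy in-memory graph; the people, phones and the 'name' property are illustrative only.
graph = TinkerGraph.open()
g = graph.traversal()

alice  = graph.addVertex('type', 'person', 'name', 'alice')
bob    = graph.addVertex('type', 'person', 'name', 'bob')
carol  = graph.addVertex('type', 'person', 'name', 'carol')
phone1 = graph.addVertex('type', 'phone')
phone2 = graph.addVertex('type', 'phone')

// alice and bob share phone1; bob and carol share phone2
alice.addEdge('person_phones', phone1)
bob.addEdge('person_phones', phone1)
bob.addEdge('person_phones', phone2)
carol.addEdge('person_phones', phone2)

// Walk person -> phone -> person and drop the starting person.
// where(neq('x')) is assumed to play the role of except('x') in the old script.
shared = g.V(alice).as('x').
           out('person_phones').in('person_phones').
           where(neq('x')).
           count().next()
```

Here bob is the only other person reachable through one of alice's phones. Like the original script, this counts paths, so a person who shares two phones with the start node would be counted twice unless a dedup() is added before count().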
Is there a way to write this kind of script in DataStax so that it goes over all the person nodes? I see that DataStax uses Spark analytics to, for example, count the nodes,
but I haven't found any documentation on how to run custom scripts using analytics.
Thanks