
I'm migrating from Titan to DataStax. I have a graph with around 50 million nodes, made up of Persons, Addresses, Phones, etc.

I want to calculate a Person node's connections (how many persons share the same phone, address, etc.).

In Titan I wrote a Hadoop job that goes over all the person nodes, and then I could write a Gremlin script to see how many persons have the same phone as that particular node.

As input properties I have:

titan.hadoop.input.format=com.thinkaurelius.titan.hadoop.formats.hbase.TitanHBaseInputFormat
titan.hadoop.input.conf.storage.backend=hbase

For the query filter, I select only the person nodes:

titan.hadoop.graph.input.vertex-query-filter=v.query().has('type',Compare.EQUAL,'person')

And to run a script I use:

titan.hadoop.output.conf.script-file=scripts/calculate.groovy

This calculates, for every person node, the number of shared-phone connections that the person has:

object.phone_shared= object.as('x').out('person_phones').in('person_phones').except('x').count()
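
To make the intent of that one-liner concrete, here is a minimal plain-Python sketch of what it computes per person. The toy edge list and the function name are hypothetical, and none of this is Titan or DataStax API; it is just the same out/in walk over an in-memory graph:

```python
from collections import defaultdict

# Hypothetical toy data: (person, phone) pairs standing in for 'person_phones' edges.
PERSON_PHONES = [
    ("alice", "555-0100"),
    ("bob", "555-0100"),
    ("carol", "555-0100"),
    ("carol", "555-0199"),
    ("dave", "555-0199"),
    ("erin", "555-0142"),
]


def shared_phone_counts(edges):
    """For each person, count the other persons reached via a shared phone."""
    phones_of = defaultdict(set)   # person -> phones (the out step)
    persons_of = defaultdict(set)  # phone -> persons (the in step)
    for person, phone in edges:
        phones_of[person].add(phone)
        persons_of[phone].add(person)

    counts = {}
    for person, phones in phones_of.items():
        # Walk person -> phone -> person, dropping the start person (the except step).
        counts[person] = sum(
            1
            for phone in phones
            for other in persons_of[phone]
            if other != person
        )
    return counts


COUNTS = shared_phone_counts(PERSON_PHONES)
```

A person reachable through two shared phones is counted twice here, which mirrors how the traversal counts paths rather than distinct persons.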

Is there a way to write this kind of script in DataStax to go over the person nodes? I see that DataStax uses Spark analytics to count the nodes, for example:

https://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/graphAnalytics/northwindDemoGraphSnapshot.html

but I didn't find any more documentation on how to run custom scripts using analytics.

Thanks

Misha Brukman
CristiC

1 Answer


The answer happens to be on the page you linked. It seems like it might just be a little easier than you are used to with Titan. The key is in step 8, where you configure the traversal to use the preconfigured OLAP/analytics TraversalSource, which is named a (for analytics).

  1. Alias the traversal to the Northwind analytics OLAP traversal source a. Alias g to the OLAP traversal source for one-off analytic queries:

gremlin> :remote config alias g northwind.a

This basically says: "When I execute a traversal on TraversalSource g, I want it to be aliased to northwind.a on the server."

Once you do that, all Traversals of g will be executed using northwind.a and thus the Spark analytics engine.

Bob B
  • Thanks for the answer. I got that part with the OLAP traversal. I need two things: one is to calculate the shared connections, and the other is to export the result to a file. Can you provide a simple example of how to calculate the shared connections with analytics? Thanks – CristiC Nov 28 '16 at 07:29
  • I've made this script: g.V().hasLabel('person').has('id',8438957).match( __.as("a").out('has_address').in('has_address').count().as('address_count'), __.as("a").out('has_phone').in('has_phone').count().as('phone_count'), __.as("a").out('has_vin').in('has_vin').count().as('vin_count'), __.as("a").out('has_dl').in('has_dl').count().as('dl_count'), __.as("a").values("id").as("person_id"), ).select("person_id","address_count", "phone_count", "dl_count", "vin_count") But this one is not working in analytics. – CristiC Nov 28 '16 at 08:00
  • What is the error you are receiving and does this work in OLTP mode? – jlacefie Nov 28 '16 at 15:44
  • @jlacefie This is the error I get for this query: g.V().hasLabel('person').match( __.as("a").out('has_address').in('has_address').count().as('address_count')) .select("person_id","address_count") **Local traversals may not traverse past the local star-graph on GraphComputer: [VertexStep(OUT,[has_address],vertex), VertexStep(IN,[has_address],vertex), CountGlobalStep]** – CristiC Nov 29 '16 at 09:02
  • @CristiC Did you figure it out? I ran into the same issue and wonder if you found a solution yet. Thanks! – zhibo May 04 '18 at 02:20
  • Hi, you can't use this in OLAP; you need to rethink this query. – CristiC Sep 04 '18 at 15:50
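
The star-graph error quoted above appears to come from the child traversal inside match() trying to walk two hops (out, then in) from each person, which a GraphComputer does not allow for local traversals. One way to rethink the computation is to aggregate globally instead: group persons by phone once, then credit each person with (group size - 1) for every phone they use. The following is only a plain-Python sketch of that idea, with hypothetical names and toy data; it is not DSE Graph or Spark code:

```python
from collections import Counter, defaultdict


def shared_counts_by_grouping(edges):
    """Compute per-person shared-phone counts with two flat passes."""
    # Pass 1: group persons by the phone they are connected to.
    persons_by_phone = defaultdict(set)
    for person, phone in edges:
        persons_by_phone[phone].add(person)

    # Pass 2: a phone used by n persons contributes n - 1 to each of them.
    totals = Counter()
    for persons in persons_by_phone.values():
        for person in persons:
            totals[person] += len(persons) - 1
    return dict(totals)


# Hypothetical (person, phone) edges for illustration only.
EDGES = [("p1", "ph1"), ("p2", "ph1"), ("p3", "ph2")]
RESULT = shared_counts_by_grouping(EDGES)
```

Neither pass needs a nested walk starting from an individual person, which is the general shape a bulk (map/reduce style) job tends to want. Assuming no duplicate edges, the totals match the per-person out/in count minus the person itself.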