
My problem is that I want to store product, customer and seller data in the Titan graph database, which has Cassandra as the storage backend and Elasticsearch as the indexing backend. Then I'll be querying that data to make recommendations to both customers and sellers. I am not able to get to the point where I can store my own data. Since the data is going to be huge, I'll be using Cassandra and Elasticsearch.

What I have done so far: I have Cassandra and Elasticsearch set up. I can run bin/titan.sh start to start Cassandra, Elasticsearch and Gremlin Server, and I can also play with the Graph of the Gods data by running:

gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
==>standardtitangraph[cassandrathrift:[127.0.0.1]]
gremlin> GraphOfTheGodsFactory.load(graph)
==>null

Now I am trying to find a way to store my product, customer and seller graph data, such that the data is stored in Cassandra and the indices are in Elasticsearch.

What steps should I take to do that? My main language for the project is Node.js, and Java is out of the question due to project constraints.

My questions in short

  1. How do I store my own data for Titan DB to process?
  2. Once the data is available for processing, I'll be exposing some HTTP APIs for making recommendations. Writing them in Java is out of the question due to some constraints. How should I go ahead with it? (I think I only have Gremlin as the alternative.)

I'll be grateful if you can point out my mistakes and drop some breadcrumbs in the right direction.

2 Answers


If you can't use Java then you are limited to using Groovy. As for

how do I store my own data for Titan DB to process

Side Note

With a graph DB there are a multitude of ways of storing this data. If you want to really formalise the structure of your data, I would recommend looking into Ontologies, OWL, and Topic Maps; these can serve as great inspiration for how to formalise and structure the data in a graph DB. These reads are only worthwhile if you are looking for ways of very formally structuring data in graphs.

Structure Example

For now let's assume you just want to track customers and the products they have bought. One simple structure is that both customers and products are vertices, with an edge from a customer to a product recording the fact that the customer has bought that product. We can even put additional data on that edge, such as time of purchase and quantity. Here is an example of how to do that in Groovy:

gremlin> g = TitanFactory.open("titan-cassandra-es.properties")
gremlin> customerBob = g.addVertex("Bob");
==>v[12]
gremlin> customerAlice = g.addVertex("Alice");
==>v[13]
gremlin> productFish = g.addVertex("Fish");
==>v[14]
gremlin> productMeat = g.addVertex("Meat");
==>v[15]
gremlin> edge = customerBob.addEdge("purchased", productMeat, "Day", "Friday", "Quantity", 2);
==>e[16][12-purchased->15]
gremlin> edge = customerBob.addEdge("purchased", productFish, "Day", "Friday", "Quantity", 1);
==>e[17][12-purchased->14]
gremlin> edge = customerAlice.addEdge("purchased", productMeat, "Day", "Monday", "Quantity", 3);
==>e[18][13-purchased->15]

The above basically says that Bob bought some Meat and Fish on Friday, while Alice bought some Meat on Monday. If we wanted to find out what Bob bought on Friday, we could run the following traversal:

gremlin> g.traversal().V().hasLabel("Bob").outE("purchased").has("Day", "Friday").otherV().label();
==>Meat
==>Fish

Indexing

Before really diving into indexing, play around with the structure until you understand it. The following is a VERY skeletal explanation of indexing with Elasticsearch and Titan:

With regards to indexing, know that Titan has different types of indices: Composite, Vertex-Centric, and Mixed. They all serve their purpose, and you should read the Titan indexing documentation for more info.

Indexing is used to speed up traversals and lookups, so you need to decide what to index. For our example we want to quickly find all purchases made on different days. This means that we can put a mixed index on the edges to help us (a composite index would serve just as well, but since you are asking about Elasticsearch we are going to use a mixed index).

To define a mixed index, we start by defining a simple schema (more info in the Titan schema documentation):

mgmt = graph.openManagement();
purchased = mgmt.makeEdgeLabel("purchased").multiplicity(MULTI).make();
day = mgmt.makePropertyKey("Day").dataType(String.class).make();

You don't need to explicitly define the schema for everything, but it is essential for anything you want to index. Now you can create your index:

mgmt.buildIndex("productsPurchased", Edge.class).addKey(day).buildMixedIndex("search")
mgmt.commit() // "search" is the index backend name defined in your titan-cassandra-es.properties file

With this index queries such as:

g.traversal().E().has("Day", "Friday")

will be much faster.

Note: You should make your indices and schema before loading data. It just makes things simpler in the long run.
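To make that order of operations concrete, here is a minimal end-to-end sketch for the gremlin console that reuses the labels and keys from the example above. It assumes the default automatic schema settings and the conf/titan-cassandra-es.properties file from the question; giving Quantity an explicit Integer type is my own assumption.

// 1. Open the graph and define the schema plus the index in one management transaction
graph = TitanFactory.open('conf/titan-cassandra-es.properties')
mgmt = graph.openManagement()
purchased = mgmt.makeEdgeLabel("purchased").multiplicity(MULTI).make()
day = mgmt.makePropertyKey("Day").dataType(String.class).make()
quantity = mgmt.makePropertyKey("Quantity").dataType(Integer.class).make()
mgmt.buildIndex("productsPurchased", Edge.class).addKey(day).buildMixedIndex("search")
mgmt.commit()

// 2. Only now load the data
customerBob = graph.addVertex("Bob")
productMeat = graph.addVertex("Meat")
customerBob.addEdge("purchased", productMeat, "Day", "Friday", "Quantity", 2)
graph.tx().commit()

// 3. Traversals filtering on the indexed "Day" key can now use the mixed index
g = graph.traversal()
g.E().has("Day", "Friday")

Creating the index before any data exists also means you never have to run a reindex job against already stored data.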

Filipe Teixeira
  • Thanks for such a detailed answer. I have a few doubts: 1) When I do g.traversal().V().values() it gives the vertices of the Graph of the Gods, not the product/buyer/seller data nodes (I had previously loaded the Graph of the Gods data). 2) Where is this product/buyer/seller data stored (in Cassandra?) 3) When I run the search query g.traversal().V().hasLabel("Bob").outE("purchased").has("Day", "Friday") it says [Query requires iterating over all vertices [(~label = Bob)]. For better performance, use indexes] even though I have followed your mgmt steps. Any hints where I may be getting it wrong? – palash kulshreshtha Apr 13 '16 at 16:50
  • 1. If you loaded the Graph of the Gods and saved it, then yes, you are going to see that data mixed into your own data. You can use TitanCleanup to clear a graph quickly if need be. 2. Yes, it's stored in Cassandra. At the end of the day Titan is a graph library that supports multiple backends, Cassandra being one of them. 3. That's natural, because we didn't index that. As I said, we have to choose what we index. In the example I indexed the edge. In such a simple case with Bob, a composite index is best for quick lookups (see the sketch below this comment thread). Check the docs on Titan composite indices, they are very comprehensive. – Filipe Teixeira Apr 13 '16 at 18:10
  • Again thanks a ton for the explanation! Made my day. – palash kulshreshtha Apr 15 '16 at 04:47
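A quick sketch of the two suggestions from the comments above, again for the gremlin console: clearing out a previously loaded graph with TitanCleanup, and a composite index for fast exact-match lookups. The name property key and the byName index name are purely illustrative and not part of the answer above.

// Wipe a graph you no longer want (e.g. the Graph of the Gods test data); this is destructive
// TitanCleanup lives in com.thinkaurelius.titan.core.util
graph = TitanFactory.open('conf/titan-cassandra-es.properties')
graph.close()
TitanCleanup.clear(graph)

// A composite index (no Elasticsearch involved) for exact-match lookups on an illustrative "name" key
graph = TitanFactory.open('conf/titan-cassandra-es.properties')
mgmt = graph.openManagement()
name = mgmt.makePropertyKey("name").dataType(String.class).make()
mgmt.buildIndex("byName", Vertex.class).addKey(name).buildCompositeIndex()
mgmt.commit()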

Because your main language is JavaScript/Node.js, you can use https://www.npmjs.com/package/gremlin which is a WebSocket client for TinkerPop3 Gremlin Server (disclaimer: library author here). You use the client to send strings of Gremlin-Groovy queries to a remote Gremlin Server.

The most basic way of interacting with the graph is:

import { createClient } from 'gremlin';

const client = createClient(8182, 'localhost');

client.execute('g.V()', (err, results) => {
    // handle err or results
});

There are more advanced modes detailed in the documentation. The client also supports bound parameters for better security and performance.
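For instance, a parameterized query could look like the sketch below. The customerName binding and the Groovy script reuse the labels from the first answer, and the execute(script, bindings, callback) form is an assumption on my part, so check the package README for the exact signature in your version.

import { createClient } from 'gremlin';

const client = createClient(8182, 'localhost');

// The bindings object travels alongside the script instead of being concatenated into it
const script = 'g.V().hasLabel(customerName).out("purchased").label()';
const bindings = { customerName: 'Bob' };

client.execute(script, bindings, (err, results) => {
    // results would be something like ['Meat', 'Fish'] for the data in the first answer
});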

It may be too early to comment on your domain and data modeling so I'll just stick with the environment part of your question in order to get you started.

jbmusso