I have built a property graph (60 million nodes, 40 million edges) from data in S3 using the Apache Spark GraphX framework. I want to run traversal queries on this graph.
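For context, the graph is assembled along these lines (a rough sketch only; the S3 paths and column names below are placeholders, not my real schema):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph}

val spark = SparkSession.builder().appName("property-graph").getOrCreate()

// Hypothetical layout: one CSV directory of vertices, one of edges.
val vertexDF = spark.read.option("header", "true").csv("s3a://my-bucket/vertices/")
val edgeDF   = spark.read.option("header", "true").csv("s3a://my-bucket/edges/")

// Simplified attributes for the sketch: vertex attribute = name, edge attribute = label.
val vertexRDD = vertexDF.rdd.map(r => (r.getAs[String]("id").toLong, r.getAs[String]("name")))
val edgeRDD   = edgeDF.rdd.map(r =>
  Edge(r.getAs[String]("src").toLong, r.getAs[String]("dst").toLong, r.getAs[String]("label")))

val graph: Graph[String, String] = Graph(vertexRDD, edgeRDD)
```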
My queries will look like this:
g.V().has("name","xyz").out('parent').out().has('name','abc')
g.V().has('proc_name','serv.exe').out('file_create').
has('file_path',containing('Tsk04.txt')).in().in('parent').values('proc_name')
g.V().has('md5','935ca12348040410e0b2a8215180474e').values('files')
Mostly, the queries are of the form g.V().out().out().out().
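The closest equivalent I can write in plain Spark is to keep the vertices and edges as DataFrames and turn every has()/out() step into a filter or join. A rough sketch of the first query, reusing the placeholder DataFrames from above (view and column names are again placeholders), which already gets unwieldy for longer traversals:

```scala
// Register the vertex and edge tables: v(id, name, ...), e(src, dst, label).
vertexDF.createOrReplaceTempView("v")
edgeDF.createOrReplaceTempView("e")

// g.V().has('name','xyz').out('parent').out().has('name','abc')
val result = spark.sql("""
  SELECT v2.*
  FROM v AS start
  JOIN e AS e1 ON e1.src = start.id AND e1.label = 'parent'   -- .out('parent')
  JOIN e AS e2 ON e2.src = e1.dst                              -- .out()
  JOIN v AS v2 ON v2.id  = e2.dst  AND v2.name = 'abc'         -- .has('name','abc')
  WHERE start.name = 'xyz'                                     -- g.V().has('name','xyz')
""")
result.show()
```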
Such queries are straightforward on graph databases like Neo4j, Titan, or AWS Neptune, since they support Gremlin.
Can Spark graphs be traversed in this manner? I tried the Spark Pregel API, but it is quite complex compared to Gremlin.
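For comparison, this is roughly what a single g.V().has(...).out(...).out() traversal looks like when I sketch it with GraphX's aggregateMessages instead of the full Pregel loop (the Props case class and propsGraph below are simplified placeholders for my real vertex attributes):

```scala
import org.apache.spark.graphx._

// Placeholder vertex attribute: real vertices carry more properties.
case class Props(name: String, reached: Boolean)

// One ".out(label)" step: push the frontier one hop along matching outgoing edges.
def oneHopOut(g: Graph[Props, String], label: Option[String]): Graph[Props, String] = {
  val msgs: VertexRDD[Boolean] = g.aggregateMessages[Boolean](
    ctx => if (ctx.srcAttr.reached && label.forall(_ == ctx.attr)) ctx.sendToDst(true),
    _ || _
  )
  // Vertices that received a message form the new frontier.
  g.outerJoinVertices(msgs)((_, props, msg) => props.copy(reached = msg.getOrElse(false)))
}

// propsGraph: Graph[Props, String] is assumed to be the graph above with
// its vertex attributes mapped into Props.
// g.V().has('name','xyz').out('parent').out().has('name','abc')
val seeded  = propsGraph.mapVertices((_, p) => p.copy(reached = p.name == "xyz"))   // has('name','xyz')
val hop1    = oneHopOut(seeded, Some("parent"))                                     // out('parent')
val hop2    = oneHopOut(hop1, None)                                                 // out()
val matches = hop2.vertices.filter { case (_, p) => p.reached && p.name == "abc" }  // has('name','abc')
```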
The reason I am looking at Spark GraphX is that the managed cloud offerings of the graph databases mentioned above are costly.