
Background: I have several months' experience using Gremlin and Faunus, including the ScriptMap step.

Problem: User-defined Gremlin steps work fine when loaded in the shell as part of a script. However, the same steps apparently have no effect when defined in a Faunus ScriptMap script.

 /***********Faunus Driver*************/

//usage: gremlin -e <this file> NOTE: to run in gremlin remove .submit() at end of pipe
import java.io.Console;
//get args
console = System.console()
mapperpath=console.readLine('> <map script path>: ')
refns=console.readLine('> <reference namespace>: ')
refinterestkey=console.readLine('> <interest field>: ')
//currently not in use
refinterestval=console.readLine('> <interest value>: ')
mainpropkey=console.readLine('> <main field>: ')
delim=console.readLine('> <main delimiter>: ')
args=[]
args[0]=refns
args[1]=refinterestkey
args[2]=refinterestval
args[3]=mainpropkey
args[4]=delim
args=(String[]) args.toArray()
f=FaunusFactory.open('propertyfile')
f.V().filter('{it.getProperty("_namespace")=="streamernamespace" && it.getProperty("_entity")=="selector"}').script(mapperpath, args).submit()
f.shutdown()

/***********Script Mapper*************/

Gremlin.defineStep("findMatch", [Vertex, Pipe],
    {streamer, interestindicator, fieldofinterest, fun ->
        // keep flagged reference vertices whose field equals the normalized streamer value
        _().has(interestindicator, true).has(fieldofinterest, fun(streamer))
    }
)
Gremlin.defineStep("connectMatch", [Vertex, Pipe], {streamer ->
    // copy and link streaming vertices to matching vertices in main graph
    _().transform({main -> if(main != null) {
        // fieldofinterest is not in scope in this step; mainpropkey is the value map() passes for it
        mylog.info("reference vertex " + main.id +
                   " & streaming vertex " + streamer.id + " match on main " + main.getProperty(mainpropkey));
        clone=g.addVertex(null);
        ElementHelper.copyProperties(streamer, clone);
        clone.setProperty("_namespace", main.getProperty("_namespace"));
        mylog.info("create clone "+clone.id+" in "+clone.getProperty("_namespace"));
        e=g.addEdge(main, clone, streamer.getProperty("source"));
        mylog.info("created edge "+ e);
        g.commit()
    }})
})

def g
def refns
def refinterestkey
def refinterestval
def mainpropkey
def delim
def normValue

def setup(args) {
    refns=args[0]
    refinterestkey=args[1]
    refinterestval=args[2]
    mainpropkey=args[3]
    delim=args[4]
    // normalize a streamer selector to "<TYPE><delim><number>"
    normValue = {obj-> seltype=obj.getProperty("type");
            seltypenorm=seltype.trim().toUpperCase();
            desc=obj.getProperty("description");
            if(desc.contains(delim)) {
                selnum=desc.split(delim)[1].trim()
            } else selnum=desc.trim();
            selnorm=seltypenorm.concat(delim).concat(selnum);
            mylog.info("streamer selector (" + seltype + ", " + desc + ") normalized as " + selnorm);
            return selnorm
    }
    mylog=java.util.logging.Logger.getLogger("script_map")
    mylog.info("configuring connection to reference graph")
    conf=new BaseConfiguration()
    conf.setProperty("storage.backend", "cassandra")
    conf.setProperty("storage.keyspace", "titan")
    conf.setProperty("storage.index.index-name", "titan")
    conf.setProperty("storage.hostname", "localhost")
    g=TitanFactory.open(conf)
    isstepsloaded = Gremlin.getStepNames().contains("findMatch") &&
        Gremlin.getStepNames().contains("connectMatch")
    mylog.info("custom steps available?: "+isstepsloaded)
}
def map(v, args) {
    try {
        incoming=g.v(v.id)
        mylog.info("current streamer id: "+incoming.id)
        if(incoming.getProperty("_entity")=="selector") {
            mylog.info("process incoming vertex "+incoming.id)
            g.V("_namespace", refns).findMatch(incoming, refinterestkey, mainpropkey, normValue).connectMatch(incoming).iterate()
        }
    } catch(Exception e) {
        mylog.info("map method exception raised");
        mylog.severe(e.getMessage())
    }
    g.commit()
}
def cleanup(args) { g.shutdown()}
fsj
  • what do you mean by the steps not having any effect? is there an error? – stephen mallette Jul 18 '14 at 21:19
  • no error until I inserted a guard in the map(): x.getProperty(prop) before the pipe. That resulted in an exception for invoking getProperty() on a null object. Verified that the faunus pipe as implemented in the map driver script is outputting vertices, but for reasons yet unknown the map() method in the ScriptMap script is not receiving the faunus pipe output – fsj Jul 24 '14 at 19:48
  • map now receiving vertices - problem seemed to be corrupted gremlin shell (clearing & CTRL+Z apparently didn't work - apparently had to kill putty window). See below for ongoing problems. – fsj Jul 28 '14 at 04:22
  • Per below, seems that only way to persist new elements generated via a faunus script map is to output them as blueprint elements. Is that correct? If so, is it feasible and practical to somehow convert the Titan elements I'm currently outputting, or do I need to overhaul the code to remove dependence on TitanGraph? – fsj Aug 05 '14 at 15:00

2 Answers


I just tested Faunus with user defined steps over The Graph of the Gods and it seems to work just fine. Here's what I did:

father.groovy

Gremlin.defineStep('father', [Vertex, Pipe], {_().out('father')})

def g

def setup(args) {
    conf = new org.apache.commons.configuration.BaseConfiguration()
    conf.setProperty('storage.backend', 'cassandrathrift')
    conf.setProperty('storage.hostname', '192.168.2.110')
    g = com.thinkaurelius.titan.core.TitanFactory.open(conf)
}

def map(v, args) {
    u = g.v(v.id)
    pipe = u.father().name
    if (pipe.hasNext()) u.fathersName = pipe.next()
    u.name + "'s father's name is " + u.fathersName
}

def cleanup(args) {
    g.shutdown()
}
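
The script step reads father.groovy from HDFS, not from the local directory, so the file has to be uploaded first (an extra step that is confirmed in the comments). A minimal sketch of that upload from the same console, assuming the `hdfs` helper that the Faunus Gremlin console binds and assuming that a relative target path resolves to the user's HDFS home directory:

```groovy
// Run in the Faunus Gremlin console before the traversal.
// Assumption: father.groovy sits in the directory gremlin.sh was started from.
hdfs.copyFromLocal('father.groovy', 'father.groovy')

// Optional check that the file is now visible in HDFS.
hdfs.ls()
```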

In Faunus' Gremlin REPL:

gremlin> g.V.has('type','demigod','god').script('father.groovy')
...
==>jupiter's father's name is saturn
==>hercules's father's name is jupiter
==>neptune's father's name is null
==>pluto's father's name is null

If this doesn't help to solve your problem, please provide more details, so we can reproduce the errors you see.

Cheers, Daniel

Daniel Kuppitz
  • By "Faunus' Gremlin REPL" I take it you mean "g" in the REPL is a FaunusGraph (I understand there is only one Gremlin REPL for processing both FaunusGraph and TitanGraph graphs). FYI, an earlier version of my script defined the steps with other defs prior to setup(), as you've done. The only difference was the defs preceded the step definitions - any chance that matters? Also, I'm not getting any error messages, just no effect as opposed to results using the same steps in a TitanGraph pipe outside a FaunusGraph script step. – fsj Jul 22 '14 at 00:21
  • By Faunus Gremlin REPL I mean the Gremlin console that's started via gremlin.sh. I don't think that other definitions can lead to non-functional user-defined steps. But as I've said, if it's still not working for you, please provide the full code that's not working, so we can reproduce it easily. – Daniel Kuppitz Jul 22 '14 at 09:31
  • I'll have to fat-finger the code, as I'm developing on an intranet. Meantime, please clarify whether you're accessing 'father.groovy' in the same dir from which you're running gremlin.sh. I thought scripts referenced in the script step had to be in hdfs (?) – fsj Jul 22 '14 at 13:44
  • Yes, I've uploaded it to HDFS, but didn't document this extra step, since you mentioned that you have several months experience with Faunus. – Daniel Kuppitz Jul 22 '14 at 18:26
  • Haven't gotten around to reproducing code yet, but made an interesting discovery: The faunus pipe containing the script step that references my script mapper was not sending any vertices to the mapper, apparently because the pipe had consecutive .has() steps. I confirmed that for TitanGraph g and FaunusGraph f, g.V().has().has().count() > 0 whereas f.V().has().has().count() = 0. Anyone know whether this is a bug or a design limitation? – fsj Jul 23 '14 at 21:45
  • Which version of Faunus do you use? 0.3.x had a bug with ```.has()```. Newer versions (0.4+) have no known bugs nor limitations when it comes to multiple filter steps. – Daniel Kuppitz Jul 24 '14 at 01:10
  • Faunus v 0.4.2 - switched to compound boolean in a single filter() – fsj Jul 24 '14 at 16:53
  • finally added full code - copied from scan so edit errors possible - runs fine but outputs pipe definition (as if ending with toString()) - all variables properly instantiated in pipe def - tried ending with _() but no change. – fsj Jul 28 '14 at 02:55
  • Dunno. ```.iterate()``` "returns" a ```void```, so I'm not sure what you're expecting from ```..._().iterate().connectMatch(incoming).iterate()```. This cannot work and should actually throw an exception. Anyway, if you get the ```.toString()``` output, then you probably want to do a ```.next()``` or ```.toList()``` to see the actual return values. – Daniel Kuppitz Jul 28 '14 at 03:30
  • Internal iterate is a typo - see correction - also tried running without final iterate(), but same results. The goal is to mutate the main graph by cloning and linking incoming vertices that match main graph vertices. The desired effect is achieved when the Gremlin pipe is executed outside a script map - are you suggesting .next() or toList() as a fix or for verification? – fsj Jul 28 '14 at 04:06
  • ```.next()``` or ```.toList()``` is only needed if you want to see the output, otherwise ```.iterate()``` should work. However, I'm not getting why you say "...runs fine but outputs pipe definition (as if ending with toString())...". This is what you should see without ```.iterate()``` (or any other iterating step). – Daniel Kuppitz Jul 28 '14 at 11:28
  • verified the findMatch step is correctly outputting matching vertices, but connectMatch is outputting null (again, both steps work in regular Gremlin script). Investigating faunus graph output formats - was using NoOp format in the faunus properties file, assuming that since graph mutations via the script map were Gremlin, faunus formats were N/A. So far getting ClosedChannelExceptions with TitanCassandraOutputFormat. Am I barking up the wrong tree? – fsj Jul 29 '14 at 19:25
  • to clarify, I was getting the toString() effect despite ending the pipe w/ iterate() - per preceding I'm getting ClosedChannelExceptions associated with changing faunus output format from NoOp to Cassandra – fsj Jul 29 '14 at 19:35
  • looks like NoOp is the way to go: https://github.com/thinkaurelius/faunus/wiki/Distributed-Graph-Computing-with-Gremlin – fsj Jul 29 '14 at 20:22
  • Put commit()'s in place - no longer throwing persistence exception. Also corrected erroneous reference to "id" property (apparently can't use getProperty() method). logger.info output confirms I am creating vertices and edges correctly - they simply are not persisting. Further research indicates there are predefined script map methods for adding vertices and edges via the faunus script step - see bottom of github.com/thinkaurelius/faunus/wiki/Titan-Format. Biggest diff from generic script map method is use of blueprint Graph vs. TitanGraph. Will proceed down this path ... – fsj Aug 04 '14 at 18:03
  • Per preceding, seems that only way to persist new elements generated via a faunus script map is to output them as blueprint elements. Is that correct? If so, is it feasible and practical to somehow convert the Titan elements I'm currently outputting, or do I need to overhaul the code to remove dependence on TitanGraph? – fsj Aug 05 '14 at 14:59
  • In the script step you usually use a direct connection to your Titan graph. All the mutations you do there should be available in the graph when the Faunus job is done. The output / return value of the map function is just a side effect (usually text output). – Daniel Kuppitz Aug 05 '14 at 20:18
  • Works just as you describe when mutations are limited to changing pre-existing elements. However, when the mutations are insertions of new elements created in the script map, these don't show up in the graph although the log shows they were created, ids and all. Per the link I provided there seem to be specific script methods (getOrCreateVertex(), getOrCreateEdge()) for inserting new elements via faunus script map. Exploring that path barring other insights ... – fsj Aug 05 '14 at 22:10
  • The getOrCreate methods are used for incremental loading when you use ```TitanCassandraOutputFormat```. They have nothing to do with the script step. Can you setup a small GitHub project with all the stuff you're doing? I'm losing track in this discussion. – Daniel Kuppitz Aug 05 '14 at 22:20
  • Unfortunately Github not feasible, as I couldn't maintain synch w/ off-line dev. Re your feedback, the methods in question are implemented in groovy scripts which are stored in hdfs, and take FaunusVertex elements as args. Is there some other way besides the Faunus script step to run hdfs scripts which process Faunus vertices? – fsj Aug 06 '14 at 13:50
  • [link](https://github.com/thinkaurelius/faunus/wiki/Script-Format): To answer my own question, yes, script I/O formats enable FaunusVertex read-in/write-out via hdfs script files apart from the script step. So per input from @DanielKuppitz I'll try using getOrCreate methods apart from the script step. – fsj Aug 06 '14 at 16:29

The root problem was that I set an obsolete value for the "storage.index.index-name" property (see the Titan graph config under setup()). Disregard the discussion re getOrCreate methods/blueprints: apparently a broad range of mutations on existing graphs can be achieved at scale using custom Gremlin steps defined inside a script referenced in the Faunus script step, with Faunus format NoOpOutputFormat. Lesson learned: instead of configuring Titan graphs in-line in the script, distribute a (centrally maintained) graph properties file for reference in configuring the Titan graph. CDH5 has simplified distributed cache management.
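
A minimal sketch of the properties-file approach, not the exact code that was deployed. The file name `titan-cassandra.properties` is hypothetical, and the sketch assumes the file has already been shipped to every task node (for example via the Hadoop distributed cache) so that it is readable at that path from inside the script. The settings shown are the ones from the question, without the obsolete index setting.

```groovy
// titan-cassandra.properties (hypothetical name), maintained in one place:
//   storage.backend=cassandra
//   storage.hostname=localhost
//   storage.keyspace=titan

def g

def setup(args) {
    // Assumption: the properties file is present in the task's working directory.
    // TitanFactory.open(String) accepts the path of a configuration file.
    g = com.thinkaurelius.titan.core.TitanFactory.open('titan-cassandra.properties')
}

def cleanup(args) {
    g.shutdown()
}
```

With this layout a configuration change is made once in the shared file instead of in every script that opens the graph.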

fsj