
I have installed both Titan and Faunus (titan-0.4.4 & faunus-0.4.4), and each seems to be working properly.

However, after ingesting a sizable graph in Titan and trying to import it into Faunus via

FaunusFactory.open()

I am running into issues. To be precise, the call to FaunusFactory.open() does return a Faunus graph,

faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]

but then even a simple

g.v(10)

fails with this error:

Task Id : attempt_201407181049_0009_m_000000_0, Status : FAILED
com.thinkaurelius.titan.core.TitanException: Exception in Titan
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.getAdminInterface(HBaseStoreManager.java:380)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureColumnFamilyExists(HBaseStoreManager.java:275)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.openDatabase(HBaseStoreManager.java:228)

My properties file is taken straight from the Faunus documentation page for Titan-HBase input, except of course for changing the URL of the Hadoop cluster:

faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname= my IP
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
faunus.graph.output.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseOutputFormat
faunus.graph.output.titan.storage.backend=hbase
faunus.graph.output.titan.storage.hostname= IP of my host
faunus.graph.output.titan.storage.port=2181
faunus.graph.output.titan.storage.tablename=titan
faunus.graph.output.titan.storage.batch-loading=true
faunus.output.location=output1
zookeeper.znode.parent=/hbase-unsecure
titan.graph.output.ids.block-size=100000
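
For reference, the full Gremlin session that triggers the error looks roughly like this (the properties filename here is just a placeholder for my actual file):

gremlin> g = FaunusFactory.open('titan-hbase.properties')
==>faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]
gremlin> g.v(10)
// the MapReduce job starts, then the mappers fail with the TitanException above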

Can anyone help?

ADDENDUM:

To address the comment below, here is some context: as I have mentioned, I have a graph in Titan and can perform basic Gremlin queries on it.

However, I need to run a global Gremlin query which, due to the size of the graph, requires Faunus and its underlying MapReduce capabilities; hence the need to import the graph. The error I get does not look to me as if it points to some inconsistency in the graph itself.

  • Getting a FaunusGraph instance basically means your config file was consumed properly - it doesn't mean the job will actually execute. Could you please amend the question with the actual Gremlin you are trying to execute? Also, what are you trying to accomplish with: `faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]` – stephen mallette Jul 22 '14 at 17:27
  • OK, I amended the question with the query: g.v(1000). I just entered it to see if I can get back a vertex. The actual query I intend to run is not that trivial (it is a global query on paths in the graph), but first I need to ingest the graph properly. I added some context to the post as to why I am interested. Any clues why I get this error? – Mirco A. Mannucci Jul 23 '14 at 11:06

1 Answer


I'm not sure that you have your "flow" of Faunus right. If your end result is to do a global query of the graph, then consider this approach:

  1. pull your graph to a sequence file
  2. issue your global query over the sequence file

More specifically, create hbase-seq.properties:

# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname=localhost
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
# hbase.mapreduce.scan.cachedrows=1000

# output data (graph or statistic) parameters
faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=snapshot
faunus.output.location.overwrite=true

In the Faunus Gremlin console, do:

g = FaunusFactory.open('hbase-seq.properties')
g._()

That will read the graph from HBase and write it to a sequence file in HDFS. Next, create seq-noop.properties with these contents:

# input graph parameters
faunus.graph.input.format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
faunus.input.location=snapshot/job-0

# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=analysis
faunus.output.location.overwrite=true

The above configuration will read your sequence file from the previous step without re-writing the graph (that's what NoOpOutputFormat is for). Now, in Faunus, do:

g = FaunusFactory.open('seq-noop.properties')
g.V.sideEffect('{it.degree=it.bothE.count()}').degree.groupCount()

This will compute a degree distribution, writing the results in HDFS to the 'analysis' directory. Obviously, you can do whatever Faunus-flavored Gremlin you want here - I just wanted to provide an example. I think this is a pretty standard "flow" or pattern for using Faunus from a graph analysis perspective.
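
If you want to sanity-check the results, you can inspect the side-effect output right from the Faunus Gremlin console. This is a minimal sketch assuming the hdfs helper object the Faunus shell provides, and assuming the job wrote to a job-0 subdirectory (mirroring the snapshot/job-0 convention above; adjust the path to whatever directory your job actually created):

hdfs.ls('analysis')
hdfs.head('analysis/job-0')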

stephen mallette
  • Your answer was useful to me in understanding better how to use Faunus. It is well presented and clear. Unfortunately, it does not address my current issue: I have used your hbase-seq.properties (changing only the IP), but it returns the same error. I am quite sure that this exception has something to do with the overall setup, though I have no clue where: Hadoop works properly, same for HBase, Titan and Faunus (when they are used separately). I will dig a bit deeper into com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager..... – Mirco A. Mannucci Jul 24 '14 at 09:49
  • Ok - I didn't think it would solve your problem necessarily, but I wanted to get the "flow" straight first. Reading to a sequence file is about the simplest (and most common) thing you can do with Faunus. Now that you're doing that and still getting the same error, I agree that it must be something in the setup. Is there anything more to the exception you supplied? Perhaps there is a "cause" exception deeper in the stack trace? – stephen mallette Jul 24 '14 at 10:40
  • The complete stack trace has been added by my colleague Varun here: https://github.com/thinkaurelius/faunus/issues/174 (so far no answer). This exception has already appeared in a few places, as Google shows, albeit in different contexts. I still have no clue as to its cause, though it is clear that it has to do with Faunus trying to access HBase via Titan. The fact that Titan by itself works just fine makes me think that its own communication with HBase is all right. The mystery continues... – Mirco A. Mannucci Jul 28 '14 at 11:36
  • @MircoMannucci I'm going to reopen that issue and continue this thread on the comments in issue #174, since this doesn't seem too well suited to SO's QA format. – Dan LaRocque Sep 08 '14 at 16:31
  • @DanLaRocque thanks! I have just asked Varun to look into it. Meanwhile, any news on Titan/Faunus integration for Hadoop 2? Our problem began because we had to "downgrade" our environment to Hadoop 1 to use Faunus... – Mirco A. Mannucci Sep 14 '14 at 19:51
  • Titan 0.5 supports Hadoop 2. Faunus has been merged into the Titan packaging and is now simply known as "titan-hadoop". Just visit the download page and grab the "hadoop2" packaging. See the latest documentation on Titan 0.5 here: http://s3.thinkaurelius.com/docs/titan/0.5.0/ – stephen mallette Sep 14 '14 at 21:19