
I am trying to use Hive 0.13 to access Cassandra 2.0.8 column families created with CQL3.

Here is how I created my column families:

CREATE KEYSPACE IF NOT EXISTS Identification
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
  'DC1' : 2 };

USE Identification;

CREATE TABLE IF NOT EXISTS entitylookup (
  name varchar,
  value varchar,
  entity_id uuid,
  PRIMARY KEY ((name, value), entity_id))
WITH
    caching=all
;

I followed the instructions from the README of this project: https://github.com/tuplejump/cash/tree/master/cassandra-handler

I built hive-cassandra-1.2.6.jar and copied it, along with cassandra-all-1.2.6.jar and cassandra-thrift-1.2.6.jar, to the Hive lib folder.

Then I started hive and tried the following:

CREATE EXTERNAL TABLE identification.entitylookup(name string, value string, entity_id binary)
STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES("cql.primarykey" = "name, value", "cassandra.host" = "localhost", "cassandra.port "= "9160")
TBLPROPERTIES ("cassandra.ks.name" = "identification", "cassandra.ks.stratOptions"="'DC1':2", "cassandra.ks.strategy"="NetworkTopologyStrategy");

Here is the output:

mvalle@mvalle:~/hadoop$ hive
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/mvalle/hadoop/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
OpenJDK 64-Bit Server VM warning: You have loaded library /home/mvalle/hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
hive> CREATE EXTERNAL TABLE identification.entitylookup(name string, value string, entity_id binary)
    > STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES("cql.primarykey" = "name, value", "cassandra.host" = "ident.s1mbi0se.com", "cassandra.port "= "9160")
    > TBLPROPERTIES ("cassandra.ks.name" = "identification", "cassandra.ks.stratOptions"="'DC1':2", "cassandra.ks.strategy"="NetworkTopologyStrategy");
FAILED: SemanticException [Error 10072]: Database does not exist: identification

Question: how can I get more information about what is going wrong? I tried the same Hive command using "Identification" (capital I), with the same result. Is it possible to access CQL3 column families in the community edition of Cassandra? It seems the keyspace has not been mapped, but I don't see how to map them. In DSE, they are mapped automatically...

EDIT:

To clarify further, if I create an empty database and then try to create the external table, here is what I get:

hive> create database identification;
OK
Time taken: 0.154 seconds
hive> CREATE EXTERNAL TABLE identification.entity_lookup(name string, value string, entity_id binary)
    > STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES("cql.primarykey" = "name, value", "cassandra.host" = "localhost", "cassandra.port "= "9160")
    > TBLPROPERTIES ("cassandra.ks.name" = "identification", "cassandra.ks.stratOptions"="'DC1':3", "cassandra.ks.strategy"="NetworkTopologyStrategy");
OK
Time taken: 3.58 seconds
hive> select * from identification.entity_lookup limit 10;
OK
Exception in thread "main" java.lang.InstantiationError: org.apache.hadoop.mapreduce.JobContext
    at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:166)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:418)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:561)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:534)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1488)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
mvallebr

1 Answer


The error occurs not because Cash couldn't map the keyspace, but because the database is not present in Hive.

Just create the database in Hive using:

CREATE DATABASE identification;

That should get it working.
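
On the question of getting more information: as far as I know, the CLI by default prints only the final error line and writes the rest to its log file, so it can help to send Hive's log output to the console. A minimal sketch, assuming the `hive` launcher is on the PATH; `hive_debug` is a hypothetical wrapper name, not something shipped with Hive or Cash:

```shell
# Hypothetical helper: start the Hive CLI with DEBUG-level logging on the console.
# hive.root.logger is Hive's log4j root logger setting; extra arguments are passed through.
hive_debug() {
  hive --hiveconf hive.root.logger=DEBUG,console "$@"
}
```

Running `hive_debug` and repeating the CREATE EXTERNAL TABLE statement should show the log output around the failure instead of only the final error line.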

Rohit
  • I tried that, but it shows an empty hive database and not my cassandra data. – mvallebr Jul 15 '14 at 17:18
  • Yes, it will be an empty Hive Database... then you will need to create the external tables in it. Currently we don't support automatic table creation in Cash. – Rohit Jul 17 '14 at 07:11
  • If I do that, I get an error anyway. I updated the question to show it. – mvallebr Jul 17 '14 at 12:55