1

Totally stuck with fetch data from hive external table. I have done below till now.

  1. I had a Managed table with date field whose value is 2014-10-23.
  2. I created external table to store data in elastic search like below

    create external table ext3 ( run_date date) ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe' STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'dfs/ext3', 'es.field.read.empty.as.null' = 'true','es.nodes'=);

  3. inserting one row in external table to create the Elastic Search Index and mapping.

Problem 1: My Elastic search field is created as string.

  1. Later I changed the mapping in elastic search to date.

    "run_date":{"type":"date", "format": "yyyy-MM-ddZ", "index": "not_analyzed"}

  2. re inserted the data in external table. when I query Elastic search its very fine. value is displayed as '2014-10-23+08:00'

Problem 2 When I query data for external table like select count(*) from ext3 I am getting below error.

2015-04-17 18:45:34,254 FATAL [main] org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DateWritable
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDateObjectInspector.getPrimitiveWritableObject(WritableDateObjectInspector.java:38)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:259)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:349)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:193)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:179)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:545)

Guys please help me on this, whole day is wasted. I have another external table with more data, I need to join these two tables and create a view to have my consolidated data ready for analysis.

Raghu
  • 23
  • 1
  • 3

1 Answers1

3

I think the error gives a clue to your problem:

Error getting row data with exception java.lang.ClassCastException:
  org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to 
  org.apache.hadoop.hive.serde2.io.DateWritable

You have a date field in your hive table but the data you have inserted is of the type timestamp.

Re-create your table (or a new one if you don't want to replace it)

CREATE EXTERNAL TABLE ext3 ( run_date timestamp )
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe' 
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'dfs/ext3', 'es.field.read.empty.as.null' = 'true','es.nodes'=);
while
  • 3,602
  • 4
  • 33
  • 42
  • Great!! it worked. But now I have loaded ~ 19million records to this table. When i query Elastic search, its great, But if I query "select count(*) from ext3" in my hive its taking hell lot of time. Almost 10 mins and still progress is 0%. Any clue? – Raghu Apr 20 '15 at 10:10
  • No sorry. Im not sure it is related to this error. I would suggest asking a new question providing all your details and example data. – while Apr 21 '15 at 10:37
  • 1
    @Raghu you don't loaded any data, you are using EXTERNAL table, Hive "wraps" elasticsearch HTTP API as a data warehouse, it's not a data-base. You could create a normal Hive table, and import elasticsearch into it, next queries should be faster. – Thomas Decaux Oct 24 '16 at 14:45