Indexing data from HDFS to Elasticsearch via Hive

Question

I'm using Elasticsearch for Hadoop plugin in order to read and index documents in Elasticsearch via Hive.

I followed the documentation in this page: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html

In order to index documents in Elasticsearch with Hadoop, you need to create a table in Hive that is configured properly. And I encountered a problem with inserting data into that hive table.

That’s the table's script for writing I used to create:

CREATE EXTERNAL TABLE es_names_w
(
 firstname string,
 lastname string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'hive_test/names', 'es.index.auto.create' = 'true')

Then I tried to insert data:

INSERT OVERWRITE TABLE es_names_w
SELECT firstname,lastname
FROM tmp_names_source;

The error I get from hive is: "Job submission failed with exception 'org.apache.hadoom.ipc.RemoteExaption(java.lang.RuntimeExeption: org.xml.sax.SAXParseException; systemId: file:////hdfs_data/mapred/jt/jobTracker/job_201506091622_0064.xml; lineNunber: 607; columnNumber:51; Character reference "&#..."

However, this error occurs ONLY when the hive table that I create has more than one column.

For example, this code works:

CREATE EXTERNAL TABLE es_names_w
(
 firstname string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'hive_test/names', 'es.index.auto.create' = 'true')

INSERT OVERWRITE TABLE es_names_w
    SELECT firstname
    FROM tmp_names_source;

Everything went well, Hive has created a new type in elasticsearch index and the data has been indexed in Elasticsearch

I really don’t know why my first attempt doesn't work

I would appreciate some help, Thanks

score 0 · Answer 1 · answered Mar 30 '16 at 13:16

0

Can you add this to the tbl es.mapping.id’=’key’. The key can be your firstname.

answered Mar 30 '16 at 13:16

Debraj

11
3

score 0 · Answer 2 · edited Jul 15 '17 at 01:08

0

Try

es.index.auto.create' = 'false'

edited Jul 15 '17 at 01:08

Baum mit Augen

49,044
25
144
182

answered Jul 13 '17 at 06:12

Ashwini Kumar

85
10

score 0 · Answer 3 · answered Jul 16 '17 at 17:38

Try with SerDe it will work out. For eg.

 CREATE EXTERNAL TABLE elasticsearch_es (  
 name STRING, id INT, country STRING )  
 ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'  
 STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'  
 TBLPROPERTIES ('es.resource'='elasticsearch/demo');

Also, make sure when you create index and type in ES you create the exact same mapping as Hive column's in ES.

Indexing data from HDFS to Elasticsearch via Hive

3 Answers3