
I'm using the Elasticsearch for Hadoop plugin to read and index documents in Elasticsearch via Hive.

I followed the documentation on this page: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html

To index documents in Elasticsearch through Hadoop, you need to create a properly configured table in Hive. I ran into a problem when inserting data into that Hive table.

This is the script I used to create the table for writing:

CREATE EXTERNAL TABLE es_names_w
(
 firstname string,
 lastname string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'hive_test/names', 'es.index.auto.create' = 'true');

Then I tried to insert data:

INSERT OVERWRITE TABLE es_names_w
SELECT firstname,lastname
FROM tmp_names_source;

The error I get from Hive is: "Job submission failed with exception 'org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:////hdfs_data/mapred/jt/jobTracker/job_201506091622_0064.xml; lineNumber: 607; columnNumber: 51; Character reference "&#..."

However, this error occurs ONLY when the Hive table that I create has more than one column.

For example, this code works:

CREATE EXTERNAL TABLE es_names_w
(
 firstname string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'hive_test/names', 'es.index.auto.create' = 'true');

INSERT OVERWRITE TABLE es_names_w
    SELECT firstname
    FROM tmp_names_source;

Everything went well: Hive created a new type in the Elasticsearch index, and the data was indexed.

I really don't know why my first attempt doesn't work.

I would appreciate some help. Thanks.

orlevii

3 Answers


Can you add 'es.mapping.id' = 'key' to the TBLPROPERTIES? The key can be your firstname column.
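
Applied to the table from the question, this suggestion might look like the sketch below. It assumes firstname is used as the Elasticsearch document ID, which the question does not confirm as suitable: rows sharing the same firstname would overwrite each other in the index. Whether this resolves the SAXParseException is not verified here.

```sql
-- Sketch only: the question's table with es.mapping.id added.
-- Assumption: firstname serves as the Elasticsearch document ID.
CREATE EXTERNAL TABLE es_names_w
(
 firstname string,
 lastname string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
 'es.resource' = 'hive_test/names',
 'es.index.auto.create' = 'true',
 'es.mapping.id' = 'firstname'
);
```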

Debraj

Try

'es.index.auto.create' = 'false'
Baum mit Augen

Try specifying the SerDe explicitly; it should work. For example:

 CREATE EXTERNAL TABLE elasticsearch_es (  
 name STRING, id INT, country STRING )  
 ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'  
 STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'  
 TBLPROPERTIES ('es.resource'='elasticsearch/demo');

Also, when you create the index and type in ES, make sure the mapping matches the Hive columns exactly.