
I created an index and indexed data into it using elasticsearch-hadoop-2.2. The HQL looks like this:

CREATE EXTERNAL TABLE es_external_table (
  field1 type1,
  field2 type2
) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES (
  'es.batch.size.bytes'='1mb',
  'es.batch.size.entries'='0',
  'es.batch.write.refresh'='false',
  'es.batch.write.retry.count'='3',
  'es.mapping.id'='field1',
  'es.write.operation'='index',
  'es.nodes'='IP:9200',
  'es.nodes.discovery'='false',
  'es.resource'='my_index/my_type');

insert into table es_external_table select field1, field2... from hive_table1

Here es_external_table is the external table and hive_table1 is the source table. hive_table1 contains 1332561 rows, but Elasticsearch reports only 1332559 docs: both the _count API and the _search?search_type=count API return 1332559. Two docs appear to be missing.

I used curl -XGET 'http://IP:9200/my_index/my_type/my_id?_source=false' to check the data in Elasticsearch, where my_id is taken from hive_table1 and is also the _id in my_index/my_type. All 1332561 requests return "found":true.
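
For reference, the per-ID check described above can be sketched roughly as follows in Python. This is only an illustration, not the exact script I ran: the helper names `fetch_found` and `summarize` are hypothetical, and `IP` is the same placeholder host as in the table definition. Besides the number of requests, the sketch also reports how many distinct IDs were found, since 'es.mapping.id'='field1' means that rows sharing a field1 value address the same _id.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Placeholder endpoint taken from the post; replace IP with a real host.
BASE_URL = "http://IP:9200/my_index/my_type"


def fetch_found(doc_id, base_url=BASE_URL):
    """GET one document by _id and return its "found" flag (hypothetical helper)."""
    url = "{}/{}?_source=false".format(base_url, quote(str(doc_id), safe=""))
    try:
        with urlopen(url) as resp:
            return bool(json.load(resp).get("found"))
    except HTTPError as err:
        if err.code == 404:  # Elasticsearch answers 404 when the _id does not exist
            return False
        raise


def summarize(ids, fetch=fetch_found):
    """Check every ID and count requests, distinct IDs found, and misses."""
    checked = 0
    found_ids = set()
    missing = []
    for doc_id in ids:
        checked += 1
        if fetch(doc_id):
            found_ids.add(doc_id)  # a repeated ID is only counted once here
        else:
            missing.append(doc_id)
    return {
        "checked": checked,
        "distinct_found": len(found_ids),
        "missing": missing,
    }
```

Calling `summarize(ids)` with the field1 values exported from hive_table1 would show whether the number of requests ("checked") and the number of distinct IDs found ("distinct_found") are the same.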

I am confused.

  1. How many docs are actually in my_index/my_type?
  2. Why do the _count and search_type=count APIs return 1332559 if there are 1332561 docs?
  3. If there are only 1332559 docs, which ones are missing?

Any suggestions?

Longxing Wei