I created an index and indexed data into it using elasticsearch-hadoop-2.2. The HQL looks like this:
CREATE EXTERNAL TABLE es_external_table (
field1 type1,
field2 type2
) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES (
'es.batch.size.bytes'='1mb',
'es.batch.size.entries'='0',
'es.batch.write.refresh'='false',
'es.batch.write.retry.count'='3',
'es.mapping.id'='field1',
'es.write.operation'='index',
'es.nodes'='IP:9200',
'es.nodes.discovery'='false',
'es.resource'='my_index/my_type');
insert into table es_external_table select field1, field2... from hive_table1
Table es_external_table is the external table and hive_table1 is the source table. hive_table1 contains 1332561 rows, but there are only 1332559 docs in Elasticsearch. Both the _count API and the _search?search_type=count API return 1332559, so two docs appear to be missing.
I used curl -XGET 'http://IP:9200/my_index/my_type/my_id?_source=false' to check the data in Elasticsearch, where my_id is taken from hive_table1 and is also the _id in my_index/my_type. All 1332561 requests return "found":true.
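A possible check on the Hive side, offered as a sketch rather than a confirmed diagnosis: because 'es.mapping.id'='field1' makes field1 the document _id, and the index operation replaces an existing document that has the same _id, two rows sharing a field1 value would end up as a single doc, while a GET by id would still report "found":true for both. The queries below reuse the placeholder names from the table definition above; dup_rows, total_rows and distinct_ids are illustrative aliases only.

```sql
-- List field1 values that occur more than once in the source table.
SELECT field1, COUNT(*) AS dup_rows
FROM hive_table1
GROUP BY field1
HAVING COUNT(*) > 1;

-- Compare total rows with distinct ids.
-- A difference of 2 would match the gap between 1332561 and 1332559.
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT field1) AS distinct_ids
FROM hive_table1;
```

If the first query returns no rows, duplicate ids are not the cause and the gap has to come from somewhere else.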
I am confused.
- How many docs are actually in my_index/my_type?
- Why do the _count and search_type=count APIs return 1332559 if there are 1332561?
- Which docs are missing if there are 1332559 docs?
Any suggestions?