
I created an Elasticsearch index using Hive. I have one temp table into which I load all the raw data. From that table I select the rows that match certain criteria and insert them into a table that is integrated with the Elasticsearch index.

After creating the index, I compared the counts in three places: the Hive main table (using the same criteria), the table integrated with ES, and the Elasticsearch index itself. The counts do not match.

- ES index: 4663296
- Table integrated with ES: 4663296 (same as ES)
- Hive main table (same criteria): 4611296, which is less than ES
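For reference, the first two counts could be gathered with queries along these lines. This is only a sketch: `main_table`, `es_events`, and the `status = 'ACTIVE'` filter are hypothetical stand-ins for the actual tables and criteria. Selecting from an ES-backed table reads the documents back from the index, which is why that count matches the index count. The gap here is 4663296 - 4611296 = 52000 rows.

```sql
-- Hypothetical names: main_table is the Hive source table,
-- es_events is the table created with the Elasticsearch storage handler.

-- 1. Rows in the source table that match the load criteria
SELECT COUNT(*) FROM main_table WHERE status = 'ACTIVE';

-- 2. Documents seen through the ES-backed table (read from the index)
SELECT COUNT(*) FROM es_events;
```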

Could someone please tell me why the count is higher in ES? It should be the same, shouldn't it?

Thanks, Rackto

Answer


It turned out that there were some duplicate records in ES.
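One way to confirm the duplicates is to group on a key that should be unique per source row, querying through the ES-backed Hive table. In this sketch the table `es_events` and the key `record_id` are hypothetical names. When ES assigns its own IDs, duplicate documents get different `_id` values, so only the business key reveals them.

```sql
-- Hypothetical: es_events is the ES-backed Hive table and record_id the
-- field that should be unique for every source row.

-- Keys that appear in more than one document
SELECT record_id, COUNT(*) AS copies
FROM es_events
GROUP BY record_id
HAVING COUNT(*) > 1;

-- Total number of surplus documents
SELECT COUNT(*) - COUNT(DISTINCT record_id) AS surplus
FROM es_events;
```

If `record_id` is unique in the source table and duplicates fully explain the discrepancy, `surplus` would come out at 4663296 - 4611296 = 52000.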

To fix this, I now set the document ID explicitly, using a key in the data that is always unique. With that change the counts match.

You just need to add one table property when creating the Hive table: TBLPROPERTIES('......., 'es.mapping.id' = 'field_name_of_the_unique_id');
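A fuller table definition might look like the sketch below. It assumes the elasticsearch-hadoop Hive storage handler (`org.elasticsearch.hadoop.hive.EsStorageHandler`). The table, column, index and host names and the filter are hypothetical placeholders, not the poster's actual values. With `es.mapping.id` set, each row's `record_id` becomes the document `_id`. Because the default write operation (`index`) replaces a document that already has that ID, a row written twice (for example by a retried or speculatively executed task, a commonly cited cause of such duplicates) overwrites its earlier copy instead of adding a second document.

```sql
-- Hypothetical ES-backed table; all names below are placeholders.
CREATE EXTERNAL TABLE es_events (
    record_id STRING,   -- key that is always unique in the source data
    status    STRING,
    payload   STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
    'es.resource'   = 'events/event',
    'es.nodes'      = 'localhost:9200',
    -- use record_id as the Elasticsearch document _id
    'es.mapping.id' = 'record_id');

-- Load the filtered rows from the source table.
INSERT OVERWRITE TABLE es_events
SELECT record_id, status, payload
FROM main_table
WHERE status = 'ACTIVE';
```

Whether `es.resource` takes `index/type` or just `index` depends on the Elasticsearch version in use.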

Thanks
