0

I am using Hive v1.2.1 with Tez. I have an external partitioned table. The partitions are hourly and of the form p=yyyy_mm_dd_hh. The situation is that these partition directories in hdfs are likely to be deleted sometime. After they are deleted, hive still contains the metadata for that partition, and a command 'show partitions ' would still list the partition whose directory was deleted from hdfs. Normally, this is not likely to cause any problem, and a select query for the partition(whose directory was deleted) would simply result an empty resultset:

hive> select * from test_tab where p='2015_01_01_01';
OK
Time taken: 2.168 seconds

However, on running any aggregate query against the same partition, I get an error:

hive> select count(*) from test_tab where p='2015_01_01_01';
FAILED: SemanticException java.io.FileNotFoundException: File hdfs://localhost:8020/user/root/data/test_db/test_tab/p=2015_01_01_01 does not exist.

I need to have the same behavior in aggregate queries as that in other select queries. This is probably a bug in hive. Any workaround-hints for this issue would be appreciated. Best Regards.

Ankit Khettry
  • 997
  • 1
  • 13
  • 33
  • I have encountred this behaviour when working with HCat and I think its not a bug, because in normal behaviour, user have to drop partitions before deleting their folders. – 54l3d May 26 '16 at 10:49
  • _Hive doesn’t care if a partition directory doesn’t exist for a partition or if it has no files. In both cases, you’ll just get no results for a query that filters for the partition_ - Quoted from Programming Hive: [link](https://www.safaribooksonline.com/library/view/programming-hive/9781449326944/ch04.html) . If Hive is not supposed to care if the partition directory exists, why the error in the agg query? This makes me think it's a bug – Ankit Khettry May 26 '16 at 11:37

1 Answers1

0

run below command

msck repair table test_tab;

and then run your query

Rishav Rohit
  • 114
  • 5
  • 1
    This won't remove the partitions pointing to the manually deleted files on HDFS. It just adds manually created partitions folder to Hive metastore. – cheseaux Jul 11 '17 at 12:43