I have har files created as below for each date.
/ARC_DATA/FIFTEEN_MINS/ds=20220825.har
/ARC_DATA/FIFTEEN_MINS/ds=20220825.har/_SUCCESS
/ARC_DATA/FIFTEEN_MINS/ds=20220825.har/_index
/ARC_DATA/FIFTEEN_MINS/ds=20220825.har/_masterindex
/ARC_DATA/FIFTEEN_MINS/ds=20220825.har/part-0
/ARC_DATA/FIFTEEN_MINS/ds=20220825.har/part-1
The har files are created by archiving N number of avro files.
I would like to expose the har file data through Hive EXTERNAL table.
I have created EXTERNAL table using:
DROP TABLE pmdb.Arc15Mins;
CREATE EXTERNAL TABLE IF NOT EXISTS pmdb.Arc15Mins PARTITIONED BY (ds STRING) STORED AS AVRO LOCATION 'hdfs://pm-hadoop:8020/ARC_DATA/FIFTEEN_MINS/' TBLPROPERTIES('avro.schema.url'='hdfs://pm-hadoop:8020/SCHEMAS/15minsavro.avsc', 'discover.partitions' = 'true' , 'hive.enforce.bucketing' = 'true');
I have executed below commands, to load the data into partitions
ALTER TABLE pmdb.Arc15Mins ADD PARTITION (ds="20220825.har") LOCATION 'har://hdfs-pm-hadoop:8020/ARC_DATA/FIFTEEN_MINS/ds=20220825.har';
LOAD DATA INPATH 'har://hdfs-pm-hadoop:8020/ARC_DATA/FIFTEEN_MINS/ds=20220825.har' INTO TABLE pmdb.Arc15Mins PARTITION(ds="20220825.har");
MSCK REPAIR TABLE pmdb.Arc15Mins;
Still I am not able to get data from EXTERNAL table. Below select query returns 0 rows.
SELECT * from pmdb.Arc15Mins;
My questions are:
- Is it possible to expose har file data through HIVE EXTERNAL table?
- If possible, what should be the LOCATION value in CREATE TABLE command?
- ALTER TABLE and/or LOAD DATA INPATH supports har file format ?