0

I have har files created as below for each date.

/ARC_DATA/FIFTEEN_MINS/ds=20220825.har
/ARC_DATA/FIFTEEN_MINS/ds=20220825.har/_SUCCESS
/ARC_DATA/FIFTEEN_MINS/ds=20220825.har/_index
/ARC_DATA/FIFTEEN_MINS/ds=20220825.har/_masterindex
/ARC_DATA/FIFTEEN_MINS/ds=20220825.har/part-0
/ARC_DATA/FIFTEEN_MINS/ds=20220825.har/part-1

The har files are created by archiving N number of avro files.

I would like to expose the har file data through Hive EXTERNAL table.

I have created EXTERNAL table using:

DROP TABLE pmdb.Arc15Mins;
CREATE EXTERNAL TABLE IF NOT EXISTS pmdb.Arc15Mins PARTITIONED BY (ds STRING) STORED AS AVRO  LOCATION 'hdfs://pm-hadoop:8020/ARC_DATA/FIFTEEN_MINS/' TBLPROPERTIES('avro.schema.url'='hdfs://pm-hadoop:8020/SCHEMAS/15minsavro.avsc',  'discover.partitions' = 'true' , 'hive.enforce.bucketing' = 'true');

I have executed below commands, to load the data into partitions

ALTER TABLE pmdb.Arc15Mins ADD PARTITION (ds="20220825.har") LOCATION 'har://hdfs-pm-hadoop:8020/ARC_DATA/FIFTEEN_MINS/ds=20220825.har';

LOAD DATA INPATH 'har://hdfs-pm-hadoop:8020/ARC_DATA/FIFTEEN_MINS/ds=20220825.har' INTO TABLE pmdb.Arc15Mins PARTITION(ds="20220825.har");

MSCK REPAIR TABLE pmdb.Arc15Mins;

Still I am not able to get data from EXTERNAL table. Below select query returns 0 rows.

SELECT * from pmdb.Arc15Mins;

My questions are:

  1. Is it possible to expose har file data through HIVE EXTERNAL table?
  2. If possible, what should be the LOCATION value in CREATE TABLE command?
  3. ALTER TABLE and/or LOAD DATA INPATH supports har file format ?
OneCricketeer
  • 179,855
  • 19
  • 132
  • 245

0 Answers0