I have the following file on HDFS (screenshot not included).

I create the structure of the external table in Hive:

CREATE EXTERNAL TABLE google_analytics(
  `session` INT)
PARTITIONED BY (date_string string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/flumania/google_analytics';

ALTER TABLE google_analytics ADD PARTITION (date_string = '2016-09-06') LOCATION '/flumania/google_analytics';

After that, the table structure is created in Hive, but I cannot see any data (screenshot not included).

Since it's an external table, data insertion should be done automatically, right?

Brian Tompsett - 汤莱恩
rom

2 Answers

Your file's columns should be in this order:

int,string

Here, your file's contents are in this order:

string, int

Change your file to the following:

86,"2016-08-20"
78,"2016-08-21"

It should work.
Also, it is not recommended to use keywords such as date as column names.

dileepvarma
  • I have reversed the order of the columns and renamed the date field. Still no data available. Any other idea? – rom Sep 06 '16 at 08:12

I think the problem was with the ALTER TABLE command. The code below solved my problem:

CREATE EXTERNAL TABLE google_analytics(
  `session` INT)
PARTITIONED BY (date_string string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/flumania/google_analytics/';

ALTER TABLE google_analytics ADD PARTITION (date_string = '2016-09-06');

After these two steps, if the table location contains a date_string=2016-09-06 subfolder with a CSV file that matches the table structure, the data is picked up automatically and you can query it with SELECT right away.

Solved!
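
To check that the partition is registered and points at the expected folder, a few standard Hive statements can help. This is only a minimal sketch, assuming the table and partition defined above and a CSV file under /flumania/google_analytics/date_string=2016-09-06/; the LIMIT value is arbitrary.

```sql
-- List the partitions registered in the metastore for this table.
SHOW PARTITIONS google_analytics;

-- Show the partition's metadata, including the HDFS location it points to.
DESCRIBE FORMATTED google_analytics PARTITION (date_string = '2016-09-06');

-- Read a few rows from that partition.
SELECT * FROM google_analytics WHERE date_string = '2016-09-06' LIMIT 10;

-- Optional: register any date_string=... subfolders that already exist
-- under the table location but are not yet known to the metastore.
MSCK REPAIR TABLE google_analytics;
```

If SHOW PARTITIONS lists the partition but the SELECT still returns nothing, the location reported by DESCRIBE FORMATTED is probably the first thing to compare against the actual folder on HDFS.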

rom