0

My input data is as follows:

1,srinivas,courtthomas,memphis
2,vindhya,courtthomas,memphis
3,srinivas,courtthomas,kolkata
4,vindhya,courtthomas,memphis

And I have created the following queries:

create EXTERNAL table seesaw (id int,name string,location string) partitioned by (address string) row format delimited fields terminated by ',' lines terminated by '\n' stored as textfile LOCATION '/seesaw';

LOAD DATA INPATH '/sampledoc' OVERWRITE INTO TABLE seesaw PARTITION (address = 'Memphis');

when I try to fetch my query it comes as follows:

Select * from seesaw;                                                                     
OK
1   srinivas    courtthomas Memphis
2   vindhya courtthomas Memphis
3   srinivas    courtthomas Memphis
4   vindhya courtthomas Memphis

I really don't understand how all the rows have been showing memphis at the end.

Marcus Müller
  • 34,677
  • 4
  • 53
  • 94
sri
  • 21
  • 1

2 Answers2

0

Read your code closely:

create EXTERNAL table seesaw (id int,name string,location string)

Notice that there are only three columns, id, name and location.

Your data, however,

1,srinivas,courtthomas,memphis
2,vindhya,courtthomas,memphis
3,srinivas,courtthomas,kolkata
4,vindhya,courtthomas,memphis

has four columns. Something's fishy here.

LOAD DATA INPATH '/sampledoc'
  OVERWRITE INTO TABLE seesaw
  PARTITION (address = 'Memphis');

you're asking to partition a category that only contains courtthomas by Memphis. The result is to little surprise not what you want.

Marcus Müller
  • 34,677
  • 4
  • 53
  • 94
  • Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines can you please tell me what does that mean ? I found it in the hive manual. especially what does ^A means ? – sri Jun 26 '16 at 22:54
0

If you are using external table, you will need to manually create folders for each partition, i.e in your case - create two folders [address=Memphis] and [address=kolkata] AND copy the corresponding input data files under the corresponding folder and then add the partitions to metadata as follows:

ALTER TABLE seesaw ADD PARTITION(address='Memphis');
ALTER TABLE seesaw ADD PARTITION(address='kolkata');

Refer this article for a simple example of how to do this - hive-external-table-with-partitions

janeshs
  • 793
  • 2
  • 12
  • 26