0

I have created a hive table with the query -

create table studpart4(id int, name string) partitioned by (course string, year int) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

Created successfully.

Loaded data with below command -

load data local inpath '/scratch/hive_inputs/student_input_1.txt' overwrite into table studpart4 partition(course='cse',year=2);

And my input data file looks like -

 101    student1    cse 1

 102    student2    cse 2

 103    student3    eee 3

 104    student4    eee 4

 105    student5    cse 1

 106    student6    cse 2

 107    student7    eee 3

 108    student8    eee 4

 109    student9    cse 1

 110    student10   cse 2

But output is displayed as (of select * from studpart4) --

 101    student1    cse 2

 102    student2    cse 2

 103    student3    eee 2

 104    student4    eee 2

 105    student5    cse 2

 106    student6    cse 2

 107    student7    eee 2

 108    student8    eee 2

 109    student9    cse 2

 110    student10   cse 2

Why does the last column is all 2. Why did it get changed and updating wrongly.

Nishu Tayal
  • 20,106
  • 8
  • 49
  • 101
Suresh J
  • 31
  • 3

1 Answers1

0

The result you are displayed is exactly what you told Hive to do with your data.

In your first command, you are creating a partitioned table studpart4 with two columns, id and name, and two partitioned keys, course and year (which once created, behave like regular columns). Now, in your second command, what you are doing is this:

load data local inpath '/scratch/hive_inputs/student_input_1.txt' overwrite into table studpart4 partition(course='cse',year=2)

Which basically means "copy all the data from student_input_1.txt into the table studpart4 and set all the values of column course to 'cse' and all the values of column year to '2'". Internally, Hive will create a directory structure which will contain your partition keys. Your data will be stored in a directory like this one:

.../studpart4/course=cse/year=2/

I suspect that what you really want is Hive to detect the column values of courseand year in your .txt file and set the right values for you. In order to perform that, you have to use dynamic partitioning of your tables and follow an strategy of loading your data into an external table, and then use an INSERT OVERWRITE INTO TABLE command to store the data into your studpart4 table. The link that BigDataLearner posted in a comment describes this strategy.

I hope this helps.

Jaime Caffarel
  • 2,401
  • 4
  • 30
  • 42