I need to copy data from a CSV file to a managed partitioned table in Hive.
CSV file rows are:
id,nome,cognome,ruolo
16,Mike,Maignan,Portiere
23,Fikayo,Tomori,Centrale
24,Simon,Kjaer,Centrale
19,Theo,Hernandez,Terzino
...
-------
I created a managed partitioned table on ruolo
column.
create table squadre_part
(id int, nome string, cognome string)
partitioned by (ruolo string)
row format delimited fields terminated by ','
stored as textfile
TBLPROPERTIES ("skip.header.line.count"="1") ;
-------
then I created an external table to load data from CSV file (then I will select data from external table and copy them into managed partitioned table)
create external table external_squadre
(id int, nome string, cognome string, ruolo string)
row format delimited fields terminated by ','
stored as textfile
location '/ulisse/prove/external/'
TBLPROPERTIES ("skip.header.line.count"="1") ;
-------
First of all I set these 2 properties:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
--------
when I put the CSV file in /ulisse/prove/external/ HDFS dir and execute select from external table, I see ALL THE ROWS.
-------
After the "copy" from external table to managed table:
insert into squadre_part partition (ruolo) select * from external_squadre;
I see into managed table ONLY FEW ROWS (???).
The strange thing is that in HDSF, under /user/hive/warehouse/<mydb>/<managed table>/...
I see all sub directories (and text files inside sub dir), with ALL THE ROWS of the original CSV file.
-------
The command:
msck repair table squadre_part
(after the insert into command...) didn't solve the problem.
Thank you in advance for any reply.
Moreno