
I need a few clarifications regarding inserting data into an external table.

I have created an external Parquet table, partitioned by week, pointing to a Hadoop location. After this I moved the data (a .csv file) to that location.

My doubt is: since the table is partitioned by week, even if I just move the file to that directory, Hive would not read it and I would have to use an insert command, compared to, say, a non-partitioned Hive table, which reads directly from that Hadoop path.

av abhishiek

1 Answer


You need to consider what data is within the CSV. For example, if you partitioned time-based data by year, you wouldn't copy a CSV containing several years' values into a single partition. You would need to split the dataset.

even if I just move the file to that directory, Hive would not read it and I have to use an insert command

Correct, especially since it's a Parquet SerDe trying to read a CSV.

To clarify, Hive would read the CSV if placed in a table that was stored as text.

You need a separate table through which you can read the text files, then insert into the other, converting file formats along the way.
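A minimal sketch of that two-table approach, assuming hypothetical table, column, and path names (a staging text table over the CSV, then a dynamic-partition insert into the Parquet table):

```sql
-- Staging table: plain text, non-partitioned, reads the CSV as-is
CREATE EXTERNAL TABLE staging_events_csv (
  id   INT,
  name STRING,
  week STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/staging/events_csv';

-- Target table: Parquet, partitioned by week
CREATE EXTERNAL TABLE events_parquet (
  id   INT,
  name STRING
)
PARTITIONED BY (week STRING)
STORED AS PARQUET
LOCATION '/data/warehouse/events_parquet';

-- Convert text to Parquet and route rows into their partitions in one statement
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE events_parquet PARTITION (week)
SELECT id, name, week
FROM staging_events_csv;
```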

OneCricketeer
  • So this is not something related to partitioning? Am I correct in understanding that we always need to use an insert command to see data in Hive after placing the file in the Hadoop location? – av abhishiek Feb 01 '18 at 13:16
  • Partitioning is not the only consideration. File type is another, as I said – OneCricketeer Feb 01 '18 at 14:05
  • Can you please clarify in which cases a Hive external table will directly reflect the data once I place the file in HDFS at the location the external table points to? – av abhishiek Feb 01 '18 at 14:32
  • The external table is immediately queryable when data is copied to its location *in the correct format of the table* – OneCricketeer Feb 01 '18 at 14:33
  • Thanks cricket_007, I had one more question: suppose my data resides in a single .csv file in HDFS and has a column week. As you pointed out, I can't directly create a Hive partitioned table over it, so is there any way I can create an external partitioned table pointing to the above location in one step? – av abhishiek Feb 01 '18 at 16:50
  • You need to create the partitioned table, externally or managed; it doesn't matter. You then `INSERT OVERWRITE TABLE` from the non-partitioned data. The only other option I know of is to create one CSV per partition (filter with some external tool), create the HDFS paths, then copy each file accordingly (see the sketch below). Particularly because the DDL does not support `CREATE TABLE x PARTITION(y) LIKE t` https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable – OneCricketeer Feb 01 '18 at 19:05
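A sketch of that manual, one-file-per-partition alternative (table name, columns, and paths are hypothetical; it assumes the partitioned table is stored as text so Hive can read the CSVs directly):

```sql
-- Partitioned text table; each week gets its own sub-directory under the location
CREATE EXTERNAL TABLE weekly_data (
  id   INT,
  name STRING
)
PARTITIONED BY (week STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/weekly_data';

-- After copying each pre-split CSV into its directory, e.g.
--   hdfs dfs -mkdir -p /data/weekly_data/week=2018-01
--   hdfs dfs -put week_2018-01.csv /data/weekly_data/week=2018-01/
-- register the directory as a partition:
ALTER TABLE weekly_data ADD IF NOT EXISTS PARTITION (week='2018-01');

-- Or discover all week=... directories at once:
MSCK REPAIR TABLE weekly_data;
```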