
I need a few clarifications regarding inserting data into an external table.

I have created an external Parquet table, partitioned by week, pointing to a Hadoop location. After this I moved the data (a .csv file) to that location.

My doubt is: since the table is partitioned by week, even if I just move the file to that directory, Hive would not read it and I would have to use an insert command, compared to, say, a non-partitioned Hive table, which reads directly from that Hadoop path.

av abhishiek

1 Answer


You need to consider what data is within the CSV. For example, if you partitioned time-based data by year, you wouldn't copy a CSV containing several years' values into a single partition. You would need to split the dataset.

even if I just move the file to that directory, Hive would not read it and I have to use an insert command

Correct, especially since it's a Parquet SerDe trying to read a CSV.

To clarify, Hive would read the CSV if placed in a table that was stored as text.

You need a separate table through which you can read the text files, then insert into the other, converting file formats along the way.
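A minimal sketch of that two-table approach, assuming hypothetical table, column, and path names (a staging text table over the CSV, then a dynamic-partition insert into the Parquet table):

```sql
-- Staging table: plain text, non-partitioned, reads the CSV as-is
CREATE EXTERNAL TABLE staging_events_csv (
  id   INT,
  name STRING,
  week STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/staging/events_csv';

-- Target table: Parquet, partitioned by week
CREATE EXTERNAL TABLE events_parquet (
  id   INT,
  name STRING
)
PARTITIONED BY (week STRING)
STORED AS PARQUET
LOCATION '/data/warehouse/events_parquet';

-- Convert text to Parquet and route rows into their partitions in one statement
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE events_parquet PARTITION (week)
SELECT id, name, week
FROM staging_events_csv;
```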

OneCricketeer
  • So this is not something related to partitioning? Am I correct in understanding that we always need to use an insert command to see data in Hive after placing the file in the Hadoop location? – av abhishiek Feb 01 '18 at 13:16
  • Partitioning is not the only consideration. File type is another, as I said – OneCricketeer Feb 01 '18 at 14:05
  • Can you please clarify in which cases a Hive external table will directly reflect the data once I place the file in HDFS at the location the external table points to? – av abhishiek Feb 01 '18 at 14:32
  • The external table is immediately queryable when data is copied to its location *in the correct format of the table* – OneCricketeer Feb 01 '18 at 14:33
  • Thanks cricket_007, I had one more question: suppose my data resides in a single .csv file in HDFS and has a column week. As you pointed out, I can't directly create a Hive partitioned table over it, so is there any way I can create an external partitioned table pointing to the above location in one step? – av abhishiek Feb 01 '18 at 16:50
  • You need to create the partitioned table, externally or managed; it doesn't matter. You then `INSERT OVERWRITE TABLE` from the non-partitioned data. The only other option I know of is to create one CSV per partition (filter with some external tool), create the HDFS paths, then copy each file accordingly (see the sketch below). Particularly because the DDL does not support `CREATE TABLE x PARTITION(y) LIKE t` https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable – OneCricketeer Feb 01 '18 at 19:05
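A sketch of that manual, one-file-per-partition alternative (table name, columns, and paths are hypothetical; it assumes the partitioned table is stored as text so Hive can read the CSVs directly):

```sql
-- Partitioned text table; each week gets its own sub-directory under the location
CREATE EXTERNAL TABLE weekly_data (
  id   INT,
  name STRING
)
PARTITIONED BY (week STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/weekly_data';

-- After copying each pre-split CSV into its directory, e.g.
--   hdfs dfs -mkdir -p /data/weekly_data/week=2018-01
--   hdfs dfs -put week_2018-01.csv /data/weekly_data/week=2018-01/
-- register the directory as a partition:
ALTER TABLE weekly_data ADD IF NOT EXISTS PARTITION (week='2018-01');

-- Or discover all week=... directories at once:
MSCK REPAIR TABLE weekly_data;
```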