I have a scenario and would like to get an expert opinion on it.

I have to load a partitioned Hive table from a relational DB via Spark (Python). I cannot create the Hive table up front because I do not know how many columns the source has, and they might change in the future, so I have to fetch the data with: select * from tablename.

However, I am sure of the partition column and know that it will not change. This column is of "date" data type in the source DB.

I am using saveAsTable with the partitionBy option, and the folders are created correctly per the partition column. The Hive table is also created.
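For reference, the load described above can be sketched roughly as follows. This is a minimal sketch, not the exact job: the function name `load_partitioned`, the JDBC URL, the table names, and the partition column name are all hypothetical, and an existing SparkSession with Hive support is assumed to be passed in.

```python
def load_partitioned(spark, jdbc_url, source_table, target_table,
                     partition_col, props):
    """Read a whole source table over JDBC and save it as a partitioned table.

    All argument values are placeholders chosen by the caller; `spark` is
    assumed to be an existing SparkSession with Hive support enabled.
    """
    # Passing the bare table name is equivalent to `select * from tablename`:
    # every column present in the source at load time is fetched.
    df = spark.read.jdbc(url=jdbc_url, table=source_table, properties=props)

    # partitionBy creates one folder per distinct value of the partition
    # column; saveAsTable also registers the table in the metastore.
    df.write.mode("overwrite").partitionBy(partition_col).saveAsTable(target_table)
    return df
```

With a "date" column as `partition_col`, this produces the folder layout and the table as described, with the partition column keeping its date type.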

The issue I am facing is that the partition column is of "date" data type, which is not supported in Hive for partitions. As a result, I am unable to read the data via Hive or Impala queries, which report that date is not supported as a partition column.

Please note that I cannot typecast the column when issuing the select statement, because I have to run select * from tablename rather than select a, b, cast(c as varchar) from table.
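One possible way around this constraint, offered as an untested sketch and not something from the original post: the cast may not need to appear in the SQL text at all. Once the select * result is a DataFrame, the single known column can be cast with withColumn, which leaves every other column as it is regardless of how many there are. The names `cast_partition_to_string` and `load_date` are hypothetical.

```python
def cast_partition_to_string(df, partition_col):
    """Return a DataFrame whose partition column is cast to string.

    `df` is assumed to be a Spark DataFrame from a `select *` style read;
    only the one known column is named, so the remaining columns can vary.
    """
    # withColumn with an existing column name replaces that column in place.
    return df.withColumn(partition_col, df[partition_col].cast("string"))


# Hypothetical usage between the JDBC read and the partitioned write:
# df = cast_partition_to_string(df, "load_date")
```

Whether Hive and Impala then read the resulting string partition cleanly in a given version is something that would still need to be verified.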

Saim
  • Try changing the data type of the date column to string in the Hive table schema; hopefully that works. See https://stackoverflow.com/questions/20193613/table-partitioned-by-timestamp-field for more details. – sangam.gavini Sep 30 '19 at 17:37
  • I am afraid I can't do that, as I have to automate the process and can't update columns after my data load. – Saim Oct 01 '19 at 18:42

0 Answers