I have a scenario and would like to get an expert opinion on it.
I have to load a Hive table in partitions from a relational DB via Spark (Python). I cannot create the Hive table in advance because I do not know how many columns the source has, and they might change in the future, so I have to fetch the data using select * from tablename.
However, I am sure of the partition column and know that it will not change. This column is of "date" datatype in the source DB.
I am using saveAsTable with the partitionBy option, and the folders are created properly per the partition column. The Hive table is also created.
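A minimal sketch of the kind of load described above, for reference. The function name, its parameters, and the "overwrite" save mode are assumptions made up for illustration; connection credentials and driver options are left out.

```python
def load_partitioned(spark, jdbc_url, source_table, target_table, partition_col):
    """Copy a source table into a partitioned Hive table without naming its columns.

    `spark` is an existing SparkSession with Hive support enabled.
    All other arguments are hypothetical placeholders.
    """
    # Equivalent of "select * from tablename": the JDBC reader pulls every
    # column of the source table, so the column list is never spelled out.
    df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", source_table)
        .load()
    )

    # partitionBy writes one folder per distinct value of the partition column;
    # saveAsTable registers the result as a table in the metastore.
    # The "overwrite" mode is an assumption, not something stated above.
    (
        df.write.mode("overwrite")
        .partitionBy(partition_col)
        .saveAsTable(target_table)
    )
    return df
```

With this shape, the partition column keeps whatever type the JDBC reader infers for it, which is the "date" type in my case.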
The issue I am facing is that the partition column is of "date" data type, which is not supported in Hive for partition columns. As a result, I am unable to read the data via Hive or Impala queries, as they report that date is not supported as a partition column.
Please note that I cannot typecast the column at the time of issuing the select statement as I have to do a select * from tablename, and not select a,b,cast(c) as varchar from table.