I was researching partitions in Hive and came across this article:
http://www.brentozar.com/archive/2013/03/introduction-to-hive-partitioning/

In it, the author says: “When inserting data into a partition, it’s necessary to include the partition columns as the last columns in the query. The column names in the source query don’t need to match the partition column names, but they really do need to be last – there’s no way to wire up Hive differently.”
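To make the quoted rule concrete, here is a minimal HiveQL sketch of a dynamic-partition insert written the way the article describes. The table layout, the source table `grid_source`, and the columns `id` and `amount` are hypothetical and exist only for illustration. The `SET` lines assume dynamic partitioning has to be switched on explicitly: in strict mode Hive requires at least one static partition column, so a fully dynamic insert like this one needs nonstrict mode.

```sql
-- Hypothetical destination table: two regular columns plus one partition column.
-- (Column names are assumptions for illustration.)
CREATE TABLE IF NOT EXISTS MyDestTable (
  id     INT,
  amount DOUBLE
)
PARTITIONED BY (partition_date STRING);

-- Enable dynamic partitioning; nonstrict mode allows every partition
-- column to be dynamic (strict mode needs at least one static value).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hive matches SELECT columns to target columns by position, not by name:
-- first the regular columns in table order, then the dynamic partition
-- column(s) last, in the same order as in PARTITION (...).
INSERT OVERWRITE TABLE MyDestTable PARTITION (partition_date)
SELECT
  grid.id,
  grid.amount,
  grid.partition_date   -- dynamic partition column goes last
FROM grid_source grid;  -- hypothetical source table
```

Only the positions in the SELECT list matter for this mapping. The names play no role, which is why the article says the source column names don't need to match the partition column names.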
I have a query like:
insert overwrite table MyDestTable PARTITION (partition_date)
select
grid.partition_date,
….
This query has been running for a while without errors. As you can see, I am selecting the partition column as the very first column. Is that wrong? I have tried to corroborate the author’s statement from other sources, but I haven’t found any other documentation that says the same thing. Does anybody here know what the right approach is? As a Hive newbie, I am just going by whether Hive complains, and it doesn’t.
KS