I am reading a 60 GB CSV file with PySpark, applying a few basic transformations, and loading the result into a Hive dynamic-partitioned table. The HDFS block size is 128 MB, so Spark creates 400+ input partitions. The transformations complete in a few minutes, but the load into the partitioned table takes nearly an hour. The Hive execution engine is Tez. As a test, I loaded the same data into an unpartitioned table and it took less than 4 minutes. How can I improve the performance in this scenario?
I'm using the Hive Warehouse Connector (HWC) to write from Spark to Hive.
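
For reference, here is roughly what the job looks like. This is a minimal sketch, not my exact code: the input path, table name, partition column, and transformation are placeholders, and the exact HWC write options may vary by HWC version.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark_llap import HiveWarehouseSession  # Hive Warehouse Connector bindings

spark = SparkSession.builder.appName("csv_to_hive_partitioned").getOrCreate()
hive = HiveWarehouseSession.session(spark).build()

# Read the ~60 GB CSV; with a 128 MB HDFS block size this yields 400+ partitions.
df = spark.read.csv("/data/input/big_file.csv", header=True, inferSchema=True)

# A few basic transformations (placeholder logic).
df = df.withColumn("load_date", F.to_date(F.col("event_ts")))

# Write into the Hive table through HWC with dynamic partitioning.
# "db.target_table" and the "load_date" partition column are placeholders.
df.write \
    .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector") \
    .option("table", "db.target_table") \
    .option("partition", "load_date") \
    .mode("append") \
    .save()
```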