I have PySpark code that applies complex transformations. In this code, we use one particular Hive external table multiple times; to be precise, we read the same subset of its data, filtered on the partition column, multiple times.
If I save this subset into a managed table or a Databricks Delta table and read from that table in the code instead, will performance improve?
Also, since I will be reading all of the data in the new table, do I need to partition it?
I have tried both a non-partitioned managed table and a non-partitioned Delta table and saw a 20% performance improvement, but I am not sure what the impact would be with a partitioned table.
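
To make the setup concrete, here is a rough sketch of the pattern described above. It is not the actual job: the function names are made up for illustration, the table and column names passed in are placeholders, and the `fmt="delta"` default assumes a Databricks or Delta-enabled environment.

```python
def read_subset(spark, source_table, partition_col, partition_value):
    """Read only the needed partition of the external Hive table."""
    # Filtering on the partition column lets Spark prune partitions at read time.
    return spark.table(source_table).where(f"{partition_col} = '{partition_value}'")


def materialize_subset(spark, source_table, partition_col, partition_value,
                       target_table, fmt="delta", partition_by=None):
    """Write the subset once to a managed table and return a DataFrame over it."""
    subset = read_subset(spark, source_table, partition_col, partition_value)
    writer = subset.write.format(fmt).mode("overwrite")
    if partition_by:
        # Optional: partition the new table, to compare against the flat version.
        writer = writer.partitionBy(*partition_by)
    writer.saveAsTable(target_table)
    # Later transformations read from the new table instead of the external one.
    return spark.table(target_table)
```

With an active `SparkSession` named `spark`, calling `materialize_subset(spark, "db.events", "load_date", "2024-01-01", "db.events_subset")` would correspond to the non-partitioned variant I already tried (all names here are hypothetical). Passing something like `partition_by=["region"]` would give the partitioned variant I am asking about.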