I am performing transformations on a Delta table using PySpark and the DeltaTable library. I use the following code to round the ma column to 5 decimal places:
from delta import *
delta_tb = DeltaTable.forPath(spark, 'path/to/table')
delta_tb.update(set={'ma': 'round(ma, 5)'})
The update acts directly on the table stored at path/to/table, which makes it quite slow. How can I prevent the update from touching the stored table and have it affect only my in-memory variable?
I know that I can achieve this through the PySpark DataFrame API, e.g.:
from pyspark.sql.functions import expr
delta_tb.toDF().withColumn('ma', expr('round(ma, 5)'))
but is there a DeltaTable-native way for me to do this?
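For reference, here is a minimal sketch of the DataFrame workaround wrapped in a helper, to make clear what behaviour I am after. The names `round_expr` and `rounded_view` are hypothetical and mine, not part of Delta Lake or PySpark. It assumes `delta_tb` is the DeltaTable object from above. As far as I understand, `toDF()` only reads the table and `withColumn()` is a lazy projection, so nothing is written back to path/to/table.

```python
def round_expr(column: str, scale: int) -> str:
    """Build the Spark SQL expression from the question, e.g. 'round(ma, 5)'."""
    return f"round({column}, {scale})"


def rounded_view(delta_tb, column: str, scale: int):
    """Return a new DataFrame with `column` rounded to `scale` decimal places.

    Hypothetical helper: it never calls DeltaTable.update(), so the
    stored table is left as it is on disk.
    """
    # Imported lazily so this module still loads where PySpark is not installed.
    from pyspark.sql.functions import expr

    # toDF() exposes the table as a DataFrame; withColumn() replaces the
    # column in the resulting DataFrame only, not in the underlying files.
    return delta_tb.toDF().withColumn(column, expr(round_expr(column, scale)))
```

Calling `rounded_view(delta_tb, 'ma', 5)` would give me the rounded DataFrame, which is the result I would like to get through the DeltaTable API itself if that is possible.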