
I am performing transformations on a Delta table using PySpark and the DeltaTable API from the delta-spark library. I use the following code to round the ma column to 5 decimal places:

from delta import DeltaTable

delta_tb = DeltaTable.forPath(spark, 'path/to/table')
delta_tb.update(set = {'ma': 'round(ma, 5)'})

The update acts directly on the table files at path/to/table, which makes it quite slow. How can I prevent the update from writing to those files and have it affect only my variable?
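(As a sanity check, each update() shows up as a new commit on disk; this is just a quick sketch against the delta_tb handle from the snippet above:)

# Every update() call appears as a new version in the table's transaction log.
delta_tb.history().select('version', 'operation', 'timestamp').show()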

I know that I can achieve this through the PySpark DataFrame API, e.g.:

from pyspark.sql.functions import expr
delta_tb.toDF().withColumn('ma', expr('round(ma, 5)'))

but is there a DeltaTable-native way for me to do this?
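
For reference, a fully self-contained version of that workaround looks roughly like this (the SparkSession setup is only a sketch of a typical delta-spark configuration and may differ from my actual environment):

from delta import DeltaTable, configure_spark_with_delta_pip
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

# Typical delta-spark session setup (sketch).
builder = (SparkSession.builder
           .config('spark.sql.extensions', 'io.delta.sql.DeltaSparkSessionExtension')
           .config('spark.sql.catalog.spark_catalog', 'org.apache.spark.sql.delta.catalog.DeltaCatalog'))
spark = configure_spark_with_delta_pip(builder).getOrCreate()

delta_tb = DeltaTable.forPath(spark, 'path/to/table')

# The transformation is lazy and applies only to the in-memory DataFrame;
# nothing is written back to path/to/table.
rounded_df = delta_tb.toDF().withColumn('ma', expr('round(ma, 5)'))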

razumichin
    It depends on whether you want the data permanently written somewhere or not. If you only want it to affect a variable and not the file, then the DataFrame approach is fine. – Nick.Mc Jun 14 '23 at 09:58
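
(For completeness: if the rounded values do eventually need to be persisted, a minimal sketch, reusing the rounded_df DataFrame from above, would be to write it back explicitly:)

# Overwrite the Delta table on disk only when persistence is actually wanted.
rounded_df.write.format('delta').mode('overwrite').save('path/to/table')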

0 Answers