
I have a PySpark DataFrame (data). I need to split the DataFrame by multiple columns and save each partition as CSV to a particular folder. The folder names should be based on the partition columns.

PATH = '/../' + data['Col1'] + data['Col2'] + data['Col3'] + '/'
data.write.partitionBy(['Col1', 'Col2']).csv(PATH)

I have code like this, but I know it has a lot of errors. First, I want to split by multiple columns; then I want the folders to be created with the same names as the column names. Can anyone tell me how to rectify the code?

Akshata
