
I have a PySpark DataFrame (data). I need to split the DataFrame by multiple columns and save each partition as CSV to a particular folder. The folder names should be based on the partition columns.

PATH = '/../' + data['Col1'] + data['Col2'] + data['Col3'] + '/'
data.write.partitionBy(['Col1', 'Col2']).csv(PATH)

I have code like this, but I know it has a lot of errors. First, I want to split by multiple columns; then I want the folders to be created with the same names as the column names. Can anyone tell me how to rectify the code?

Akshata
