I receive the CSV file below (it has no header row):
D,neel,32,1,pin1,state1,male
D,sani,31,2,pin1,state1,pin2,state2,female
D,raja,33,3,pin1,state1,pin2,state2,pin3,state3,male
I want to produce the CSV file below using a PySpark DataFrame:
D,neel,32,1,pin1,state1,male
D,sani,31,2,pin1,state1,female
D,sani,31,2,pin2,state2,female
D,raja,33,3,pin1,state1,male
D,raja,33,3,pin2,state2,male
D,raja,33,3,pin3,state3,male
Note: the number in the 4th column of the input file determines how many pin/state pairs the record contains. Each pair should become its own output row, with the other columns repeated. For example:
neel has 1 in the 4th column, so neel has 1 pin/state pair (pin1,state1)
sani has 2 in the 4th column, so sani has 2 pin/state pairs (pin1,state1,pin2,state2)
raja has 3 in the 4th column, so raja has 3 pin/state pairs (pin1,state1,pin2,state2,pin3,state3)
I have not been able to produce the desired output.
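
For reference, the per-record rule from the note can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive solution: expand_line and expand_file_with_spark are hypothetical names, the wrapper assumes an existing SparkSession is passed in as spark, and it goes through the RDD flatMap API rather than pure DataFrame column functions. It also assumes every line is well formed (the 4th column is an integer and the matching number of pin/state fields is present).

```python
def expand_line(line):
    """Split one input record into one output line per (pin, state) pair."""
    fields = line.strip().split(",")
    head = fields[:4]                 # record type, name, age, pair count
    count = int(fields[3])            # 4th column: number of (pin, state) pairs
    pairs = fields[4:4 + 2 * count]   # pin1,state1,pin2,state2,...
    tail = fields[4 + 2 * count:]     # whatever follows the pairs (the gender here)
    return [
        ",".join(head + pairs[2 * i:2 * i + 2] + tail)
        for i in range(count)
    ]


def expand_file_with_spark(spark, in_path, out_path):
    """Hypothetical wrapper: apply expand_line to every line of a text file.

    `spark` is assumed to be an existing SparkSession; no validation is done.
    """
    # spark.read.text yields a DataFrame with a single string column "value".
    lines = spark.read.text(in_path).rdd.flatMap(
        lambda row: expand_line(row.value)
    )
    # Wrap each string in a 1-tuple so it becomes a one-column DataFrame again.
    result = spark.createDataFrame(lines.map(lambda s: (s,)), ["value"])
    # DataFrameWriter.text writes a directory of part files, not a single file.
    result.write.text(out_path)
```

With a SparkSession available, this could be called as expand_file_with_spark(spark, "input.csv", "output_dir"), where both paths are placeholders. Keeping the rule in a plain function means it can be checked on single lines without starting Spark.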