I am trying to de-serialize a Spark data frame into another data frame, as shown under "Expected Dataframe" below.

Existing Dataframe Data:

enter image description here

Existing Dataframe schema:

enter image description here

Expected Dataframe:

enter image description here

Can anyone help me with this?

Pradeep Kaja
  • Does this answer your question? [Explode array data into rows in spark](https://stackoverflow.com/questions/44436856/explode-array-data-into-rows-in-spark) – RudyVerboven Feb 20 '20 at 12:09

1 Answer

You can use the explode function for that.

from pyspark.sql.functions import explode

# Bracket access is needed because "ns2:fileName" is not a valid Python attribute name.
df = df.withColumn("ns2:fileName", explode(df["ns2:fileName"]))

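EDIT

To explode both columns together, zip them first and explode the zipped result. In Scala, with zip defined as a UDF that pairs the two arrays element-wise (as in the linked duplicate):

df.withColumn("result", explode(zip($"ns2:fileName", $"ns2:alias"))).select(
   $"result._1".alias("ns2:fileName"), $"result._2".alias("ns2:alias"))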
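
Since the NameError quoted in the comments is a Python error, the question appears to be about PySpark. A minimal PySpark sketch of the same zip-then-explode idea follows. It swaps in the built-in arrays_zip function (available from Spark 2.4) for a zip UDF. The helper name explode_together is hypothetical, and the sketch assumes both columns are arrays of equal length in every row.

```python
def explode_together(df, first, second):
    """Explode two array columns in lockstep: one output row per array position.

    Hypothetical helper, not part of the original answer.
    """
    # Imported inside the function so the helper can be defined
    # even where PySpark is not installed.
    from pyspark.sql.functions import arrays_zip, col, explode

    # Backticks keep unusual column names such as "ns2:fileName" unambiguous.
    zipped = arrays_zip(col(f"`{first}`"), col(f"`{second}`"))

    return (
        df.select(explode(zipped).alias("result"))  # one struct per position
        .select("result.*")                         # flatten the struct fields
        .toDF(first, second)                        # restore the original names
    )
```

Usage would look like `explode_together(df, "ns2:fileName", "ns2:alias")`. Like the snippet above, it keeps only the two exploded columns. Any other columns that are needed would have to be added to the first select.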

Possible duplicate.

RudyVerboven
  • I tried it, but it throws the error below: NameError: name 'explode' is not defined – Pradeep Kaja Feb 20 '20 at 10:11
  • I edited my answer. But please check the duplicate question for more information. – RudyVerboven Feb 20 '20 at 10:13
  • Thanks for the suggestion. When I explode only one column, it works and the data for that column looks correct, but the second column is not exploded. I then tried the code below: ``` project_processed_df1 = project_raw_df.withColumn("ProjectId", explode(project_raw_df.ProjectId)).withColumn("ProjectDesc", explode(project_raw_df.ProjectDesc)) ``` This explodes both columns, but it produces a many-to-many relation and therefore wrong data. Any idea? – Pradeep Kaja Feb 20 '20 at 13:17
  • You can zip both array columns and explode the result of the zip. I edited my answer – RudyVerboven Feb 20 '20 at 14:18
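
The many-to-many result reported in the comments follows from how chained explodes compose: the second explode runs on every row the first one produced. A plain-Python sketch of the difference, with no Spark required (the sample values are made up for illustration):

```python
from itertools import product

# Hypothetical arrays held by a single input row.
project_ids = [1, 2, 3]
project_descs = ["a", "b", "c"]

# Two chained explodes behave like a cross product:
# every id is paired with every description.
chained = list(product(project_ids, project_descs))

# Zipping first pairs elements by position, which is what
# exploding the zipped column yields.
zipped = list(zip(project_ids, project_descs))

print(len(chained))  # 9
print(zipped)        # [(1, 'a'), (2, 'b'), (3, 'c')]
```

With three elements per array, chaining gives nine rows, whereas zipping gives the three positionally matched rows the question expects.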