How to de-serialize the spark data frame into another data frame

Question

I am trying to de-serialize the Spark data frame into another data frame, as shown below.

Existing Dataframe Data:

Existing Dataframe schema:

Expected Dataframe:

Can anyone help me with this?

Does this answer your question? [Explode array data into rows in spark](https://stackoverflow.com/questions/44436856/explode-array-data-into-rows-in-spark) — RudyVerboven, Feb 20 '20 at 12:09

RudyVerboven · Accepted Answer · 2020-02-20T14:20:26.067


You can use the explode function for that.

from pyspark.sql.functions import explode
# The column name contains a colon, so use bracket access instead of attribute access
df.withColumn("ns2:fileName", explode(df["ns2:fileName"]))

EDIT

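# arrays_zip (Spark 2.4+) pairs the two arrays element by element before exploding
from pyspark.sql.functions import arrays_zip, explode
df.select(explode(arrays_zip("ns2:fileName", "ns2:alias")).alias("result")) \
  .select("result.*").toDF("ns2:fileName", "ns2:alias")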
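
To see why zipping before exploding matters, here is a minimal plain-Python sketch (no Spark needed) that only models the row semantics. The sample values are hypothetical and not taken from the question's data; `explode_each` and `zip_then_explode` are illustrative helper names, not Spark functions.

```python
from itertools import product, zip_longest

# Hypothetical sample row: two parallel arrays, named like the question's columns.
row = {"ns2:fileName": ["a.txt", "b.txt"], "ns2:alias": ["A", "B"]}

def explode_each(row):
    """Model of exploding each column in turn: every element of the first
    array is paired with every element of the second (a cross product)."""
    return list(product(row["ns2:fileName"], row["ns2:alias"]))

def zip_then_explode(row):
    """Model of zipping first, then exploding once: elements are paired by
    index. zip_longest pads the shorter array with None, which mirrors how
    Spark's arrays_zip pads with null."""
    return list(zip_longest(row["ns2:fileName"], row["ns2:alias"]))

chained = explode_each(row)
zipped = zip_then_explode(row)

print(chained)  # 4 rows: the many-to-many result
print(zipped)   # 2 rows: one per array index
```

Exploding the columns one after the other multiplies the row counts, while a single explode over the zipped arrays keeps each file name next to its own alias.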

Possible duplicate.

edited Feb 20 '20 at 14:20

answered Feb 20 '20 at 08:55

RudyVerboven

I tried, but it throws the error below: NameError: name 'explode' is not defined – Pradeep Kaja Feb 20 '20 at 10:11
I edited my question. But please check the duplicate question for more information. – RudyVerboven Feb 20 '20 at 10:13
Thanks for the suggestion. When I explode only one column, it works and the data for that column is correct, but the second column is not exploded. I then tried the code below: ``` project_processed_df1 = project_raw_df.withColumn("ProjectId", explode(project_raw_df.ProjectId)).withColumn("ProjectDesc", explode(project_raw_df.ProjectDesc)) ``` This explodes both columns, but it produces a many-to-many relation between them, so the data is wrong. Any idea? – Pradeep Kaja Feb 20 '20 at 13:17
You can zip both array columns and explode the result of the zip. I edited my answer – RudyVerboven Feb 20 '20 at 14:18
