I have the same situation as stated in this question.
df = spark.createDataFrame(
[(1, "xx", [10, 20], ["a", "b"], ["p", "q"]),
(2, "yy", [30, 40], ["c", "d"], ["r", "s"]),
(3, "zz", None, ["f", "g"], ["e", "k"])],
["c1", "c2", "a1", "a2", "a3"])
df.show()
# +---+---+--------+------+------+
# | c1| c2| a1| a2| a3|
# +---+---+--------+------+------+
# | 1| xx|[10, 20]|[a, b]|[p, q]|
# | 2| yy|[30, 40]|[c, d]|[r, s]|
# | 3| zz| null|[f, g]|[e, k]|
# +---+---+--------+------+------+
I can't figure out a way to explode these array columns together in PySpark. How can I achieve this result?
+---+---+----+---+---+
| c1| c2| a1| a2| a3|
+---+---+----+---+---+
| 1| xx| 10| a| p|
| 1| xx| 20| b| q|
| 2| yy| 30| c| r|
| 2| yy| 40| d| s|
| 3| zz|null| f| e|
| 3| zz|null| g| k|
+---+---+----+---+---+