I have a PySpark DataFrame:
| id | column_1 | column_2 | column_3 |
|----|----------|----------|----------|
| 1  | ["12"]   | null     | ["67"]   |
| 2  | null     | ["78"]   | ["90"]   |
| 3  | [""]     | ["93"]   | ["56"]   |
| 4  | ["100"]  | ["78"]   | ["90"]   |
I need to convert all null values in column_1 to an empty array `[]`, leaving the other columns unchanged:
| id | column_1 | column_2 | column_3 |
|----|----------|----------|----------|
| 1  | ["12"]   | null     | ["67"]   |
| 2  | []       | ["78"]   | ["90"]   |
| 3  | [""]     | ["93"]   | ["56"]   |
| 4  | ["100"]  | ["78"]   | ["90"]   |
I tried the following code, but it doesn't work:
```python
df.withColumn("column_1", coalesce(column_1, array().cast("array<string>")))
```
Appreciate your help!