
I have a nested array that looks like this:

a = [[1,2],[2,3]]

I have a streaming DataFrame which looks like this:

+----------+-----+
|system    |level|
+----------+-----+
|Test1     |1    |
|Test2     |3    |
+----------+-----+

I want to include the array in a third column, Data, as a nested array:

+----------+-----+-------------+
|system    |level|Data         |
+----------+-----+-------------+
|Test1     |1    |[[1,2],[2,3]]|
+----------+-----+-------------+

I tried withColumn and the array function, but I am not sure how to build a nested array.

Any help would be appreciated.

Senthil

3 Answers


You can add a new column, but you'll have to use a crossJoin (shown below on a sample date/hour/value DataFrame):

a = [[1, 2], [2, 3]]

# Wrap the array in a single-row DataFrame, rename its column, and cross join
df.crossJoin(spark.createDataFrame([a], "array<array<bigint>>").toDF("data")).show()

+-------------------+----+------+----------------+
|               date|hour| value|            data|
+-------------------+----+------+----------------+
|1984-01-01 00:00:00|   1|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   2|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   3|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   4|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   5|638.55|[[1, 2], [2, 3]]|
+-------------------+----+------+----------------+
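Applied to the question's own system/level frame, a minimal sketch of the same approach (assuming a DataFrame df with those columns; the names here are illustrative stand-ins):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-in for the streaming DataFrame from the question
df = spark.createDataFrame([("Test1", 1), ("Test2", 3)], ["system", "level"])

a = [[1, 2], [2, 3]]
# One-row DataFrame holding the literal; rename its generated "value" column to "Data"
lit_df = spark.createDataFrame([a], "array<array<bigint>>").toDF("Data")

df.crossJoin(lit_df).show(truncate=False)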
pissall

In the Scala API, we can use the typedLit function to add Array or Map values as a column.

// Ref : https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$

Here is sample code to add an array as a column value.

import org.apache.spark.sql.functions.typedLit
import spark.implicits._ // needed for toDF

// Use Seq(Seq(...)) so the column type is array<array<int>>, a true nested array
val a = Seq(Seq(1, 2), Seq(2, 3))
val df1 = Seq(("Test1", 1), ("Test3", 3)).toDF("a", "b")

df1.withColumn("new_col", typedLit(a)).show()

// Output

+-----+---+----------------+
|    a|  b|         new_col|
+-----+---+----------------+
|Test1|  1|[[1, 2], [2, 3]]|
|Test3|  3|[[1, 2], [2, 3]]|
+-----+---+----------------+
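
Note: the difference between typedLit and lit is that typedLit can also handle parameterized Scala types such as List, Seq, and Map, which lit cannot.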

I hope this helps.

Neeraj Bhadani

If you want to add the same array to all rows, you can use typedLit from the SQL functions. See this answer:
https://stackoverflow.com/a/32788650/12365294

  • I did try this, but I am unable to do "import org.apache.spark.sql.functions" in Python. I included the jar org.apache.spark:spark-sql_2.11:2.4.4 in my execution, but still no luck. – Senthil Nov 18 '19 at 01:15
  • For PySpark you need "from pyspark.sql.functions import *" – Mahesh Gupta Nov 18 '19 at 05:38
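
Since typedLit is Scala-only and has no direct PySpark equivalent, here is a minimal sketch of the same literal-column idea in PySpark, building the nested array out of lit() columns (the comprehension is just one way to write it):

from pyspark.sql import SparkSession
from pyspark.sql.functions import array, lit

spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame([("Test1", 1), ("Test3", 3)], ["a", "b"])

a = [[1, 2], [2, 3]]
# Nest array() calls: build each inner array, then wrap them in an outer array
nested = array(*[array(*[lit(x) for x in inner]) for inner in a])

df1.withColumn("new_col", nested).show(truncate=False)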