
I have a nested array that looks like this:

a = [[1,2],[2,3]]

I have a streaming DataFrame which looks like this:

+----------+-----+
|system    |level|
+----------+-----+
|Test1     |1    |
|Test2     |3    |
+----------+-----+

I want to include the array in a third column, Data, as a nested array:

+----------+-----+-------------+
|system    |level|Data         |
+----------+-----+-------------+
|Test1     |1    |[[1,2],[2,3]]|
+----------+-----+-------------+

I tried withColumn and the array function, but I am not sure how to build a nested array.

Any help would be appreciated.

Senthil

3 Answers


You can add a new column, but you'll have to use a crossJoin (shown below on a sample date/hour/value DataFrame):

a = [[1, 2], [2, 3]]

# Wrap the array in a single-row DataFrame, rename its column, and cross join
df.crossJoin(spark.createDataFrame([a], "array<array<bigint>>").toDF("data")).show()

+-------------------+----+------+----------------+
|               date|hour| value|            data|
+-------------------+----+------+----------------+
|1984-01-01 00:00:00|   1|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   2|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   3|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   4|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   5|638.55|[[1, 2], [2, 3]]|
+-------------------+----+------+----------------+
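Applied to the question's own system/level frame, a minimal sketch of the same approach (assuming a DataFrame df with those columns; the names here are illustrative stand-ins):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-in for the streaming DataFrame from the question
df = spark.createDataFrame([("Test1", 1), ("Test2", 3)], ["system", "level"])

a = [[1, 2], [2, 3]]
# One-row DataFrame holding the literal; rename its generated "value" column to "Data"
lit_df = spark.createDataFrame([a], "array<array<bigint>>").toDF("Data")

df.crossJoin(lit_df).show(truncate=False)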
pissall

In the Scala API, we can use the typedLit function to add Array or Map values as a column.

// Ref : https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$

Here is sample code to add an array as a column value.

import org.apache.spark.sql.functions.typedLit
import spark.implicits._ // needed for toDF

// Use Seq(Seq(...)) so the column type is array<array<int>>, a true nested array
val a = Seq(Seq(1, 2), Seq(2, 3))
val df1 = Seq(("Test1", 1), ("Test3", 3)).toDF("a", "b")

df1.withColumn("new_col", typedLit(a)).show()

// Output

+-----+---+----------------+
|    a|  b|         new_col|
+-----+---+----------------+
|Test1|  1|[[1, 2], [2, 3]]|
|Test3|  3|[[1, 2], [2, 3]]|
+-----+---+----------------+
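
Note: the difference between typedLit and lit is that typedLit can also handle parameterized Scala types such as List, Seq, and Map, which lit cannot.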

I hope this helps.

Neeraj Bhadani

If you want to add the same array to all rows, you can use typedLit from the SQL functions. See this answer:
https://stackoverflow.com/a/32788650/12365294

  • I did try this, but I am unable to do "import org.apache.spark.sql.functions" in Python. I included the jar org.apache.spark:spark-sql_2.11:2.4.4 in my execution, but still no luck. – Senthil Nov 18 '19 at 01:15
  • For PySpark you need "from pyspark.sql.functions import *" – Mahesh Gupta Nov 18 '19 at 05:38
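
Since typedLit is Scala-only and has no direct PySpark equivalent, here is a minimal sketch of the same literal-column idea in PySpark, building the nested array out of lit() columns (the comprehension is just one way to write it):

from pyspark.sql import SparkSession
from pyspark.sql.functions import array, lit

spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame([("Test1", 1), ("Test3", 3)], ["a", "b"])

a = [[1, 2], [2, 3]]
# Nest array() calls: build each inner array, then wrap them in an outer array
nested = array(*[array(*[lit(x) for x in inner]) for inner in a])

df1.withColumn("new_col", nested).show(truncate=False)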