For performance reasons, use a UDF only when there is no built-in function for your use case.
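The code below assumes a sampleDF with two map columns. Its construction isn't shown here, so the following is a minimal sketch that reproduces the row in the output further down (the app name and local setup are assumptions):
import org.apache.spark.sql.SparkSession

// Minimal local setup; the original answer does not show how sampleDF is built.
val spark = SparkSession.builder().appName("map-concat-demo").master("local[*]").getOrCreate()
import spark.implicits._

val sampleDF = Seq(
  ("Jeff", Map("key1" -> "val1"), Map("key2" -> "val2"))
).toDF("name", "mapCol1", "mapCol2")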
Spark version 2.4 and above
import org.apache.spark.sql.functions.{map_concat, col}
sampleDF.withColumn("map_concat", map_concat(col("mapCol1"), col("mapCol2"))).show(false)
Output:
+----+-----------------+-----------------+-------------------------------+
|name|mapCol1 |mapCol2 |map_concat |
+----+-----------------+-----------------+-------------------------------+
|Jeff|Map(key1 -> val1)|Map(key2 -> val2)|Map(key1 -> val1, key2 -> val2)|
+----+-----------------+-----------------+-------------------------------+
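Note that the built-in map_concat is null-intolerant: if either input map is null, the result is null. On 2.4+ you could work around this by substituting an empty map with coalesce. The following is my own sketch, not part of the original answer:
import org.apache.spark.sql.functions.{coalesce, typedLit}

// Replace null maps with a typed empty map before concatenating.
val emptyMap = typedLit(Map.empty[String, String])
sampleDF.withColumn(
  "map_concat",
  map_concat(coalesce(col("mapCol1"), emptyMap), coalesce(col("mapCol2"), emptyMap))
).show(false)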
Spark versions below 2.4
Create a UDF as per @RameshMaharjan's answer in this question, but with a null check added to avoid an NPE at runtime, which would otherwise eventually fail the job.
import org.apache.spark.sql.functions.{udf, col}

// Null-safe map concatenation: if either map is null, return the other;
// otherwise merge the two (on duplicate keys, map2's value wins).
val map_concat = udf((map1: Map[String, String],
                      map2: Map[String, String]) =>
  if (map1 == null) {
    map2
  } else if (map2 == null) {
    map1
  } else {
    map1 ++ map2
  })
sampleDF.withColumn("map_concat", map_concat(col("mapCol1"), col("mapCol2")))
.show(false)
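To sanity-check the null handling, you can run the UDF on a row where one map column is null (this extra row is hypothetical, purely for illustration):
// Hypothetical row with a null map to exercise the null check.
val withNull = Seq(
  ("Ann", Map("key1" -> "val1"), null.asInstanceOf[Map[String, String]])
).toDF("name", "mapCol1", "mapCol2")

withNull.withColumn("map_concat", map_concat(col("mapCol1"), col("mapCol2"))).show(false)
// The UDF returns mapCol1 unchanged here, instead of throwing an NPE
// or returning null as the built-in map_concat would.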