
I have the following Spark Delta table structure:

+---+------------------------------------------------------------------------+
|id |addresses                                                               |
+---+------------------------------------------------------------------------+
|1  |[{"Address":"ABC", "Street": "XXX"}, {"Address":"XYZ", "Street": "YYY"}]|
+---+------------------------------------------------------------------------+

Here the addresses column is an array of structs.

I need to update the Address of every struct inside the array with the value of that struct's Street attribute, without changing any other attribute or element.

So, "ABC" should be updated to "XXX" and "XYZ" should be updated to "YYY"

You can assume the struct has many more attributes (street, zipcode, etc.), so I want to leave those untouched and only update the value of Address from the Street attribute.

How can I do this in Spark, Databricks, or SQL?

Schema:

 |-- id: string (nullable = true)
 |-- addresses: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Address: string (nullable = true)
 |    |    |-- Street: string (nullable = true)
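In case it helps to reproduce, this is roughly how a sample DataFrame with the same shape can be built in spark-shell (the Addr case class and df name are just for illustration):

// spark-shell imports spark.implicits._ automatically; in an application,
// import them from your SparkSession first
case class Addr(Address: String, Street: String)

val df = Seq(
  ("1", Seq(Addr("ABC", "XXX"), Addr("XYZ", "YYY")))
).toDF("id", "addresses")

df.printSchema   // should match the schema above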

Cheers!

mani_nz

1 Answer


Please check the code below.

scala> vdf.show(false)
+---+--------------+
|id |addresses     |
+---+--------------+
|1  |[[ABC], [XYZ]]|
+---+--------------+


scala> vdf.printSchema
root
 |-- id: integer (nullable = false)
 |-- addresses: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Address: string (nullable = true)

scala> val new_address = array(struct(lit("AAA").as("Address")))

scala> val except_first = array_except($"addresses",array($"addresses"(0)))

scala> val addresses = array_union(new_address,except_first).as("addresses")

scala> vdf.select($"id",addresses).select($"id",$"addresses",to_json($"addresses").as("json_addresses")).show(false)

+---+--------------+-------------------------------------+
|id |addresses     |json_addresses                       |
+---+--------------+-------------------------------------+
|1  |[[AAA], [XYZ]]|[{"Address":"AAA"},{"Address":"XYZ"}]|
+---+--------------+-------------------------------------+

Updated

scala> vdf.withColumn("addresses",explode($"addresses")).groupBy($"id").agg(collect_list(struct($"addresses.Street".as("Address"),$"addresses.Street")).as("addresses")).withColumn("json_data",to_json($"addresses")).show(false)
+---+------------------------+-------------------------------------------------------------------+
|id |addresses               |json_data                                                          |
+---+------------------------+-------------------------------------------------------------------+
|1  |[[XXX, XXX], [YYY, YYY]]|[{"Address":"XXX","Street":"XXX"},{"Address":"YYY","Street":"YYY"}]|
+---+------------------------+-------------------------------------------------------------------+
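If the struct has many more attributes that must stay untouched (as clarified in the comments), on Spark 3.1+ you can avoid rebuilding the whole struct by combining the transform higher-order function with Column.withField. A minimal sketch, assuming a df with the question's schema (addresses: array<struct<Address, Street, ...>>):

import org.apache.spark.sql.functions._

// For every element of the array, overwrite only the Address field with that
// element's own Street value; all other struct fields are left as they are.
val updated = df.withColumn(
  "addresses",
  transform($"addresses", a => a.withField("Address", a.getField("Street")))
)

updated.select($"id", to_json($"addresses").as("json_addresses")).show(false)

On older versions (Spark 2.4+), the same idea can be expressed with expr("transform(addresses, a -> named_struct(...))"), but there every struct field has to be listed explicitly inside named_struct.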
Srinivas
  • Thanks, but I actually meant a different problem. My actual struct has quite a lot of attributes, and I cannot set all of those since each can have a different value. I'll update the question, as I can see why it was confusing. Apologies! – mani_nz Jun 23 '20 at 05:36
  • You don't need to set all the columns, only the first one needs to change – Srinivas Jun 23 '20 at 05:38
  • I have updated the question properly now. The value that I want to update is not static; rather, it needs to come from another struct attribute – mani_nz Jun 23 '20 at 05:43