I would like to rename the keys of the first-level objects inside my payload.
import json
from pyspark.sql.functions import *
ds = {'Fruits': {'apple': {'color': 'red', 'shape': 'round'}, 'mango': {'color': 'green'}}, 'Vegetables': None}
df = spark.read.json(sc.parallelize([json.dumps(ds)]))
df.printSchema()
"""
root
|-- Fruits: struct (nullable = true)
| |-- apple: struct (nullable = true)
| | |-- color: string (nullable = true)
| | |-- shape: string (nullable = true)
| |-- mango: struct (nullable = true)
| | |-- color: string (nullable = true)
|-- Vegetables: string (nullable = true)
"""
Desired output:
root
|-- Fruits: struct (nullable = true)
| |-- APPLE: struct (nullable = true)
| | |-- color: string (nullable = true)
| | |-- shape: string (nullable = true)
| |-- MANGO: struct (nullable = true)
| | |-- color: string (nullable = true)
|-- Vegetables: string (nullable = true)
In this case I would like to rename the first-level keys to uppercase.
If Fruits were a map type, I could use transform_keys:
df.select(transform_keys("Fruits", lambda k, _: upper(k)).alias("data_upper")).display()
Unfortunately, I have a struct type.
AnalysisException: cannot resolve 'transform_keys(Fruits, lambdafunction(upper(x_18), x_18, y_19))' due to data type mismatch: argument 1 requires map type, however, 'Fruits' is of struct<apple:struct<color:string,shape:string>,mango:struct<color:string>> type.;
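For reference, the same call works once the column really is a map (a quick sanity check against an inline map literal I made up for illustration):
map_df = spark.createDataFrame([({'apple': 'red', 'mango': 'green'},)], ['Fruits'])
# Fruits is inferred as map<string,string>, so transform_keys applies directly
map_df.select(transform_keys('Fruits', lambda k, _: upper(k)).alias('data_upper')).display()
# the keys come back as APPLE and MANGO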
I'm using Databricks runtime 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).
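The closest I've gotten is rebuilding the struct from the schema on the driver. This is only a sketch, and it renames just this one level:
# Workaround sketch: read the struct's field names from df.schema,
# re-select each nested column under an uppercased alias, and reassemble.
fields = df.schema['Fruits'].dataType.fields
df.withColumn('Fruits', struct(*[col(f'Fruits.{f.name}').alias(f.name.upper()) for f in fields])).printSchema()
This produces the desired schema, but I'd prefer a built-in equivalent of transform_keys for structs, if one exists.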