I have the following schema:
root
|-- event_params: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- key: string (nullable = true)
| | |-- value: struct (nullable = true)
| | | |-- string_value: string (nullable = true)
| | | |-- int_value: long (nullable = true)
| | | |-- float_value: double (nullable = true)
My event_params is an array of structs. Sample data:
{
  "event_params": [
    {
      "element": {
        "key": "firebase_screen_class",
        "value": {
          "string_value": "LoginVC",
          "int_value": null,
          "float_value": null
        }
      }
    },
    {
      "element": {
        "key": "engagement_time_msec",
        "value": {
          "string_value": null,
          "int_value": 3600000,
          "float_value": null
        }
      }
    },
    {
      "element": {
        "key": "item_name",
        "value": {
          "string_value": "app_entered_background",
          "int_value": null,
          "float_value": null
        }
      }
    }
  ]
}
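For reproducibility, here is a self-contained way to rebuild an equivalent one-row DataFrame with the schema above (the variable names like value_type are just my own illustration; the schema matches the printSchema() output):

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import (ArrayType, DoubleType, LongType,
                               StringType, StructField, StructType)

spark = SparkSession.builder.getOrCreate()

# Struct holding the typed variants of a parameter value.
value_type = StructType([
    StructField("string_value", StringType()),
    StructField("int_value", LongType()),
    StructField("float_value", DoubleType()),
])

# Top-level schema: event_params is an array of (key, value) structs.
schema = StructType([
    StructField("event_params", ArrayType(StructType([
        StructField("key", StringType()),
        StructField("value", value_type),
    ])))
])

# One row mirroring the JSON sample; tuples map positionally onto the structs.
data = [([
    ("firebase_screen_class", ("LoginVC", None, None)),
    ("engagement_time_msec", (None, 3600000, None)),
    ("item_name", ("app_entered_background", None, None)),
],)]

df = spark.createDataFrame(data, schema)
df.printSchema()
```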
How do I create a new column, at the same row level, holding value.string_value from the element whose key is "item_name"? I do not want to filter out rows, since I want to repeat the process for two more keys.
So I want a new schema something like this:
root
|-- item_name_string_value: string (nullable = true)
|-- firebase_screen_class_string_value: string (nullable = true)
|-- event_params: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- key: string (nullable = true)
| | |-- value: struct (nullable = true)
| | | |-- string_value: string (nullable = true)
| | | |-- int_value: long (nullable = true)
| | | |-- float_value: double (nullable = true)
I want to achieve this using PySpark.
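The closest sketch I have in mind uses the filter higher-order function in a SQL expression (this assumes Spark 2.4+, and I have not verified it is the right approach; the new column names just follow the target schema above):

```python
from pyspark.sql import functions as F

# Keep only the array element whose key matches, take the single survivor
# with [0], and reach into its value.string_value field.
result = (
    df.withColumn(
        "item_name_string_value",
        F.expr("filter(event_params, p -> p.key = 'item_name')[0].value.string_value"),
    )
    .withColumn(
        "firebase_screen_class_string_value",
        F.expr("filter(event_params, p -> p.key = 'firebase_screen_class')[0].value.string_value"),
    )
)
result.printSchema()
```

Is something like this the idiomatic way, or is there a better approach that also extends cleanly to the third key?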