We are using Spark Structured Streaming 2.4.1 to process events from Kafka into Cassandra. Each event is a nested JSON document that we need to flatten before loading it into a Cassandra table.
I tried to pivot the DataFrame, but it throws the error message below. Could someone please help me resolve this issue?
JSON event structure:
{
  "event_name": "some event",
  "groups": [
    {
      "data_group_name": "personname",
      "fields": [
        {
          "col_name": "firstname",
          "value": "John"
        },
        {
          "col_name": "lastname",
          "value": "williams"
        }
      ]
    },
    {
      "data_group_name": "contact",
      "fields": [
        {
          "col_name": "mobile",
          "value": "1234567890"
        },
        {
          "col_name": "home",
          "value": "0987654321"
        }
      ]
    }
  ]
}
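For what it's worth, the flattening itself works when I run it on a static DataFrame: explode `groups`, then `fields`, then pivot on `col_name`. A self-contained batch sketch (a local session stands in for the real Kafka source; column names follow the JSON above):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session standing in for the real streaming job.
val spark = SparkSession.builder.master("local[*]").appName("flatten-sketch").getOrCreate()
import spark.implicits._

// The sample event from the question, as a one-row DataFrame.
val event = """{"event_name":"some event","groups":[{"data_group_name":"personname","fields":[{"col_name":"firstname","value":"John"},{"col_name":"lastname","value":"williams"}]},{"data_group_name":"contact","fields":[{"col_name":"mobile","value":"1234567890"},{"col_name":"home","value":"0987654321"}]}]}"""
val df = spark.read.json(Seq(event).toDS)

// Explode the groups array, then each group's fields array,
// then pivot col_name values into columns, keeping one value per column.
val flat = df
  .select($"event_name", explode($"groups").as("group"))
  .select($"event_name", explode($"group.fields").as("field"))
  .select($"event_name", $"field.col_name".as("col_name"), $"field.value".as("value"))
  .groupBy($"event_name")
  .pivot("col_name")
  .agg(first($"value"))

flat.show(false)
```

On a batch DataFrame this produces one row per `event_name` with `firstname`, `lastname`, `mobile`, and `home` as columns; it is only the streaming case that fails.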
df.groupBy($"event_name").pivot("col_name").agg(first($"value"))
Expected Result:
----------------
event_name   firstname   lastname   mobile       home
------------------------------------------------------
some event   John        williams   1234567890   0987654321
Error message:
Queries with streaming sources must be executed with writeStream.start();; kafka
    at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:389)
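From reading the Structured Streaming guide, my understanding is that `pivot` is not supported on a streaming DataFrame, and one workaround would be `foreachBatch` (available since 2.4.0), which hands each micro-batch to a function as a static DataFrame. Below is a sketch of what I am considering; the `MemoryStream` stands in for the Kafka source and the in-memory array stands in for the Cassandra write (both are assumptions for illustration):

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder.master("local[*]").appName("pivot-foreachBatch").getOrCreate()
import spark.implicits._
implicit val sqlCtx = spark.sqlContext

// Schema of the event JSON from the question.
val schema = new StructType()
  .add("event_name", StringType)
  .add("groups", ArrayType(new StructType()
    .add("data_group_name", StringType)
    .add("fields", ArrayType(new StructType()
      .add("col_name", StringType)
      .add("value", StringType)))))

// MemoryStream stands in for the Kafka source in this sketch.
val source = MemoryStream[String]
source.addData("""{"event_name":"some event","groups":[{"data_group_name":"personname","fields":[{"col_name":"firstname","value":"John"},{"col_name":"lastname","value":"williams"}]},{"data_group_name":"contact","fields":[{"col_name":"mobile","value":"1234567890"},{"col_name":"home","value":"0987654321"}]}]}""")

val parsed = source.toDF
  .select(from_json($"value", schema).as("e"))
  .select("e.*")

var latest: Array[Row] = Array.empty // stands in for the Cassandra write

// pivot runs inside foreachBatch, where each micro-batch is a static DataFrame,
// so the "streaming sources must be executed with writeStream.start()" check no longer applies.
val query = parsed.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    val flat = batch
      .select($"event_name", explode($"groups").as("group"))
      .select($"event_name", explode($"group.fields").as("field"))
      .select($"event_name", $"field.col_name".as("col_name"), $"field.value".as("value"))
      .groupBy($"event_name")
      .pivot("col_name")
      .agg(first($"value"))
    latest = flat.collect()
  }
  .start()

query.processAllAvailable()
query.stop()
```

In the real job the `latest = flat.collect()` line would instead write `flat` to Cassandra, e.g. via the spark-cassandra-connector's `org.apache.spark.sql.cassandra` format. Is `foreachBatch` the right approach here, or is there a way to do this pivot directly in the streaming plan?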