
We are using Spark Structured Streaming 2.4.1 to process events from Kafka into Cassandra. Each event is a nested JSON document, and we need to flatten it before loading it into a Cassandra table.

I tried to use pivot on the DataFrame, but it throws the error message below. Could someone please help me resolve this issue?

JSON event structure:

{
  "event_name": "some event",
  "groups": [
    {
      "data_group_name": "personname",
      "fields": [
        {
          "col_name": "firstname",
          "value": "John"
        },
        {
          "col_name": "lastname",
          "value": "williams"
        }
      ]
    },
    {
      "data_group_name": "contact",
      "fields": [
        {
          "col_name": "mobile",
          "value": "1234567890"
        },
        {
          "col_name": "home",
          "value": "0987654321"
        }
      ]
    }
  ]
}
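To make the goal concrete (setting Spark aside for a moment), the flattening we are after can be sketched in plain Python; `flatten_event` is a hypothetical helper for illustration, not part of our pipeline:

```python
import json

def flatten_event(event: dict) -> dict:
    """Flatten one event into a single row: event_name plus one
    column per col_name/value pair found in any group."""
    row = {"event_name": event["event_name"]}
    for group in event.get("groups", []):
        for field in group.get("fields", []):
            # each field contributes one column to the flat row
            row[field["col_name"]] = field["value"]
    return row

event = json.loads("""
{
  "event_name": "some event",
  "groups": [
    {"data_group_name": "personname",
     "fields": [{"col_name": "firstname", "value": "John"},
                {"col_name": "lastname", "value": "williams"}]},
    {"data_group_name": "contact",
     "fields": [{"col_name": "mobile", "value": "1234567890"},
                {"col_name": "home", "value": "0987654321"}]}
  ]
}
""")
print(flatten_event(event))
```

In Spark terms this corresponds to exploding `groups` and `fields` and then pivoting `col_name` into columns.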

df.groupBy($"event_name").pivot($"col_name").agg(first($"value"))

Expected result:

event_name   firstname   lastname   mobile       home
------------------------------------------------------------
some event   John        williams   1234567890   0987654321

Error message:

Queries with streaming sources must be executed with writeStream.start();;
kafka
at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:389)
