I am reading data from a Kafka topic and want to pivot it. I am using the code below in the Spark shell:

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

val data = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "*******:9092")  
  .option("subscribe", "PARAMTABLE")
  .option("startingOffsets", "latest")
  .load()

val schema = new StructType().add("ENTITY_ID", StringType).add("PARAM_NAME", StringType).add("VALUE", StringType)
val df1 = data.selectExpr("CAST(value AS STRING)")
val dataDF = df1.select(from_json(col("value"), schema).as("data").select("data.*")
def forEachFunc(dataDF, batch_id): DataFrame = {
    dataDF.groupBy("ENTITY_ID").pivot("PARAM_NAME").agg(first("VALUE"))
    .withColumn("ProcessedTime", current_timestamp())
    .write.format("memory").mode("append").save("pivotedDataFrame.parquet")
  }

data.writeStream.foreachBatch(forEachFunc).format("console").option("truncate",false).outputMode("append").start().awaitTermination()

But I am getting an error. Could someone please suggest the correct way to achieve this?

A sample message from my Kafka topic is below:

{"PARAM_INSTANCE_ID":128748494,"ENTITY_ID":107437678,"PARAM_NAME":"Survey Required","VALUE":"Unchecked"}

spark-stream-error-image

Alex Ott
  • Please also post the errors you’re getting. – zmerr Aug 16 '21 at 07:41
  • Hi James, previously I used pivot directly on the streaming value. Then I got an error like "Queries with streaming sources must be executed with writeStream.start()". So I referred to the link below - https://www.mssqltips.com/sqlservertip/6563/pivot-transformations-for-spark-streaming/ - and added forEachFunc and writeStream, but I am getting syntax errors. I am not sure how to implement the foreach function in my code. – vishnupriya Aug 16 '21 at 08:45
  • Hi Vishnupriya, would you mind editing your question and adding the errors? – zmerr Aug 16 '21 at 08:48
  • James, attached the error screenshot to the question – vishnupriya Aug 16 '21 at 12:56
  • I edited the syntax in your question a little. You might want to review it. – zmerr Aug 16 '21 at 13:34
  • @James I am again getting an error for the edited code: `:11: error: ')' expected but 'def' found. def forEachFunc(dataDF, batch_id): DataFrame = {` – vishnupriya Aug 20 '21 at 05:02
  • You need to put a `)` somewhere in the line `val dataDF = df1.select(from_json(col("value"), schema).as("data").select("data.*") ` – zmerr Aug 20 '21 at 09:23
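
Pulling the comments together, here is a minimal sketch of how the pipeline might look once the syntax errors are fixed. It is not a confirmed fix. It assumes the Kafka connector package (spark-sql-kafka) is on the shell's classpath, that the code is pasted with `:paste` so the multi-line chains parse, and that printing each pivoted micro-batch with `show` is an acceptable stand-in for the original `write` call. The parameter names `batchDF` and `batchId` are illustrative.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// `spark` is the SparkSession that spark-shell provides.
val data = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "*******:9092") // broker address is elided in the question
  .option("subscribe", "PARAMTABLE")
  .option("startingOffsets", "latest")
  .load()

val schema = new StructType()
  .add("ENTITY_ID", StringType)
  .add("PARAM_NAME", StringType)
  .add("VALUE", StringType)

// Note the closing parenthesis after .as("data"), which the original line was missing.
val dataDF = data
  .selectExpr("CAST(value AS STRING)")
  .select(from_json(col("value"), schema).as("data"))
  .select("data.*")

// Inside foreachBatch every micro-batch is a static DataFrame, so pivot is allowed.
// Both parameters need explicit types and the function returns Unit.
def forEachFunc(batchDF: DataFrame, batchId: Long): Unit = {
  batchDF.groupBy("ENTITY_ID")
    .pivot("PARAM_NAME")
    .agg(first("VALUE"))
    .withColumn("ProcessedTime", current_timestamp())
    .show(false) // assumption: print the batch instead of writing it out
}

// Start from the parsed stream (dataDF), not the raw Kafka stream (data).
// foreachBatch replaces format("console"); the two are not combined.
dataDF.writeStream
  .foreachBatch(forEachFunc _)
  .outputMode("append")
  .start()
  .awaitTermination()
```

Each call pivots only the rows of that micro-batch, so the set of pivoted columns can differ from batch to batch. To persist the result instead of printing it, the `show(false)` call could be swapped for a batch write such as `.write.mode("append").parquet(...)` with a path of your choosing.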

0 Answers