I want to choose the Kafka topic that data is written to based on the value of the data in Spark Streaming. Is this possible? When I tried the following code, only the first query runs; the second one below it does not.

(
testdf
  .filter(f.col("value") == "A")
  .selectExpr("CAST(value as STRING) as value")
  .writeStream
  .format("kafka")
  .option("checkpointLocation", "/checkpoint_1")
  .option("kafka.bootstrap.servers","~~:9092")
  .option("topic", "test")
  .option("startingOffsets", "latest")
  .start()
)

(
testdf
  .filter(f.col("value") == "B")
  .selectExpr("CAST(value as STRING) as value")
  .writeStream
  .format("kafka")
  .option("checkpointLocation", "/checkpoint_2")
  .option("kafka.bootstrap.servers","~~:9092")
  .option("topic", "testB")
  .option("startingOffsets", "latest")
  .start()
)

Data is stored only in the topic named test. Can anyone think of a way to do this?

I want to route the rows of a data frame like this one to different topics:

| type | value     |
| A    | testvalue |
| B    | testvalue |

Rows of type A should go to topic test, and rows of type B to topic testB.

jp_spark

2 Answers


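With recent versions of Spark, you can add a column named topic to your dataframe, and the Kafka sink uses it to direct each record to the corresponding topic. Note that this only works if you do not set the topic option on the writer, because that option overrides any topic column in the data.

In your case, that means you can do something like: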

from pyspark.sql import functions as f

(
testdf
  # Records whose value is "A" go to topic test; everything else goes to testB
  .withColumn("topic", f.when(f.col("value") == "A", f.lit("test")).otherwise(f.lit("testB")))
  .selectExpr("CAST(value as STRING) as value", "topic")
  .writeStream
  .format("kafka")
  .option("checkpointLocation", "/checkpoint_1")
  .option("kafka.bootstrap.servers","~~:9092")
  .start()
)
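
If you end up with more than two destinations, the same idea can be written as a SQL CASE expression inside selectExpr instead of chaining when calls. The following is only a sketch under a few assumptions: the helper names topic_case_expr and write_routed are made up for illustration, the routing column is assumed to be type (as in the table in the question), and the values and topic names are assumed to contain no single quotes.

```python
def topic_case_expr(routes, default_topic, column="type"):
    """Build a Spark SQL CASE expression mapping values of `column` to topic names.

    `routes` is a dict such as {"A": "test"}; anything not listed goes to
    `default_topic`. Hypothetical helper, not part of Spark.
    """
    branches = " ".join(
        "WHEN {col} = '{val}' THEN '{topic}'".format(col=column, val=val, topic=topic)
        for val, topic in routes.items()
    )
    return "CASE {b} ELSE '{d}' END AS topic".format(b=branches, d=default_topic)


def write_routed(df, bootstrap_servers, checkpoint_location, routes, default_topic):
    """Start a single streaming query; the topic column decides each row's destination.

    No .option("topic", ...) is set here, so the topic column is what the sink uses.
    """
    return (
        df.selectExpr("CAST(value AS STRING) AS value", topic_case_expr(routes, default_topic))
        .writeStream
        .format("kafka")
        .option("checkpointLocation", checkpoint_location)
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .start()
    )


# The expression that would be passed to selectExpr for the case in the question
print(topic_case_expr({"A": "test"}, "testB"))
```

With this you would call something like write_routed(testdf, "~~:9092", "/checkpoint_1", {"A": "test"}, "testB"), so there is only one query and one checkpoint location to manage instead of one per topic.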
Michael Heil

Thanks, Mike. I was able to achieve this by running the following code:

(
testdf
  .withColumn("topic", f.when(f.col("testTime") == "A", f.lit("test")).otherwise(f.lit("testB")))
  .selectExpr("CAST(value as STRING) as value", "topic")
  .writeStream
  .format("kafka")
  .option("checkpointLocation", "/checkpoint_2")
  .option("startingOffsets", "latest")
  .option("kafka.bootstrap.servers","~~:9092")
  .start()
)
jp_spark