
1- I have a Spark cluster on Databricks Community Edition and a Kafka instance on GCP.

2- I just want to ingest the Kafka stream from Databricks Community Edition and analyze the data with Spark.

(Screenshot: Kafka and NiFi's external IPs)

3- This is my connection code:

val UsYoutubeDf =
  spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "XXX.XXX.115.52:9092")
    .option("subscribe", "usyoutube")
    .load()

As mentioned, my data is arriving in Kafka. I had to add my temporary Spark machine's IP (the spark.driver.host address) to the firewall rules; otherwise I cannot even ping my Kafka machine from the Databricks cluster.
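A common cause of this symptom is the broker's advertised listener: even when the firewall allows the connection, the broker replies with the address configured in `advertised.listeners`, and if that is an internal GCP address the Spark client cannot reach it. A minimal sketch of the relevant `server.properties` entries, assuming `XXX.XXX.115.52` is the broker's external IP from the connection code above:

```
# Bind the listener on all interfaces
listeners=PLAINTEXT://0.0.0.0:9092

# The address the broker hands back to clients in metadata responses;
# it must be reachable from the Databricks driver and executors
advertised.listeners=PLAINTEXT://XXX.XXX.115.52:9092
```

After changing these values the broker must be restarted; the blog post linked in the comments below walks through the same configuration in more detail.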

import org.apache.spark.sql.streaming.Trigger.ProcessingTime
 
val sortedModelCountQuery = sortedyouTubeSchemaSumDf
                          .writeStream
                          .outputMode("complete")
                          .format("console")
                          .option("truncate","false")
                          .trigger(ProcessingTime("5 seconds"))
                          .start()

After this point, the data does not arrive in Spark on the cluster:

import org.apache.spark.sql.streaming.Trigger.ProcessingTime
sortedModelCountQuery: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@3bd8a775

It just stays like this. The data is actually arriving, but the analysis code produces no output here.
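Note that on Databricks the console sink writes to the driver log, not to the notebook output, so a running query can look stuck even while it is processing data. A sketch of how one might check from the notebook whether data is actually flowing (standard `StreamingQuery` API; `sortedyouTubeSchemaSumDf` and `sortedModelCountQuery` are the names from the question):

```scala
// Check whether the query is alive and whether batches are progressing
println(sortedModelCountQuery.isActive)      // true while the stream runs
println(sortedModelCountQuery.status)        // current trigger state
println(sortedModelCountQuery.lastProgress)  // input rows, offsets, rates per source

// Alternative: write to an in-memory table that is queryable from the notebook
val debugQuery = sortedyouTubeSchemaSumDf
  .writeStream
  .outputMode("complete")
  .format("memory")           // memory sink: results visible via Spark SQL
  .queryName("model_counts")  // temp view name to query
  .start()

spark.sql("SELECT * FROM model_counts").show(false)
```

If `lastProgress` shows zero input rows, the problem is on the Kafka/network side rather than in the analysis code.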

edited by Rajeev Ranjan
asked by Tugrul Gokce
  • Ping doesn't check your Kafka listeners are correctly defined. https://www.confluent.io/blog/kafka-listeners-explained/ – OneCricketeer Mar 22 '22 at 14:30
  • @OneCricketeer So, what should I do? What's your recommendation? – Tugrul Gokce Mar 22 '22 at 14:35
  • Read the blog? Adjust your kafka server properties accordingly. It's not a Spark problem – OneCricketeer Mar 22 '22 at 14:53
  • OK, so your connection works from spark to Kafka? I can see that you mentioned that your data is received in Kafka for the connection. My understanding is that the problem you have is because you are not receiving the data after the initial connection is completed. You might want to check this documentation: https://databricks.com/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-structured-streaming.html – Alejandro Vázquez Mar 24 '22 at 21:23

0 Answers