
I have the following code:

import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf()
  .setMaster("local[3]")
  .setAppName("KafkaReceiver")
  .set("spark.cassandra.connection.host", "192.168.0.78")
  .set("spark.cassandra.connection.keep_alive_ms", "20000")
  .set("spark.executor.memory", "2g")
  .set("spark.driver.memory", "4g")
  .set("spark.submit.deployMode", "cluster")
  .set("spark.cores.max", "10")

val sc = SparkContext.getOrCreate(conf)
// Micro-batches are created every 10 seconds
val ssc = new StreamingContext(sc, Seconds(10))
val kafkaParams = Map[String, String](
  "bootstrap.servers" -> "192.168.0.1:9092",
  "group.id" -> "test-group-aditya")
val topics = Set("random")
// Direct stream using the Kafka 0.8 integration (spark-streaming-kafka-0-8)
val kafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

In this code I'm streaming data from Kafka every 10 seconds, but I'm looking for a way to trigger consumption based on either time or stream size in MB/bytes. For example, if I set a limit of 5 MB, I should get the data as soon as that limit is reached rather than waiting for the full 10 seconds. Please suggest a solution. Thanks.
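For reference, the DStream API only triggers batches on the batch interval, never on size. The closest built-in knobs are backpressure and the direct stream's per-partition rate limit, which cap records rather than bytes, so a 5 MB target has to be approximated via an assumed average record size. A minimal sketch of that approximation (the record size and partition count below are assumptions, not values from the question):

import org.apache.spark.SparkConf

// Sketch only: these settings cap *records* per batch; 5 MB is approximated
// with an assumed average record size (1 KB here, purely illustrative).
val maxBytesPerBatch   = 5L * 1024 * 1024   // 5 MB target
val avgRecordSizeBytes = 1024L              // assumption about the payload
val batchIntervalSec   = 10L
val numPartitions      = 3L                 // assumption about the "random" topic
val maxRatePerPartition =
  maxBytesPerBatch / (avgRecordSizeBytes * batchIntervalSec * numPartitions)

val conf = new SparkConf()
  .setMaster("local[3]")
  .setAppName("KafkaReceiver")
  // Let Spark adapt the ingestion rate to the observed processing speed
  .set("spark.streaming.backpressure.enabled", "true")
  // Hard cap on records per second per Kafka partition for the direct stream
  .set("spark.streaming.kafka.maxRatePerPartition", maxRatePerPartition.toString)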

  • What sense does it make to have a window based on size? It doesn't really matter, does it? In a streaming application you normally work on time windows; otherwise you are in a batch-based environment where time does not matter. – jojo_Berlin Apr 21 '18 at 10:32
  • @jojo_Berlin - Actually, in my case both time and size matter. For example, I'm fetching GPS-based vehicle data, which arrives slowly in one time slot but at a very high rate in another, and I'm doing some processing on the consumed data. If the data rate is high I want my application to consume earlier regardless of the time frame, as it might otherwise hang my processing loop, so to avoid that I want to set a fetch limit. – jAi Apr 21 '18 at 10:54
  • Okay, then you are probably looking for the backpressure feature of Spark Streaming, which is described here: https://stackoverflow.com/questions/39981650/limit-kafka-batches-size-when-using-spark-streaming – jojo_Berlin Apr 22 '18 at 11:19
  • Hi, I have gone through the link you provided, but since my Spark version is 2.0 it is not working in my case. I also noticed in the comments that another person mentioned that setting those properties does not make any change when using DirectStream. So I'm just wondering if there is any other alternative. Thanks. – jAi Apr 23 '18 at 06:38
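Regarding the last comment: with Spark 2.0 one option worth checking is the newer spark-streaming-kafka-0-10 integration, where byte limits can be set on the Kafka consumer itself. This is only a sketch under that assumption (max.partition.fetch.bytes bounds what a single fetch returns per partition, not the total size of a Spark batch), not a verified fix for this question:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

// Kafka 0.10 consumer-level fetch limit (a Kafka client setting, not a Spark one):
// each poll returns at most max.partition.fetch.bytes per partition.
val kafkaParams010 = Map[String, Object](
  "bootstrap.servers" -> "192.168.0.1:9092",
  "group.id" -> "test-group-aditya",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "max.partition.fetch.bytes" -> (5242880: java.lang.Integer)  // ~5 MB per partition per fetch
)

// Reuses the ssc (StreamingContext) from the question above
val kafkaStream010 = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Set("random"), kafkaParams010)
)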

0 Answers