What I am trying to achieve is basically to print "hello world" each time I receive a stream of data.

I know that on each stream I can call the function foreachRDD but that does not help me because:

  1. It might be that there is no data processed
  2. I don't want to print hello on each RDD; I want to print hello once for the entire stream (whether I received data or not).

Basically, each time the program tries to fetch data (say, every 30 seconds, as set by the Spark streaming context), I would like to print hello.

Is there a way of doing this? Is there something like an onListen event for Spark Streaming?

Yuval Itzchakov
Kevin Cohen

1 Answer

On each batch interval (in your case, 30 seconds), the DStream will contain one and only one RDD, which is internally divided into several partitions. You can check that it is not empty and only then print hello world:

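// dstream is assumed to be a DStream already created from your source.
// The function passed to foreachRDD is invoked once per batch; skip batches with no data.
dstream.foreachRDD { rdd => if (!rdd.isEmpty()) println("hello world") }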
Yuval Itzchakov
  • The four basic time parameters are the window, slide, batch and checkpoint intervals. Batches make up a window, the window slides forward in steps of whole batches, and a suitable checkpoint interval must be chosen for persistence. – Vezir Jul 09 '16 at 14:59
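
For the case raised in the question, where hello should be printed on every batch whether or not data arrived, one possible approach is Spark's `StreamingListener`, registered with `StreamingContext.addStreamingListener`. The following is a minimal sketch, not a definitive implementation. The names `HelloListener` and `HelloPerBatch`, the `local[2]` master, and the socket source on `localhost:9999` are assumptions made for illustration. The callback runs on the driver when a batch starts processing, so it can lag behind the 30-second schedule if earlier batches are still running.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchStarted}

// Hypothetical helper: builds the message so it can be checked without a running stream.
object HelloListener {
  def message(batchTimeMs: Long, numRecords: Long): String =
    s"hello world (batch at $batchTimeMs ms, $numRecords records)"
}

// Hypothetical listener: prints once per batch, including batches with zero records.
class HelloListener extends StreamingListener {
  override def onBatchStarted(batchStarted: StreamingListenerBatchStarted): Unit = {
    val info = batchStarted.batchInfo
    println(HelloListener.message(info.batchTime.milliseconds, info.numRecords))
  }
}

object HelloPerBatch {
  def main(args: Array[String]): Unit = {
    // Assumed local setup; replace the master and source with your own.
    val conf = new SparkConf().setAppName("HelloPerBatch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(30))
    ssc.addStreamingListener(new HelloListener)

    // Assumed source: a text socket. At least one output operation must be
    // registered, otherwise the streaming context refuses to start.
    val dstream = ssc.socketTextStream("localhost", 9999)
    dstream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) println(s"received ${rdd.count()} records")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The listener keeps the per-batch greeting separate from the data handling, so the `foreachRDD` body can still skip empty batches while hello is printed regardless.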