I am now using Spark Streaming + Kafka to build my message processing system. But I have a small technical problem, which I describe below:
For example, I want to do a word count every 10 minutes, so in my earliest code I set the batch interval to 10 minutes. The code looks like this:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}
val sparkConf = new SparkConf().setAppName(args(0)).setMaster(args(1))
val ssc = new StreamingContext(sparkConf, Minutes(10))
But I don't think this is a good solution: 10 minutes is a long time, and the amount of data that accumulates in one batch is more than my memory can hold. So I want to reduce the batch interval to 1 minute, like this:
val sparkConf = new SparkConf().setAppName(args(0)).setMaster(args(1))
val ssc = new StreamingContext(sparkConf, Minutes(1))
Then the problem comes: how can I sum up the results of ten 1-minute batches into one 10-minute result? I think this work can only be done in the driver rather than in the worker programs. What can I do?
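One approach that may fit, offered only as a hedged sketch: Spark Streaming has windowed reductions, so on a DStream of (word, count) pairs, something like `wordCounts.reduceByKeyAndWindow(_ + _, Minutes(10), Minutes(10))` combines ten consecutive 1-minute batches into one 10-minute result as an ordinary DStream operation, not in driver code (`wordCounts` is a hypothetical name; the window and slide durations must be multiples of the batch interval). Because word counts are additive, the block below models that same tumbling-window merge with plain Scala collections and no Spark dependency. The names `WindowedWordCount`, `countWords`, `merge` and `tumblingWindows` are illustrative assumptions, not Spark APIs.

```scala
// Hypothetical plain-Scala model of a 10-minute tumbling window built from
// ten 1-minute word-count batches. No Spark is involved; this only shows why
// per-batch counts can be summed into a per-window count.
object WindowedWordCount {

  // Word count for one batch of lines (what each 1-minute batch would produce).
  def countWords(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  // Merge two partial counts by adding the values per key (the "_ + _" reduce).
  def merge(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
    b.foldLeft(a) { case (acc, (word, n)) =>
      acc.updated(word, acc.getOrElse(word, 0L) + n)
    }

  // Split consecutive batches into non-overlapping groups of `batchesPerWindow`
  // and merge each group into a single total (window length == slide length).
  def tumblingWindows(
      batches: Seq[Map[String, Long]],
      batchesPerWindow: Int
  ): Seq[Map[String, Long]] =
    batches
      .grouped(batchesPerWindow)
      .map(_.foldLeft(Map.empty[String, Long])(merge))
      .toList

  def main(args: Array[String]): Unit = {
    // Ten simulated 1-minute batches, each containing the same two lines.
    val oneMinuteBatches = Seq.fill(10)(countWords(Seq("spark kafka", "spark")))
    // One 10-minute window: each word's count is summed across the ten batches.
    val tenMinuteTotals = tumblingWindows(oneMinuteBatches, batchesPerWindow = 10)
    tenMinuteTotals.foreach(println)
  }
}
```

Calling `tumblingWindows` with 20 one-minute batches and `batchesPerWindow = 10` yields two 10-minute totals, mirroring a Spark window whose slide equals its length.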
I am new to Spark Streaming. Can anyone give me a hand?