
I am writing a Spark Streaming job in Java which takes input records from Kafka. The records are available in a JavaDStream as custom Java objects. A sample record is:

TimeSeriesData: {tenant_id='581dd636b5e2ca009328b42b', asset_id='5820870be4b082f136653884', bucket='2016', parameter_id='58218d81e4b082f13665388b', timestamp=Mon Aug 22 14:50:01 IST 2016, window=null, value='11.30168'}

Now I want to aggregate this data by the minute, hour, day and week of the "timestamp" field.

My question is: how do I aggregate JavaDStream records based on such a window? Sample code would be helpful.
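For illustration, a minimal sketch of one possible approach (a sketch, not a definitive implementation): DStream window operations group records by batch arrival time, not by a field inside the record, so each record can instead be keyed by its own timestamp truncated to the target granularity and reduced by key. The TimeSeriesData getters used below (getParameterId(), getTimestamp(), getValue()) are assumptions based on the sample record above; adjust them to the real class.

    import java.util.Date;

    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaPairDStream;

    import scala.Tuple2;

    public class EventTimeBuckets {

        // Truncate an event timestamp to the start of its minute; changing
        // the bucket size gives hour or day buckets the same way.
        static long toMinuteBucket(Date ts) {
            long minuteMillis = 60_000L;
            return (ts.getTime() / minuteMillis) * minuteMillis;
        }

        // Key each record by (parameter_id, minute bucket) and sum the
        // values within each bucket. TimeSeriesData must be Serializable
        // for this to run on a cluster.
        static JavaPairDStream<Tuple2<String, Long>, Double> aggregateByMinute(
                JavaDStream<TimeSeriesData> stream) {
            return stream
                .mapToPair(r -> new Tuple2<>(
                    new Tuple2<>(r.getParameterId(),
                                 toMinuteBucket(r.getTimestamp())),
                    Double.parseDouble(r.getValue())))
                .reduceByKey(Double::sum); // swap in min/max/avg as needed
        }
    }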

  • I would say you have to split the timestamp into minute, hour, day and week fields in your time-series data, then do the aggregation. – backtrack Dec 01 '16 at 05:23
  • What if I want to use window operations, like we do for DataFrames? – Vikas Gite Dec 01 '16 at 05:50
  • I thought of suggesting that, but I am not sure about your use case. For example, if you want all the streams received in the last X time, we can use the window option. – backtrack Dec 01 '16 at 06:07
  • I believe the window option works based on the datetime; I am not sure how to use the timestamp field with it. – backtrack Dec 01 '16 at 06:08
  • Basically, if I could use something like this: http://blog.madhukaraphatak.com/introduction-to-spark-two-part-5/ But for that I need to get all the data from the JavaDStream into a DataFrame. Also, in my use case the requirement is: let's say for a particular parameter_id I receive 10 records in a minute with different values; then the output should contain parameter_id, timestamp (HH:mm), value (aggregated). – Vikas Gite Dec 01 '16 at 10:05
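A minimal sketch of the DataFrame route mentioned in the last comment, assuming Spark 2.x: convert each micro-batch to a DataFrame inside foreachRDD and apply the built-in window() function to the event-time column. Column names are inferred from the bean's getters, so the parameter_id and timestamp names below are assumptions to be adjusted.

    import static org.apache.spark.sql.functions.avg;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.window;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // stream is the JavaDStream<TimeSeriesData> read from Kafka.
    stream.foreachRDD(rdd -> {
        SparkSession spark = SparkSession.builder()
            .config(rdd.context().getConf())
            .getOrCreate();

        // Schema is inferred from the bean's getters; for window() to apply,
        // the timestamp property should map to a java.sql.Timestamp column.
        Dataset<Row> df = spark.createDataFrame(rdd, TimeSeriesData.class);

        // One row per (parameter_id, 1-minute window) with the averaged
        // value; value is a String in the sample record, hence the cast.
        Dataset<Row> perMinute = df
            .groupBy(col("parameter_id"), window(col("timestamp"), "1 minute"))
            .agg(avg(col("value").cast("double")).alias("value"));

        perMinute.show();
    });

window(col("timestamp"), "1 minute") produces the per-minute buckets the last comment describes; "1 hour", "1 day" or "1 week" give the coarser aggregations.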

0 Answers