I have an input stream, read from a database, with measurements from different devices. The events recorded in the database are not in chronological order but usually arrive within a 2-minute window. However, some devices can send data with a timestamp several days in the past.

How can I process data for a device that is days behind the rest of the data when I've inserted CTIs in the input stream to deal with the "normal" data that's just a few minutes old at most?

Is it possible to split the input stream into one stream per device before I insert CTIs, so that the "older" stream has CTIs that are independent of the other input streams?

Thanks in advance.

Matt

2 Answers

Good questions.

StreamInsight can handle late-arriving events; you just need to understand that CTI (current time increment) events advance application time. If a point event arrives with a start time earlier than the last CTI, it will be dropped. To allow for late-arriving events, you'll need to configure your advance time settings to delay the CTI. More on that can be found here: Advancing Application Time
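
To make the dropping behaviour concrete, here is a minimal sketch in plain Python. It is a toy model of a single application timeline, not the StreamInsight API: the `Timeline` class, its `delay` parameter, and the `enqueue` method are all names I made up for illustration. It assumes a CTI is generated after each accepted event, trailing that event's timestamp by a fixed delay.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    device: str
    start: datetime
    value: float


class Timeline:
    """Toy model of one application timeline (hypothetical, not StreamInsight)."""

    def __init__(self, delay):
        self.delay = delay   # how far the CTI trails the newest event
        self.cti = None      # latest CTI issued so far
        self.accepted = []
        self.dropped = []

    def enqueue(self, event):
        # An event that starts before the last CTI is too late and is dropped.
        if self.cti is not None and event.start < self.cti:
            self.dropped.append(event)
            return False
        self.accepted.append(event)
        # Advance the CTI, but never move it backwards.
        candidate = event.start - self.delay
        if self.cti is None or candidate > self.cti:
            self.cti = candidate
        return True


t0 = datetime(2013, 9, 21, 12, 0)
timeline = Timeline(delay=timedelta(minutes=2))
ok_live = timeline.enqueue(Event("A", t0, 1.0))                          # sets CTI to 11:58
ok_minute_late = timeline.enqueue(Event("B", t0 - timedelta(minutes=1), 2.0))
ok_days_late = timeline.enqueue(Event("C", t0 - timedelta(days=3), 3.0))
```

With a 2-minute delay, the event that is one minute late is still accepted, while the one that is three days late falls behind the CTI and is dropped. Making the delay large enough to cover days would keep the late device's data, but it would also hold back results for every other device.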

As for the best way to process data for a device that is days behind, you would probably be better off just recording the events as they come in and then replaying them after all the data has arrived. Depending on your needs, the start time of each event could be the timestamp at which it was received, with the original timestamp carried as a property on the event.

You can have multiple streams that perform the same query logic at different application times, but they would have to have separate CTI events.

TXPower275
  • Thanks for the reply. I'd really like to keep the processing of the majority of the data as responsive as possible, so I can't really wait for data a few days old to arrive before I process it, but I think I'll try getting the data in two (or more) streams: one for the 'live' stream, where the timestamp of the data in the db is within the window I'm processing, and another to get the data with timestamps earlier than that application time window. This older data doesn't need to be responsive, so I can wait much longer. – Matt Sep 21 '13 at 11:46
  • Why are you reading events out of a database? Why not just enqueue the incoming events directly into StreamInsight first? – TXPower275 Sep 21 '13 at 15:09
  • CEP is new to us, so we are taking a phased approach to its implementation and using this as a sort of test case. I'm hoping to move to direct processing, but using the db as a sort of cache to provide some resilience makes some sense too. We may also need to reprocess some historic data from the db if the rules change. (It's an app processing driver tachometer data for European driving directive rules.) Do you think taking older data from the db as a separate stream is sensible? – Matt Sep 21 '13 at 19:27
  • The problem with going to the database first is increased latency. If you can tolerate some latency it shouldn't be an issue. Taking older data from the db as a separate stream is sensible if the requirements suggest that. Historical data can be used as reference data. – TXPower275 Sep 21 '13 at 23:50
I found an alternative to getting the data as a separate stream: I read the data from multiple devices as it is saved, and use subjects to create separate streams from the main input stream.

This works really well for my application as the sources of data can have varying timelines and each subject ends up with independent application timelines.
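
To illustrate the idea only (this is not the StreamInsight subjects code from the blog), here is a minimal Python sketch of what independent timelines per device buy you. The `route_by_device` function, the tuple layout, and the device names are hypothetical; the sketch assumes each device's CTI trails that device's own newest event by a fixed delay.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def route_by_device(events, delay):
    """Toy model: keep a separate CTI per device (hypothetical helper).

    `events` is an iterable of (device, timestamp, value) tuples in arrival order.
    """
    cti = {}                      # device -> latest CTI for that device only
    accepted = defaultdict(list)  # device -> events admitted to its timeline
    dropped = []
    for device, ts, value in events:
        last = cti.get(device)
        # Only this device's own CTI can make its event "too late".
        if last is not None and ts < last:
            dropped.append((device, ts, value))
            continue
        accepted[device].append((ts, value))
        candidate = ts - delay
        if last is None or candidate > last:
            cti[device] = candidate
    return dict(accepted), dropped, cti


now = datetime(2013, 9, 21, 12, 0)
three_days_ago = now - timedelta(days=3)
events = [
    ("live", now, 1.0),
    ("late", three_days_ago, 2.0),
    ("live", now + timedelta(minutes=1), 3.0),
    ("late", three_days_ago + timedelta(minutes=1), 4.0),
]
accepted, dropped, cti = route_by_device(events, timedelta(minutes=2))
```

Here nothing is dropped: the "late" device's events are only compared against its own CTI, so data that is days old does not collide with the CTIs generated for the "live" device.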

I won't post the code here, as the full example, including a sample project, can be found on DevBikers Blog, written in response to my (cross-posted) MSDN question. The author deserves all the credit.

Matt