
I'm stuck on an architectural question regarding the following:

Edit:

So I might be overthinking the problem, or I might need to rephrase the question. NServiceBus seems to be made for messaging and routing (of stream-like data?), whereas StreamInsight seems to be made for event stream processing, event querying, and correlating :)

Are there any benefits (e.g. in terms of scalability or redundancy) of using Approach 1 over Approach 2?

"Approach 1"

which uses a bus (e.g. NServiceBus) to get data into the database and uses StreamInsight solely for querying/correlating.

"Approach 2"

which doesn't use NServiceBus but instead leverages input/output adapters as pub/sub, where the subscriber is the output adapter that 'actively pushes the data into the database'?


Original:

We are creating an application where Twitter data is streamed into our environment. This data is:

  1. Stored as raw (event) input data
  2. Parsed/filtered
  3. Queried (using StreamInsight CEP)
  4. Data remaining after the previous steps is stored as complex events

For step 1 I'm not sure what the best approach is:

  1. Use StreamInsight to split the data stream in two, where one output adapter stores the raw data in a database and another output adapter sends the data to another input adapter for further parsing/filtering (step 2).

-or-

  1. Use a different technology (MSMQ? Azure Service Bus?) for 'routing the raw data stream to the database'

Any guidance is greatly appreciated!

Ropstah
  • What volume and acceleration of data are we talking about, and what are the performance requirements? – EkoostikMartin Mar 12 '14 at 15:03
  • Volume is partly unknown, but the system should handle at least 10 messages per second for now. However, this could easily be 100 per second in a year's time. The raw data storage has no hard performance requirements; the focus there is on making sure that -all- data is stored. Steps 2 through 4, by contrast, do have a (near) 'realtime' requirement, but at that point it's pretty clear the data needs to go to/through StreamInsight. – Ropstah Mar 12 '14 at 15:21

1 Answer


The volume that you are talking about isn't much for StreamInsight, so that isn't a problem. First, there's no reason to add complexity here; you seem to be overthinking the problem. Second, using StreamInsight 2.1, it's easy to create a sink that sends some data to the database and then have additional queries that do further analytics. This all occurs in a single "Process" (not to be confused with a Windows process), and any set of queries can have different sinks for output. Make sense? If you want to see an example, you can download this demo: http://1drv.ms/1nPs2cA. Also, look at my blog at www.devbiker.net.
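
To make the shape of that concrete, here is a rough sketch in plain Python. It is not StreamInsight code and uses none of its API; `run_process`, `raw_sink`, `is_relevant`, and `complex_sink` are hypothetical names chosen only to illustrate one source feeding two sinks inside a single process, with the raw sink receiving every event and the second sink receiving only what the query lets through.

```python
from typing import Callable, Dict, Iterable

# Hypothetical types for illustration only; not part of any StreamInsight API.
Event = Dict[str, object]
Sink = Callable[[Event], None]


def run_process(
    source: Iterable[Event],
    raw_sink: Sink,
    is_relevant: Callable[[Event], bool],
    complex_sink: Sink,
) -> None:
    """Fan one event stream out to two sinks within a single process."""
    for event in source:
        # Sink 1: every raw event is stored, with no filtering applied.
        raw_sink(event)
        # Sink 2: only events that pass the query reach the second sink.
        if is_relevant(event):
            complex_sink(event)


# In-memory lists stand in for the database tables (an assumption).
raw_store = []
complex_store = []

tweets = [
    {"id": 1, "text": "StreamInsight handles this volume easily"},
    {"id": 2, "text": "lunch"},
    {"id": 3, "text": "Comparing NServiceBus and StreamInsight"},
]

run_process(
    tweets,
    raw_store.append,
    lambda e: "StreamInsight" in str(e["text"]),
    complex_store.append,
)
```

After this runs, `raw_store` holds all three events while `complex_store` holds only the two that matched the filter, which mirrors the "store everything raw, then analyze" split without a separate message bus in between.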

DevBiker