StreamInsight Performance Issue

Question

I'm using StreamInsight 2.1 and running into unexpected performance problems.

I have one input adapter receiving financial data at anywhere from 5,000 to 10,000 events per second. I then have a large number of queries operating against that input. Every query hooks up to the same passthrough query, so I have 1,000 queries using the exact same input data.

To test whether the system could handle this, I created 1,000 queries that do nothing but pass the events through (from d in fullStream select d) to an output adapter that only releases each event.

When I run 1,000 queries this way, the system cannot keep up with the stream. It falls farther and farther behind. If I trim it to 100 queries, the system keeps up perfectly.

Have I simply run into the performance wall with StreamInsight? Is it not able to handle the type of solution I am building? Or am I doing something stupid here? Any help would be great; I'm not sure what else to try to make it faster. I need to run far more than 1,000 queries, and far more complicated ones than this.

Without knowing the problem you are trying to solve, I don't have all the information to provide a good solution. What version and edition of StreamInsight are you using? Why do you need 1000s of queries? — TXPower275, Oct 13 '12 at 01:18
I'm using the Evaluation Edition right now, which equates to Enterprise I believe. The newest 2.1 version. So, I'm building a solution where customers can create a query (Price>5 Vol<2k) and get back stocks that meet the results as they tick. It is likely that many of their queries will be different from each other. I can reuse the similar portions, but even with that it is likely I'll have at least 1000 queries. My guess is that this is the problem. My solution is very horizontal (lots of side-by-side) queries rather than a few queries that are LINQ statements stacked on top of each other. — timothymcgrath, Oct 13 '12 at 01:40

score 0 · Answer 1 · answered Oct 13 '12 at 06:57

I think you may be having performance issues because of your current approach.

First off, let's cover the differences between the editions of StreamInsight. Standard edition has only 1 scheduler thread while Premium has one per core. The Evaluation edition is equivalent to Premium.

I think the way to fix this is to reduce the number of queries you have. If you are creating 1,000 queries (each with its own instance of an output adapter), I can see why you are going to have issues. On a quad-core machine, you are going to have 4 scheduler threads trying to run 1,000 queries.

Are your queries that are arranged "horizontally" doing the same thing? If so, see if you can consolidate them. For instance, if I needed to do a query like the "Price>5 Vol<2k" for 5 different stocks, I would write it in such a way that I can handle all 5 stocks in a standing query that sends all the results to 1 output adapter. If a client is "subscribing" to results from a query, that's something that can/should be handled by your output adapter. You could also turn results on and off for certain stocks by streaming in reference data.

Take a look at the sample below. "sourceStream" is going to be my raw stock data coming from the data source. "referenceStream" is going to be some configuration streamed in from a reference data source (e.g. SQL). The success or failure of the join determines which events get passed on for further processing.

    var myPrice5Vol2kSourceStream = from s in sourceStream
                                    join r in referenceStream
                                        on s.StockSymbol equals r.StockSymbol
                                    select s;
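
To make the consolidation idea concrete, here is a minimal sketch of treating each customer's criteria as data rather than as a separate query. It uses plain LINQ to Objects over in-memory arrays, not the StreamInsight API, so it only illustrates the shape of the query. The Tick and Criteria types, their fields, and the Match helper are hypothetical names made up for this example. In StreamInsight the criteria would presumably arrive as a reference stream, as described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event type; not part of StreamInsight.
class Tick
{
    public string StockSymbol;
    public double Price;
    public long Volume;
}

// Hypothetical per-customer criteria, e.g. "Price>5 Vol<2k".
class Criteria
{
    public int CustomerId;
    public double MinPrice;   // a tick matches when Price > MinPrice
    public long MaxVolume;    // ...and Volume < MaxVolume
}

static class Program
{
    // One query evaluates every customer's criteria against each tick,
    // instead of defining one query per customer.
    public static IEnumerable<Tuple<int, string>> Match(
        IEnumerable<Tick> ticks, IEnumerable<Criteria> criteria)
    {
        return from t in ticks
               from c in criteria
               where t.Price > c.MinPrice && t.Volume < c.MaxVolume
               select Tuple.Create(c.CustomerId, t.StockSymbol);
    }

    static void Main()
    {
        var ticks = new[]
        {
            new Tick { StockSymbol = "AAA", Price = 6.0, Volume = 1500 },
            new Tick { StockSymbol = "BBB", Price = 4.0, Volume = 500 },
        };
        var criteria = new[]
        {
            new Criteria { CustomerId = 1, MinPrice = 5.0, MaxVolume = 2000 },
            new Criteria { CustomerId = 2, MinPrice = 3.0, MaxVolume = 1000 },
        };

        // Prints "customer 1: AAA" then "customer 2: BBB".
        foreach (var m in Match(ticks, criteria))
            Console.WriteLine("customer {0}: {1}", m.Item1, m.Item2);
    }
}
```

Each result carries the CustomerId, so a single output adapter could route matches to whoever subscribed. Every tick is still compared against every criteria row, so this does not reduce the number of predicate evaluations; the point is only that the work sits in one standing query instead of a thousand.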

I'm providing this feature to a large number of customers, allowing them to customize their queries. I planned to merge similar parts of the queries, but would easily have thousands of unique queries. I don't think StreamInsight is going to give me the performance I want for that type of situation. I have another solution that gives me much better performance for this problem (high number of unique, simple queries). I was hoping that StreamInsight could replace some of that code and just throttle data through tons of queries, but my current code seems more tuned to what we need. — timothymcgrath, Oct 29 '12 at 19:27

score 0 · Answer 2 · answered Oct 25 '12 at 17:58

Each query needs a thread to execute. You have 1000 queries. So you need how many threads? Right. Actually, StreamInsight will use the thread pool to limit the number of threads created. So ... you'll have a limited number of threads to execute your queries. You'll wind up spending more time doing context switches than actually executing your queries.

I don't understand why you even need 1,000 queries. We've built apps that take in hundreds of sensors from multiple sources and analyze them together ... and gotten over 100K events/sec. At the end of it, it's poor design of your app, not poor performance on StreamInsight's part, that is causing the issue.

You really need to take some time and rethink how you are going about this. No matter how you slice it, your current approach is going to cause you issues. And ... consider this ... is each input adapter creating its own thread to listen to the inbound and enqueue events? How many threads do you think that adds to the mix?

I understand what you're saying. I think what I'm trying to do isn't a perfect fit for StreamInsight. With the feature I'm trying to provide, I have potentially thousands of small, unique queries to run lots of data through. I planned to combine portions of queries that are similar, but even with that, tens of thousands of customers might add thousands of unique queries. It seems that StreamInsight would work better when building a small number of complex queries instead of a large number of simple queries. — timothymcgrath, Oct 29 '12 at 19:16

score 0 · Answer 3 · answered Apr 21 '13 at 20:37

This does sound like a scale-out problem. You have established that you can run 100 queries on your server without any problem. Then, in your comments on other answers, you talk about tens of thousands of customers adding thousands of queries. With that many customers, I suspect that you will be able to afford to add new servers to meet the demand.

So increase the throughput by spreading the load, through - I don't know - some form of distributed computing perhaps?
