Azure Stream Analytics job degrading while pushing data to Cosmos DB

Question

I have data being pushed from Azure IoT Hub -> Stream Analytics -> Cosmos DB.

With 1 simulated device and a Cosmos DB collection provisioned at 1000 RU/s, everything worked fine. Now I have 10 simulated devices and have scaled the collection to 15000 RU/s, but my Stream Analytics job is still getting degraded.

Do I need to increase the number of parallel connections to the collection?

Can we make it more optimal, since Azure pricing for Cosmos DB depends on throughput and RUs?

score 1 · Accepted Answer · edited Jul 16 '19 at 22:37

Can we make it more optimal, since Azure pricing for Cosmos DB depends on throughput and RUs?

I just want to share some thoughts on improving the write performance of Cosmos DB.

1. Consistency Level:

From the documentation:

Depending on what levels of read consistency your scenario needs against read and write latency, you can choose a consistency level on your database account.

You could try setting the consistency level to Eventual. For details, refer to the documentation.

2. Indexing:

From the documentation:

By default, Azure Cosmos DB enables synchronous indexing on each CRUD operation to your collection. This is another useful option to control the write/read performance in Azure Cosmos DB.

Please try setting the indexing mode to lazy. Also, remove any indexes you do not need.
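
As a rough illustration of the indexing advice above, here is a minimal Python sketch that builds an indexing policy document which indexes only the fields the queries filter on and excludes everything else. The helper name build_indexing_policy and the field name timestamp are assumptions made for this example; deviceId is the partition key mentioned in the comments. The resulting JSON would still have to be applied to the collection through the portal or an SDK, which is not shown here.

```python
import json


def build_indexing_policy(indexed_fields, mode="lazy"):
    """Return an indexing policy dict that indexes only the given top-level fields.

    Hypothetical helper for illustration; it only builds the JSON document.
    """
    if mode == "none":
        # With indexing switched off there are no paths to list,
        # and automatic indexing has to be disabled as well.
        return {"indexingMode": "none", "automatic": False}
    return {
        "indexingMode": mode,
        "automatic": True,
        # Index only the fields that queries actually filter on.
        "includedPaths": [{"path": f"/{name}/?"} for name in indexed_fields],
        # Exclude every other path so writes do not pay for unused indexes.
        "excludedPaths": [{"path": "/*"}],
    }


# "timestamp" is an assumed field name; adjust to the real document schema.
policy = build_indexing_policy(["deviceId", "timestamp"])
print(json.dumps(policy, indent=2))
```

With well over a hundred fields per record, excluding the unused paths is likely to matter more for write cost than the choice of indexing mode alone.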

3. Partition:

From the documentation:

Azure Cosmos DB unlimited containers are the recommended approach for partitioning your data, as Azure Cosmos DB automatically scales partitions based on your workload. When writing to unlimited containers, Stream Analytics uses as many parallel writers as the previous query step or input partitioning scheme.

Please partition your collection and pass the partition key in the output to improve write performance.
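
To see why the partition key affects write parallelism, the following toy Python model may help. It maps partition key values to a fixed number of partitions and counts where the writes land. The hash function, the partition count of 4, and the device names are all assumptions for illustration; they do not reflect how Cosmos DB or Stream Analytics assign partitions internally.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key value to a partition index (toy hash, not Cosmos DB's)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def spread(device_ids, partition_count: int) -> Counter:
    """Count how many writes land on each partition, one write per listed device ID."""
    return Counter(partition_for(d, partition_count) for d in device_ids)


# Hypothetical device names; ten simulated devices as in the question.
devices = [f"device-{i}" for i in range(10)]

# Ten writes from a single device all share one key, so they hit one partition.
one_device = spread(["device-0"] * 10, partition_count=4)

# Ten writes from ten devices can be spread over several partitions.
ten_devices = spread(devices, partition_count=4)

print("one device :", dict(one_device))
print("ten devices:", dict(ten_devices))
```

The point of the sketch is only that writes sharing one key value always go to the same partition, so the number of distinct key values bounds how far the load can spread.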

answered Oct 26 '18 at 06:10 by Jay Gong · edited Jul 16 '19 at 22:37 by halfer

Right now, when I created the Cosmos DB collection I partitioned it by deviceId, and I am using the default indexing. Each record contains 130 fields. All my queries will depend on deviceId and time (last hour and last day) when I read the data. So should the partition be by deviceId and timestamp (converted to day)? And when you say to pass the partition key while writing, do you mean in the SQL query in Stream Analytics? – Amjath Khan Oct 26 '18 at 06:32
@AmjathKhan The default index mode should be none and the default consistency level should be session. You could adjust them to see if there is any progress. – Jay Gong Oct 26 '18 at 06:35
@AmjathKhan I think partitioning by deviceId is fine. You could adjust the other 2 factors; maybe that will optimize performance. I am just sharing the official suggestions from the documentation here. – Jay Gong Oct 26 '18 at 06:40
I have removed all indexes. I am using 3 streaming units and a 5000 RU/s Cosmos DB collection. The data I am sending is 38 KB/sec (the size of 20 messages together), and each message has 185 fields. It is using eventual consistency, but it still shows increased throughput for Cosmos DB. – Amjath Khan Oct 26 '18 at 23:25
