I am passing a high number of events (more than 1,000 per second) from multiple sensors to a single event hub. While passing data from the sensors to the event hub I don't have access to a sensor id, so I can only use one partition, since event ordering is essential. The output of the event hub goes to Stream Analytics, which then saves the data to Cosmos DB.
Event Hub (single partition) -> Stream Analytics -> Cosmos DB
The issue is that as the number of requests increases, the latency increases as well. I was thinking of using an intermediate event hub where I could set a partition key.
Event Hub (multiple partitions) -> Stream Analytics -> Event Hub (with partition key) -> Stream Analytics -> Cosmos DB
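To make the partition-key hop concrete, here is a minimal sketch of the semantics I am relying on (the hub name and connection string are placeholders, using the Microsoft.Azure.EventHubs SDK): events sent with the same partition key always land on the same partition, in send order.

using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;

public static class IntermediateHubSender
{
    // Placeholder connection string; EntityPath points at the intermediate hub.
    private static readonly EventHubClient client = EventHubClient.CreateFromConnectionString(
        "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<key-name>;SharedAccessKey=<key>;EntityPath=intermediate-hub");

    public static async Task SendAsync(string sensorId, string jsonPayload)
    {
        var eventData = new EventData(Encoding.UTF8.GetBytes(jsonPayload));

        // Events sharing a partition key are hashed to the same partition,
        // so per-sensor ordering is preserved within that partition.
        await client.SendAsync(eventData, partitionKey: sensorId);
    }
}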
My concerns are:
Will event ordering be maintained in the intermediate Event Hub?
Is there a performance benefit to that architecture?
I also need to update the UI on the website and in the mobile app. Should I use the Cosmos DB change feed or SignalR as the output of Stream Analytics?
So I tested the system by sending around 200 requests/second. I used an Azure Function to send these requests to the event hub.
Function Metrics: Requests sent from the Azure Function to the event hub
Note: The event hub has 20 partitions, and each event was sent with a partition key.
I used another Azure Function to read the data off the event hub. Initially I tested it only by logging the count of the data (without saving anything to Cosmos DB).
Note: I set maxBatchSize to 1 for data ordering. (I am not sure if I need to do this. If I increase the batch size, will I still maintain data ordering?)
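For reference, the reading function looked roughly like this (a sketch; the hub name and connection setting are placeholders, and maxBatchSize itself lives in host.json under extensions.eventHubs.eventProcessorOptions, not in code):

using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class CountStreamData
{
    [FunctionName("CountStreamData")]
    public static Task Run(
        // With maxBatchSize set to 1 in host.json, this array holds a single event.
        [EventHubTrigger("eventhub-name", Connection = "EventHubsConnection")] EventData[] events,
        ILogger log)
    {
        log.LogInformation("Received {Count} event(s)", events.Length);
        return Task.CompletedTask;
    }
}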
I could see that this function was able to read data off the event hub at the same rate it was being written.
Function Metrics: Azure Function reading the data
However, once I added the code to save the data to Cosmos DB, performance decreased significantly.
Note: Cosmos DB throughput was set to 15,000 RU/s.
Function Metrics: The function only handles around 20 req/s
I believe there is something wrong with my code. Here is the function I am using:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

[FunctionName("ProcessStreamData")]
public static async Task Run(
    [EventHubTrigger("eventhub-name", Connection = "EventHubsConnection")] EventData[] podStreamData,
    [CosmosDB(
        databaseName: "dbname",
        collectionName: "containername",
        ConnectionStringSetting = "CosmosDBConnection")] IAsyncCollector<SensorData> podStreamDataOut,
    ILogger log)
{
    var exceptions = new List<Exception>();

    foreach (EventData eventData in podStreamData)
    {
        try
        {
            var messageBody = Encoding.UTF8.GetString(eventData.Body.Array, eventData.Body.Offset, eventData.Body.Count);
            var allData = JsonConvert.DeserializeObject<List<SensorData>>(messageBody);

            // One EventData carries readings from several sensors, so loop over
            // them and set a dynamic partition key and TTL for Cosmos DB.
            foreach (SensorData data in allData)
            {
                data.partitionKey = $"{data.mac}-{DateTime.UtcNow:yyyy-MM}";
                data.ttl = 60 * 60 * 24 * 60; // 60 days in seconds
                data.timestamp = DateTime.UtcNow;
                await podStreamDataOut.AddAsync(data);
            }
        }
        catch (Exception e)
        {
            exceptions.Add(e);
        }
    }

    // If any event in the batch failed, rethrow so the failure is recorded.
    if (exceptions.Count > 1)
        throw new AggregateException(exceptions);
    if (exceptions.Count == 1)
        throw exceptions.Single();
}