Currently I have an SSIS-based data integration solution that extracts data from SQL Server/Sybase/Oracle/flat files into a staging DB and then moves it into our OLTP database with some data massaging. I started researching Azure cloud-based solutions to be ready for when our application moves to the cloud.
The data sources will be replaced with event publishing, so I envision an event hub in the new architecture. Just to try things out, I created an event hub and connected its output to a Stream Analytics job. The job is a pass-through, and its output is connected to an Azure SQL DB and mapped to a table whose field names match the fields specified in the Stream Analytics query.
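For reference, the pass-through query is basically just a projection from the event hub input into the SQL output, something like the sketch below (the input/output aliases and field names are made up for illustration, not my real ones):

    SELECT
        DeviceId,    -- key field
        EventTime,
        Reading      -- data field
    INTO
        [sql-output]     -- Azure SQL DB output alias
    FROM
        [eventhub-input] -- event hub input alias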
I wrote a small client to publish some events and they make it into the table just fine. Now I have a few questions about this approach.
Is this the most straightforward, best-performing way to get events from the event hub into the database? I understand there are two more choices for event hub consumers: a custom app/worker implementing either EventHubReceiver or EventProcessorHost. In both cases the consumer reads either one message or a small batch of messages, and then custom logic inserts them into the database. That seems like a step back compared with the SSIS-based approach, where data is operated on at a much larger set scale. So Stream Analytics with a direct connection to the DB seems like it would perform better, which leads to my next question.
I don’t have much control over how data is funneled into the database from the Stream Analytics job. For example, if my data has a key field and some data fields, I can’t detect that a record with the same key already exists and update its data fields instead of inserting a new row. The only thing that came to mind to make it work is to put an INSTEAD OF trigger on the table and perform an update when there is a match (see the sketch below). But using a trigger in that solution also seems counterproductive to the whole streaming idea.
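This is roughly what I had in mind for the trigger; the table and column names are just placeholders matching the query sketch above, and it assumes each insert batch carries at most one row per key (otherwise the MERGE would need de-duplication first):

    -- Hypothetical target table: dbo.Readings(DeviceId PK, Reading, EventTime)
    CREATE TRIGGER dbo.trg_Readings_Upsert
    ON dbo.Readings
    INSTEAD OF INSERT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Upsert: update existing rows by key, insert the rest
        MERGE dbo.Readings AS target
        USING inserted AS source
            ON target.DeviceId = source.DeviceId
        WHEN MATCHED THEN
            UPDATE SET target.Reading   = source.Reading,
                       target.EventTime = source.EventTime
        WHEN NOT MATCHED THEN
            INSERT (DeviceId, Reading, EventTime)
            VALUES (source.DeviceId, source.Reading, source.EventTime);
    END;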
Am I going about this completely wrong? Is there a better way to do something like this?
Thanks!